A next-generation citation styling system for the scholarly ecosystem.
CSLN is a ground-up reimagining of the Citation Style Language (CSL), designed to make citation styles easier to write, maintain, and reason about—while remaining fully compatible with the existing ecosystem of 10,000+ styles.
- Why CSLN?
- Key Design Principles
- Project Status
- Architecture
- For Style Maintainers
- For Developers
- Roadmap
- Contributing
- License
- Acknowledgments
CSL 1.0 has been tremendously successful. It powers citation formatting in Zotero, Mendeley, Pandoc, and countless other tools. But after 15+ years of evolution, the XML-based format has accumulated complexity that makes styles difficult to author and maintain.
Consider this excerpt from APA 7th edition (apa.csl):
<macro name="author">
<names variable="author">
<name and="symbol" initialize-with=". " delimiter=", "/>
<label form="short" prefix=" (" suffix=")" text-case="capitalize-first"/>
<substitute>
<names variable="editor"/>
<names variable="translator"/>
<choose>
<if type="report">
<text variable="publisher"/>
<text macro="title"/>
</if>
<else-if type="legal_case">
<text variable="title"/>
</else-if>
<!-- ... 50 more lines of conditionals ... -->
</choose>
</substitute>
</names>
</macro>

This is procedural code disguised as data. The style embeds:
- Control flow (`<choose>`, `<if>`, `<else-if>`)
- Iteration (implicit in `<names>`)
- Fallback logic (`<substitute>`)
- Type-specific overrides scattered throughout
When you multiply this across an entire style file, you get 3,000+ lines of XML that are nearly impossible to diff, review, or extend.
CSLN separates what from how:
# csln-apa.yaml
info:
  title: APA 7th Edition
options:
  processing: author-date
  substitute:
    template: [editor, translator, title]
    contributor-role-form: short
  contributors:
    display-as-sort: first
    and: symbol
    shorten:
      min: 3
      use-first: 1
citation:
  template:
    - contributor: author
      form: short
    - date: issued
      form: year
bibliography:
  template:
    - contributor: author
      form: long
    - date: issued
      form: year
      wrap: parentheses
    - title: primary
      emph: true

50 lines instead of 3,000. The same semantic information, expressed declaratively.
- Declarative Templates: High-level components (`contributor`, `date`, `title`) replace procedural logic.
- Three-Tier Options: Context-aware formatting (global, citation/bibliography, and type-specific).
- Oracle Verification: Built-in scripts to compare output against `citeproc-js` for exact fidelity.
- Modern Input: Native support for CSLN YAML/JSON bibliography format with EDTF date support.
- Diverse Fixtures: Built-in 10-item test dataset covering edge cases like massive author lists and missing dates.
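
The three-tier options idea can be pictured as a lookup that prefers the most specific tier that sets a value. The sketch below is only an illustration under assumptions: the `Tier` and `Resolved` types, the `resolve` function, and the fallback defaults are hypothetical and are not taken from `csln_core`.

```rust
// Hypothetical illustration of three-tier option resolution:
// type-specific settings win over citation/bibliography settings,
// which win over global settings.

#[derive(Clone, Copy, Debug, PartialEq)]
enum AndForm {
    Text,
    Symbol,
}

/// One tier of options; `None` means "not set at this tier".
#[derive(Clone, Copy, Default)]
struct Tier {
    and: Option<AndForm>,
    shorten_min: Option<usize>,
}

/// Fully resolved options with no missing values.
struct Resolved {
    and: AndForm,
    shorten_min: usize,
}

fn resolve(global: Tier, context: Tier, type_specific: Tier) -> Resolved {
    Resolved {
        // Assumed default when no tier sets `and`: spell the word out.
        and: type_specific
            .and
            .or(context.and)
            .or(global.and)
            .unwrap_or(AndForm::Text),
        // Assumed default when no tier sets `min`: never shorten.
        shorten_min: type_specific
            .shorten_min
            .or(context.shorten_min)
            .or(global.shorten_min)
            .unwrap_or(usize::MAX),
    }
}

fn main() {
    let global = Tier { and: Some(AndForm::Symbol), shorten_min: Some(3) };
    let bibliography = Tier { shorten_min: Some(21), ..Tier::default() };
    let resolved = resolve(global, bibliography, Tier::default());
    println!("and = {:?}, shorten.min = {}", resolved.and, resolved.shorten_min);
}
```

Here the bibliography tier overrides only `shorten_min`, so `and` still comes from the global tier.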
Instead of encoding logic in the style, CSLN styles declare intent. The processor implements the logic once, correctly.
| CSL 1.0 | CSLN |
|---|---|
| `<choose><if type="book">...</if></choose>` | `overrides: { book: { emph: true } }` |
| `<names><substitute>...</substitute></names>` | `options.substitute.template: [editor, title]` |
| 20 lines of et-al logic | `shorten: { min: 3, use-first: 1 }` |
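
To make the last row concrete, here is a minimal sketch of how a processor might apply a `shorten` rule once, for every style. It assumes the rule triggers when the number of names is at least `min` (mirroring CSL's `et-al-min`); the `Shorten` struct and `shorten_names` function are hypothetical names, not the `csln_processor` API.

```rust
/// Hypothetical mirror of `shorten: { min: 3, use-first: 1 }`.
struct Shorten {
    min: usize,
    use_first: usize,
}

/// Joins names, truncating to `use_first` names plus "et al."
/// when the list has at least `min` entries (an assumed reading of the rule).
fn shorten_names(names: &[&str], rule: &Shorten) -> String {
    if names.len() >= rule.min && rule.use_first < names.len() {
        let kept = names[..rule.use_first].join(", ");
        format!("{} et al.", kept)
    } else {
        names.join(", ")
    }
}

fn main() {
    let rule = Shorten { min: 3, use_first: 1 };
    println!("{}", shorten_names(&["Smith", "Jones", "Lee"], &rule));
    println!("{}", shorten_names(&["Smith", "Jones"], &rule));
}
```

The style only states the two numbers; the truncation logic lives in one place in the processor.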
Common behaviors are extracted to configuration, not scattered through templates:
- Contributor formatting: initialization, sorting, et-al rules
- Date formatting: precision, localization
- Substitution: what to show when author is missing
- Processing mode: author-date vs. note-based
CSLN uses strongly-typed enums, not strings:
pub enum ContributorRole {
    Author, Editor, Translator, Director, // ...
}
pub enum TitleType {
    Primary, ParentSerial, ParentMonograph,
}

Typos become compile errors. Invalid combinations are impossible.
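
The same idea extends to values read from a style file: a closed enum gives the parser a fixed set of names to accept. The following sketch is an assumption about how that could look using only the standard library; the `FromStr` implementation, the lowercase spellings, and the error type are illustrative and not the project's actual parsing code.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContributorRole {
    Author,
    Editor,
    Translator,
    Director,
}

impl FromStr for ContributorRole {
    type Err = String;

    /// Accepts only the known role names; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "author" => Ok(Self::Author),
            "editor" => Ok(Self::Editor),
            "translator" => Ok(Self::Translator),
            "director" => Ok(Self::Director),
            other => Err(format!("unknown contributor role: {}", other)),
        }
    }
}

fn main() {
    println!("{:?}", "editor".parse::<ContributorRole>());
    // A misspelled role is rejected instead of silently rendering nothing.
    println!("{:?}", "edtor".parse::<ContributorRole>());
}
```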
Every CSL 1.0 style can be automatically migrated to CSLN. We verify correctness by comparing output against citeproc-js, the reference CSL implementation.
CSLN prevents data loss by supporting:
- EDTF Dates: ranges, uncertainty, and approximations
- Rich Text/Math: mathematical notation and strict Unicode handling
- Multilingualism: scoped fields for multi-script data
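
As a rough illustration of what EDTF adds over a bare year, the sketch below parses a very small subset: a year, an optional `?` (uncertain), `~` (approximate), or `%` (both) qualifier, and `/` intervals. The types and functions are hypothetical and cover far less than the EDTF specification or whatever parser CSLN actually uses.

```rust
#[derive(Debug, PartialEq)]
struct EdtfYear {
    year: i32,
    uncertain: bool,
    approximate: bool,
}

#[derive(Debug, PartialEq)]
enum EdtfDate {
    Single(EdtfYear),
    Interval(EdtfYear, EdtfYear),
}

/// Parses a year with an optional trailing qualifier, e.g. "1984", "1984?", "1984~".
fn parse_year(s: &str) -> Option<EdtfYear> {
    let (digits, uncertain, approximate) = match s.chars().last()? {
        '?' => (&s[..s.len() - 1], true, false),
        '~' => (&s[..s.len() - 1], false, true),
        '%' => (&s[..s.len() - 1], true, true),
        _ => (s, false, false),
    };
    Some(EdtfYear { year: digits.parse().ok()?, uncertain, approximate })
}

/// Parses either a single year or a "start/end" interval.
fn parse_edtf(s: &str) -> Option<EdtfDate> {
    match s.split_once('/') {
        Some((start, end)) => Some(EdtfDate::Interval(parse_year(start)?, parse_year(end)?)),
        None => parse_year(s).map(EdtfDate::Single),
    }
}

fn main() {
    println!("{:?}", parse_edtf("1984?"));
    println!("{:?}", parse_edtf("1964/2008~"));
    println!("{:?}", parse_edtf("n.d."));
}
```

Keeping the qualifiers as structured flags is what lets a style decide how to render "circa" or "?" instead of losing that information at import time.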
The engine is built for dual-mode operation:
- Batch: High-throughput CLI for build systems (like Pandoc)
- Interactive: Low-latency JSON server mode for reference managers (like Zotero)
CSLN is built for a long-lived ecosystem:
- Explicit Versioning: Styles include a `version` field for unambiguous schema identification.
- Permissive Runtime: The engine ignores unknown fields, allowing older versions of the processor to run newer styles gracefully.
- Round-trip Safety: Unknown fields are captured during parsing and preserved during serialization, ensuring no data loss when editing with different tool versions.
- Strict Linting: While the runtime is permissive, development tools (like `csln_analyze`) are strict, catching typos and deprecated fields.
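
The round-trip guarantee can be sketched with a toy format. The example below assumes a flat `key: value` header rather than real YAML, and the `StyleHeader` type with its `unknown` map is hypothetical; in a serde-based crate, a map field marked `#[serde(flatten)]` is one common way to get the same effect.

```rust
use std::collections::BTreeMap;

/// Toy style header: two known fields plus a bucket for everything else.
#[derive(Debug, Default, PartialEq)]
struct StyleHeader {
    version: Option<String>,
    title: Option<String>,
    /// Fields this processor version does not recognise, kept verbatim.
    unknown: BTreeMap<String, String>,
}

/// Parses flat `key: value` lines; unrecognised keys are preserved, not dropped.
fn parse(input: &str) -> StyleHeader {
    let mut header = StyleHeader::default();
    for line in input.lines() {
        if let Some((key, value)) = line.split_once(':') {
            let (key, value) = (key.trim(), value.trim().to_string());
            match key {
                "version" => header.version = Some(value),
                "title" => header.title = Some(value),
                _ => {
                    header.unknown.insert(key.to_string(), value);
                }
            }
        }
    }
    header
}

/// Writes known fields first, then every preserved unknown field.
fn serialize(header: &StyleHeader) -> String {
    let mut out = String::new();
    if let Some(version) = &header.version {
        out.push_str(&format!("version: {}\n", version));
    }
    if let Some(title) = &header.title {
        out.push_str(&format!("title: {}\n", title));
    }
    for (key, value) in &header.unknown {
        out.push_str(&format!("{}: {}\n", key, value));
    }
    out
}

fn main() {
    let header = parse("version: 1.1\ntitle: APA 7th Edition\nfuture-flag: true\n");
    // An older tool can edit the title without losing `future-flag`.
    print!("{}", serialize(&header));
}
```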
Note: This project is in active development. While the core architecture is solid, rendering fidelity across the full corpus of 2,844 styles is still a work in progress.
| Component | Status |
|---|---|
| CSL 1.0 Parser (`csl_legacy`) | ✅ Complete - parses all 2,844 official styles |
| CSLN Schema (`csln_core`) | ✅ Complete - options, templates, locale, rendering |
| Migration Tool (`csln_migrate`) | 🔄 In Progress - compiles templates, extracting style-specific formatting |
| CSLN Processor (`csln_processor`) | 🔄 In Progress - APA verified, other styles need work |
| Oracle Verification | ✅ Infrastructure complete - citeproc-js comparison |
| Corpus Analyzer (`csln_analyze`) | ✅ Complete - feature usage stats for 2,844 styles |
To ensure high performance and maintainable history, CSLN follows a hybrid style management strategy:
- Core Styles (In-Repo): This repository maintains the top ~20 "parent" styles (APA, Chicago, IEEE, Vancouver, etc.) and edge-case test styles. These serve as our primary integration test suite.
- Community Styles (Submodule): The broader ecosystem of 2,000+ journal-specific styles is managed in a separate repository (e.g., `csln-styles`) and linked as a git submodule.
This approach keeps the core repository lean while providing a tight development loop for the most impactful styles.
APA 7th: 5/5 citations, 5/5 bibliography (exact match with citeproc-js)
Batch Testing (50 styles sampled):
Citations: 74% with 5/5 match
Bibliography: Limited matches (style-specific formatting issues)
Errors: 0 migration errors, 0 processor errors
Features implemented:
✓ page-range-format (1,076 styles) - expanded, minimal, chicago
✓ delimiter-precedes-et-al (786 styles) - always, never, contextual
✓ initialize-with (1,437 styles) - name initialization
✓ name-as-sort-order (2,100+ styles) - family-first ordering
✓ disambiguate-add-givenname (935 styles) - name expansion
✓ disambiguate-add-names (1,241 styles) - et-al expansion
✓ subsequent-author-substitute (314 styles) - "———" replacement
✓ type-specific overrides - publisher suppression, page formatting
✓ page label extraction - "pp." from CSL Label nodes (#69)
✓ pluggable output formats - plain text, HTML, and Djot
✓ semantic rendering - machine-readable class wrapping (e.g. `csln-title`)
Known gaps (in progress):
○ Group delimiter extraction (colon vs period between components)
○ Volume-pages delimiter varies by style (comma vs colon)
○ DOI suppression for styles that don't output DOI
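
As an illustration of one feature in the list above, the sketch below shows what the `expanded` and `minimal` values of `page-range-format` mean for a simple numeric range. It is a simplified, hypothetical implementation (ASCII digits only, `chicago` omitted), not the code in `csln_processor`.

```rust
/// Two of the formats named above; "chicago" has more rules and is omitted here.
#[derive(Clone, Copy)]
pub enum PageRangeFormat {
    Expanded,
    Minimal,
}

/// Formats a page range from its first and last page as written in the data.
/// Assumes plain ASCII digits; anything else is joined unchanged.
pub fn format_page_range(first: &str, last: &str, format: PageRangeFormat) -> String {
    let is_number = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !is_number(first) || !is_number(last) {
        return format!("{}–{}", first, last);
    }
    // Restore digits dropped from an abbreviated last page: ("321", "8") -> "328".
    let full_last = if last.len() < first.len() {
        format!("{}{}", &first[..first.len() - last.len()], last)
    } else {
        last.to_string()
    };
    match format {
        PageRangeFormat::Expanded => format!("{}–{}", first, full_last),
        PageRangeFormat::Minimal => {
            if full_last.len() != first.len() {
                return format!("{}–{}", first, full_last);
            }
            // Count leading digits shared by both pages, keeping at least one digit.
            let shared = first
                .bytes()
                .zip(full_last.bytes())
                .take_while(|(a, b)| a == b)
                .count()
                .min(full_last.len() - 1);
            format!("{}–{}", first, &full_last[shared..])
        }
    }
}

fn main() {
    println!("{}", format_page_range("321", "8", PageRangeFormat::Expanded));
    println!("{}", format_page_range("321", "328", PageRangeFormat::Minimal));
}
```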
crates/
├── csl_legacy/ # CSL 1.0 XML parser (read-only)
├── csln_analyze/ # Corpus-wide analysis and batch testing
│ ├── src/
│ │ ├── analyzer.rs # Style feature statistics
│ │ ├── ranker.rs # Parent style ranking logic
│ │ └── main.rs # CLI entry point
├── csln_cli/ # CLI tools (schema generation, etc.)
├── csln_core/ # CSLN schema and core types
│ ├── src/
│ │ ├── citation.rs # Citation model
│ │ ├── embedded/ # Style presets (APA, Chicago, etc.)
│ │ ├── legacy.rs # CSL 1.0 legacy type bridge
│ │ ├── locale/ # Localization (terms, dates, raw mapping)
│ │ ├── options/ # Style configuration groups
│ │ ├── presets.rs # Named configuration bundles
│ │ ├── renderer.rs # Rendering orchestration
│ │ ├── template.rs # Template components
│ │ └── reference/ # Internal reference model
├── csln_migrate/ # CSL 1.0 → CSLN converter
│ ├── src/
│ │ ├── options_extractor/ # Extracts config from XML
│ │ ├── template_compiler/ # Compiles XML macros to CSLN templates
│ │ ├── upsampler.rs # XML to CSLN Node mapping
│ │ ├── analysis/ # Style-specific feature detection
│ │ └── passes/ # Transformation passes
└── csln_processor/ # Citation/bibliography renderer
├── src/
│ ├── processor/ # Core logic (disambiguation, matching, sorting)
│ ├── values/ # Field-level extraction and formatting
│ └── render/ # String rendering (mod, component)
.agent/ # LLM agent instructions and design documents
locales/ # CSLN YAML locale files (en-US, de-DE, etc.)
scripts/ # Oracle verification (citeproc-js) and automation
styles/ # CSLN YAML styles
styles-legacy/ # 2,844 CSL 1.0 styles (submodule)
If you maintain CSL styles, here's what CSLN means for you:
- Readable diffs: Changes are obvious in YAML
- No macro hunting: All behavior is visible in one place
- Validation: Schema catches errors before runtime
CSLN uses the same conceptual model as CSL:
- Contributors (author, editor, translator)
- Dates (issued, accessed)
- Titles (primary, container)
- Numbers (volume, issue, pages)
# Convert an existing CSL style
cargo run --bin csln-migrate -- styles-legacy/apa.csl
# Output: csln-new.yaml with clean CSLN format

CSLN includes embedded templates for common styles (APA, Chicago, Vancouver, IEEE, Harvard). Instead of defining a template from scratch, you can reference a preset:
citation:
  use-preset: apa
bibliography:
  use-preset: vancouver

This effectively "inherits" the standard template for that style, which you can then customize with options.
git clone https://github.com/bdarcus/csl26
cd csl26
cargo build --workspace
cargo test --workspace

# Run CSLN processor with a style (default plain text)
cargo run --bin csln-processor -- styles/apa-7th.yaml
# Generate semantic HTML
cargo run --bin csln-processor -- --format html styles/apa-7th.yaml
# Generate Djot with semantic attributes
cargo run --bin csln-processor -- --format djot styles/apa-7th.yaml
# Disable semantic classes for clean output
cargo run --bin csln-processor -- --format html --no-semantics styles/apa-7th.yaml

The csln_analyze tool scans all CSL 1.0 styles to identify patterns and gaps:
# Analyze all styles in the styles-legacy/ directory
cargo run --bin csln-analyze -- styles-legacy/
# Output as JSON for scripting
cargo run --bin csln-analyze -- styles-legacy/ --json

This helps prioritize which features to implement based on actual usage across 2,844 styles.
The scripts/ directory contains tools to verify CSLN output against citeproc-js, the reference CSL 1.0 implementation.
cd scripts
npm install # First time only - installs citeproc
# oracle.js - Render citations/bibliography with citeproc-js
node oracle.js ../styles-legacy/apa.csl # Both citations and bibliography
node oracle.js ../styles-legacy/apa.csl --cite # Citations only
node oracle.js ../styles-legacy/apa.csl --bib # Bibliography only
node oracle.js ../styles-legacy/apa.csl --json # JSON output for scripting
# oracle-e2e.js - End-to-end migration test
# Migrates CSL 1.0 → CSLN → csln-processor, then compares with citeproc-js
node oracle-e2e.js ../styles-legacy/apa.csl

Example output from oracle-e2e.js:
=== End-to-End Oracle Test: apa ===
--- CITATIONS ---
✅ ITEM-1
✅ ITEM-2
✅ ITEM-3
✅ ITEM-4
✅ ITEM-5
Citations: 5/5 match
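
The report above boils down to a per-item comparison between two renderers. The following sketch shows that idea in Rust under stated assumptions: it is not the code in `oracle-e2e.js`, the `compare_outputs` function is hypothetical, the sample strings are made up, and it assumes an exact string match where the real scripts may normalise output first.

```rust
use std::collections::HashMap;

/// Compares rendered strings per item id and returns (match count, report lines).
pub fn compare_outputs(
    oracle: &[(&str, &str)],
    candidate: &[(&str, &str)],
) -> (usize, Vec<String>) {
    // Index the candidate output by item id for lookup.
    let candidate: HashMap<&str, &str> = candidate.iter().copied().collect();
    let mut matches = 0;
    let mut report = Vec::new();
    for &(id, expected) in oracle {
        // A missing item counts as a mismatch.
        let ok = candidate.get(id).map_or(false, |actual| *actual == expected);
        if ok {
            matches += 1;
        }
        report.push(format!("{} {}", if ok { "✅" } else { "❌" }, id));
    }
    (matches, report)
}

fn main() {
    // Made-up sample data, not real citeproc-js or csln-processor output.
    let oracle = [("ITEM-1", "(Smith, 2020)"), ("ITEM-2", "(Lee et al., 2019)")];
    let candidate = [("ITEM-1", "(Smith, 2020)"), ("ITEM-2", "(Lee, 2019)")];
    let (matches, report) = compare_outputs(&oracle, &candidate);
    for line in &report {
        println!("{}", line);
    }
    println!("Citations: {}/{} match", matches, oracle.len());
}
```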
cargo doc --workspace --open

You can generate a formal JSON Schema for CSLN styles using the CLI:
# Output schema to stdout
cargo run --bin csln-cli -- schema
# Save to file
cargo run --bin csln-cli -- schema > csln.schema.json

This schema can be used to validate styles or to provide IntelliSense-style completion in editors like VS Code.
- Bibliography formatting (page ranges, subsequent author substitute)
- Complete bibliography formatting (complex punctuation, affixes)
- Resolve high-frequency gaps identified by `csln_analyze`
- Automated verification pipeline for top 100 styles
- Schema versioning and forward compatibility
- Bulk migration of all 2,844 styles
- WASM build for browser use
- Additional locales (de-DE, fr-FR, tr-TR, etc.)
- Style presets vocabulary (see STYLE_ALIASING.md)
- Embedded priority templates (APA, Chicago, Vancouver, IEEE, Harvard)
- Preset-aware migration (emit preset names instead of expanded config)
- Note-bibliography citation style support
- CSLN 1.0 specification
- Visual style editor
- Integration guides for reference managers
CSLN follows an AI-first development model. The core CSLN schema and data model were designed by the project maintainer, and AI agents (like Claude Code) have adapted and extended this work to build out the migration tooling, processor, and analysis infrastructure. This approach lowers the barrier to entry, allowing the most valuable contributions to come from Domain Experts and Style Authors rather than just systems programmers.
The most impactful way to contribute is by providing the "raw material" that the AI needs to understand and solve complex citation problems:
- Surface Real-World Gaps: Describe formatting requirements or edge cases that current systems (including CSL 1.0) handle poorly.
- Provide Contextual Resources: Shared style guides, official manuals, and sample documents are high-value inputs that allow the LLM to extract logic and implement it.
- Refine Instructions: Help improve the "identity" and "skills" of the AI agents by suggesting updates to the `.agent` directory.
- Report Pain Points: Use GitHub issues to describe what is difficult or counter-intuitive in the current CSLN model.
We treat GitHub Issues as Context Packets for our AI agents. Here is the current lifecycle:
- Context Submission: A Domain Expert submits an issue with dense context (e.g., "Legal citations in this jurisdiction require X, see attached PDF").
- Agent Activation: A project maintainer activates an AI agent (using tools like `antigravity` or `gemini`) initialized with the Domain Expert Persona.
- Implementation: The agent reads the issue, extracts the rules, and generates the necessary Rust code, YAML schema changes, or tests.
- Verification: The code and tests are verified against the Oracle (citeproc-js) to ensure correctness.
Note: While maintainers currently trigger these agents manually, we are actively developing workflows to automate this loop directly from GitHub Actions.
If you want to contribute code directly, focus on:
- Core Engine Architecture: Improving the performance and correctness of `csln_processor`.
- Schema Design: Ensuring `csln_core` remains robust and extensible.
- Agent Tooling: Developing new "skills" or scripts that enhance the autonomy and capabilities of the AI agents.
Active development uses beans for local task tracking (see .beans/ directory). GitHub Issues remain open for:
- Community bug reports: Submit issues when you find rendering defects or incorrect output
- Feature requests: Propose new capabilities or improvements
- Public discussion: Comment on planned work and provide domain expertise
Current development tasks are tracked locally as beans. If you see a GitHub issue marked with a migration note, the work is actively being tracked in the .beans/ directory. The issue will be closed when the work is completed.
Use the /bean skill (see .claude/skills/bean/SKILL.md) for local task management:
/bean list # Show all tasks
/bean next # Get recommended task
/bean show BEAN_ID # View details
/bean update BEAN_ID --status completed

All beans are git-tracked markdown files with dependency relationships and priority levels.
MPL-2.0 - see LICENSE for details.
CSLN builds on the foundation laid by the CSL community over 15+ years. Special thanks to:
- Frank Bennett (citeproc-js)
- The CSL specification authors
- Thousands of style contributors
CSLN: Citation styles should be data, not programs.