Assemble structured Word documents from a YAML content file (and an optional
.docxcover template), runnable from any working directory.
Define headings, body text, bullet points, figures, references, and a table of contents in content.yaml. Install the tool once, then run docx_builder build from any project directory. The package is content-agnostic: it ships no cover templates and no sample content. You bring your own.
- How it works
- Install
- Quick start
- content.yaml reference
- Styles
- Project structure
- CLI reference
- Development
- Create a directory for your document with a
content.yamlfile (and animages/folder if you use figures). - Run
docx_builder buildfrom that directory. It reads the YAML, optionally loads a.docxcover template (only whencover.templateis set; nothing is bundled), fills the cover table by row index, and renders every section in order. - On macOS with Microsoft Word,
buildfinalises the table of contents for you when the document declares one. On Windows or Linux the.docxis still generated normally, but the TOC stays as a placeholder — open it in Word and refresh fields (Ctrl+A, thenF9) to populate it. Automatic TOC finalisation and PDF export are macOS-only.
content.yaml + images/ ──► docx_builder build ──► <output>.docx
Page numbers are added automatically to the sections: block. Items in front_matter: (cover sheet, TOC) are excluded from the footer counter but still count toward the total.
Install once as a global tool with uv, straight from the repository:
uv tool install git+https://github.com/lipex360x/docx_builder.gitOr from a local clone:
uv tool install . # snapshot install
uv tool install --editable . # live install, source edits propagate without reinstallThis registers the docx_builder binary in ~/.local/bin/. Because the package is content-agnostic, provide your own cover .docx through cover.template or --template-dir when you want a cover sheet.
The core install has no PDF post-processing dependency. The opt-in --fix-toc-links flag needs the pdf-links extra, which adds PyMuPDF:
uv tool install 'git+https://github.com/lipex360x/docx_builder.git[pdf-links]'To upgrade a snapshot install after pulling changes:
uv tool install --reinstall .Tip
An editable install (uv tool install --editable .) picks up .py and bundled-data edits automatically. Reinstall only when [project.scripts] entries or dependencies change.
mkdir -p ~/projects/my-report
cd ~/projects/my-report
docx_builder initinit scaffolds a role-based project structure: a report/ directory (with content.yaml, images/, assets/, output/, templates/), a single conditional CLAUDE.md, a .gitignore and a .work/ scratch dir. Add optional layers with --with brief,references,src,meta,tools,monitor,ai-check (dependencies auto-resolved), or --full for all of them plus the embedded, race-safe screenshot_intake.py (the monitor layer). --name is injected into CLAUDE.md and content.yaml (it defaults to the directory name); --cover writes a cover + TOC content.yaml (Shape B). The scaffolder is idempotent: re-running only adds what is missing and never overwrites existing files without --force. The generated content.yaml is a starter with Replace me. body text, so you still author the real content for whatever you are building: CV, report, manual, paper, contract, or fiction. build resolves content.yaml from the project root, falling back to report/, so docx_builder build . works on a scaffolded project.
cover: # optional, omit for a sectionless document
template: ../templates/MyCover.docx # your own .docx, nothing is bundled
output: "Report_{number}.docx" # placeholders: any string field under cover
number: "0001"
rows: # one per row of the cover table, in order
- Project Name
- Document Title
- Author Name
- "0001"
- "2026-06-15"
styles: # optional, overrides built-in defaults
h1: { font_size: 16pt, color: "#003366" }
body: { align: justify }
front_matter: # rendered first, no page numbers
- call: page_break
- call: toc
levels: 1-2
sections: # rendered after, page numbers start here
- call: h1
text: Introduction
- call: body
text: This is the opening paragraph.
- call: figure
filename: diagram.png
label: Figure 1.1
caption: System overview
width: 5indocx_builder build
# Saved: /Users/you/projects/my-report/Report_0001.docx
# Report:
# Words: 1240
# Reading time: ~6 min
# Pages: n/a (known only after export to PDF)
# Em-dashes (U+2014): 0After saving, build prints a short report: word count, estimated reading time (at 200 wpm), a page-count line, and an em-dash (U+2014) counter. When em-dashes are present the line is flagged <- remove before shipping. The page count is n/a for a plain build because only the Word/PDF export path can determine the real count.
Or specify a directory:
docx_builder build /path/to/some-other-projectWhen your document declares a toc section, build finalises it automatically on macOS with Microsoft Word: it drives Word to update every table of contents and all fields, then writes the populated .docx back over the source. No manual F9, no PDF required.
docx_builder build --no-finalize # skip the Word pass, keep build pure and fastNote
Off macOS or without Word installed, build cannot fill the TOC (it is a Word field that only Word can repaginate). It leaves a placeholder, prints a short note, and exits cleanly. On Windows, open the .docx in Word and press Ctrl+A then F9 to populate it (Cmd+A on macOS). PDF export is macOS-only.
For PDF, the export drives Word for you on macOS:
docx_builder export pdf
# Exported: /Users/you/projects/my-report/Report_0001.pdf (2 pages)
docx_builder build --pdf # build then export in one command
docx_builder build --pdf --open # and open both the PDF and the finalised .docx in WordWarning
The reported page count is read from the actual PDF. The cached <Pages> value inside the .docx is unreliable and is never trusted.
By default the export also writes the populated TOC back over the source .docx, so the source ends finalised. Pass --no-update-source to leave it byte-identical, useful when export pdf --input SomeFile.docx points at a file you do not want overwritten. --open is macOS-only; elsewhere it prints a note and skips.
Note
Known Word limitation: ToC hyperlinks in the PDF. In the exported PDF, the clickable table-of-contents entries can occasionally jump to the heading one entry ahead of the target (clicking an entry lands on the next heading). This is a bug in Word for Mac's Save-as-PDF engine: the generated .docx bookmarks are correct, and page numbers and content are unaffected. The bug is rare, so the fix is opt-in: pass --fix-toc-links to export pdf or build --pdf to post-process the PDF and rewrite each ToC link to its heading's real page (it prints Fixed N ToC link(s)). This requires the optional pdf-links extra (uv tool install 'docx_builder[pdf-links]'); the default export path is unchanged and gains no new dependency.
Every top-level block is optional. The minimal valid content.yaml produces a blank document. Realistic shapes:
| Block | Type | Description |
|---|---|---|
cover |
mapping | Optional cover sheet. When omitted, no template is loaded and a blank document is used. |
styles |
mapping | Optional visual overrides, see Styles. |
page_numbers |
bool | Default true. Set false to disable the footer counter everywhere. |
front_matter |
list | Sections rendered first, without page numbers (cover sheet items, TOC). |
sections |
list | Sections rendered after front_matter, with page numbers (unless page_numbers: false). |
| Field | Required | Description |
|---|---|---|
template |
no | .docx cover template. Absolute path, relative path with separators (anchored at project dir), or bare filename (looked up in --template-dir). Omit for a blank document. |
output |
no | Output filename pattern. Placeholders are any string field defined under cover (e.g. {number}, {name}). Defaults to Report_{number}.docx. |
number |
no | Document identifier, exposed as {number} to output. |
rows |
no | Strings filled into the cover table by row index. Row 0 maps to row 0 of the .docx table, into the second column. |
ai_declaration |
no | Optional paragraph appended after the cover table with a bold "AI Use Declaration:" prefix. |
call |
Required fields | Optional fields | Description |
|---|---|---|---|
page_break |
(none) | (none) | Hard page break |
toc |
(none) | levels (default "1-2") |
Word TOC field, populated by build on macOS+Word or via Cmd+A then F9 |
h1 / h2 / h3 |
text |
style |
Headings 1 to 3 (default colour is black) |
body |
text |
style |
Body paragraph |
bullet |
text |
style |
Bulleted item with configurable glyph |
bold_lead |
bold, rest |
style |
Bullet with bold lead phrase followed by regular text |
reference |
text |
style |
Hanging-indent paragraph for bibliography entries |
figure |
filename, label, caption |
width, style, caption_style |
Centred image with caption |
figure_pair |
filename1, filename2, label, caption |
width1, width2, style, caption_style |
Two images side by side |
table |
rows (or header) |
header, widths, col_align, borders, style |
Tabular grid; bold header row, per-column % widths and alignment, cell shading, bullets-in-cell |
A table takes a list of rows (each a list of cell values) and an optional bold header row:
- call: table
header: [Mês, Vendas, Custos]
rows:
- [Janeiro, "10.000", "6.000"]
- [Fevereiro, "12.500", "7.200"]
widths: [40, 30, 30] # optional, % per column, must sum to 100
col_align: [left, right, right] # optional, horizontal align per column
borders: true # optional, default true
style:
header_fill: "#003366" # header background
header_color: "#FFFFFF" # header text colour (header only)
stripe_fill: "#EEF3FA" # zebra background on alternate data rows
valign: center # vertical align of every cellAll rows must have the same number of cells as the column count. widths are percentages of the usable page width (so the table never overflows the margins) and must sum to 100. A cell value with \n becomes a line break inside the cell. Set borders: false for a borderless grid.
Styling notes:
-
Shading:
header_fillpaints the header row,stripe_fillzebra-stripes alternate data rows (both accept#RRGGBBorRRGGBB, off by default). -
Header text:
header_colorcolours the header text only (pair it with a darkheader_fill); the generalcolorkey paints every cell. -
Alignment:
col_alignoverrides alignment per column (great for numbers),valign(top/center/bottom) sets the vertical anchor of every cell. -
Font: a table has no font of its own — cells inherit the document body font (the same one
call: bodyuses), so the table always matches the running text. A document built without a cover template uses the blankpython-docxtheme, whose body font is Cambria (serif) while headings are Calibri (sans) — sobody,bulletand the table all render serif there (not a table-specific issue). Setfont_familyon thetablestyle to override the table's font only, or onbody/bullet/etc. (or start from a modern cover template) to change the whole body. -
Bullets in a cell: make a cell a
{bullets: [...]}mapping and each item becomes a bulleted paragraph inside that cell:rows: - ["Features", { bullets: ["fast", "safe", "tested"] }]
Painting one specific cell, and merging cells, are not supported yet.
Any section type may appear in either front_matter or sections. The distinction is only about page numbering. A toc is the exception: it is always unnumbered (kept out of the PAGE sequence) wherever you place it, so the body always starts at PAGE 1.
# Mode A: no page numbers anywhere (CVs, one-pagers, fliers)
page_numbers: false
sections:
- call: h1
text: Jane Doe# Mode B: cover/TOC unnumbered, content numbered (reports)
front_matter:
- call: page_break
- call: toc
sections:
- call: h1
text: Introduction# Mode C: every page numbered (simple paginated docs)
sections:
- call: h1
text: Chapter 1Note
The footer {total} placeholder counts the pages of the numbered body section, not the whole document, so a report with an unnumbered cover and TOC closes on N / N (not N / N+front-matter). The body begins on a new page at PAGE 1.
Note
The legacy flag hide_page_counter: true on individual sections is deprecated: build prints a one-time warning when it sees it, and it is scheduled for removal in v0.5. New documents should use the front_matter: block instead.
Tip
Missing image files do not crash the build. A [IMAGE NOT FOUND: filename] placeholder is inserted instead, so you can draft the document before all screenshots are ready.
Visual properties (font sizes, colors, alignment, spacing, bullet glyph, footer format, and more) are controlled by an optional styles: block in content.yaml. When the block is absent, the built-in defaults are used.
styles:
h1: { font_size: 16pt, color: "#003366", bold: true }
body: { font_size: 11pt, align: justify }
bullet: { glyph: "→ " }
footer: { format: "Page {page} of {total}", align: center }
sections:
- call: h1
text: Special heading
style: { font_size: 20pt } # inline override wins over the block aboveCascade (later wins): built-in defaults, then the styles: block, then inline style: per section.
Full schema, accepted units, color formats, defaults, and every section type's keys are documented in docs/styles-reference.md.
The package source tree:
.
├── docx_builder/
│ ├── __init__.py
│ ├── cli/ # argparse entry package, one module per subcommand
│ │ ├── __init__.py # DESCRIPTION + EPILOG + _build_parser() + main()
│ │ ├── _shared.py # next-step error printers + resolve_directory()
│ │ ├── _build.py # build command (+ --pdf / --open / --no-finalize / --fix-toc-links)
│ │ ├── _init.py # init command
│ │ ├── _export.py # export pdf command (+ --fix-toc-links)
│ │ ├── _install.py # install skill command
│ │ └── _validate.py # validate command (em/en dash gate)
│ ├── builder.py # build(), init_project(), has_toc()
│ ├── export.py # export_pdf() and finalize_source() via Word + JXA (macOS)
│ ├── toc_links.py # opt-in PDF ToC-hyperlink repair via PyMuPDF (pdf-links extra)
│ ├── report.py # post-build report: words, reading time, em-dash counter
│ ├── validate.py # content.yaml prose check: em/en dash & other AI-tells
│ ├── scaffold.py # role-based init scaffolder: layers, idempotence, .gitkeep
│ ├── elements.py # paragraph primitives (h1, h2, h3, body, bullet, …)
│ ├── figure.py # figure() and figure_pair() with centred captions
│ ├── pagination.py # PAGE / NUMPAGES footer
│ ├── renderer.py # dispatches content.yaml section calls
│ ├── styles.py # StyleResolver + length/color/align parsing
│ ├── summary.py # Word TOC field insertion
│ ├── table.py # table() builder + border utilities
│ ├── skill_installer.py # logic for `docx_builder install skill`
│ ├── skill/
│ │ └── SKILL.md # Claude Code skill, shipped as package data
│ └── templates/
│ ├── content.skeleton.yaml # legacy starter (kept for reference)
│ ├── default_styles.yaml # built-in style defaults
│ └── scaffold/ # init templates: CLAUDE.md, .gitignore, content shapes, screenshot_intake
├── docs/
│ └── styles-reference.md # full style schema (no-AI manual)
├── tests/ # pytest suite
└── pyproject.toml
init creates a role-based project where each directory is a role in the pipeline. The base layout is always created:
.
├── CLAUDE.md # context for Claude Code working in this project
├── .gitignore
├── report/ # the build: content.yaml renders into output/
│ ├── content.yaml # the document definition (starter, with "Replace me." body)
│ ├── images/ # rendered figures referenced by content.yaml
│ ├── assets/ # editable figure sources (svg, xlsx, diagrams)
│ ├── output/ # generated .docx (and .pdf on macOS)
│ └── templates/ # cover .docx templates, if any
└── .work/ # transient scratch space (gitignored)
Optional layers are added with --with <names> (comma-separated) or all at once with --full. Dependencies are auto-resolved (e.g. monitor pulls in tools):
| Layer | Directory | Role |
|---|---|---|
brief |
brief/ |
the brief or instructions you are working from |
references |
references/ |
reference and source material you consulted |
src |
src/ |
your work product: code, data, whatever you produce |
meta |
meta/ |
checklist, notes, metadata |
tools |
tools/ |
automation scripts (pulled in by monitor and ai-check) |
monitor |
tools/monitor/ |
race-safe screenshot_intake.py (pulls tools) |
ai-check |
tools/ai-detection/ + report/ai-check/ |
AI-detection drivers, and where their output lands (pulls tools) |
Empty directories get a .gitkeep. The scaffolder is idempotent: re-running only adds what is missing and never overwrites existing files without --force. See Quick start for --name and --cover.
docx_builder --version | -v
docx_builder init [DIR] [--full] [--with LAYERS] [--name NAME] [--cover] [--force]
docx_builder build [DIR] [--output FILE] [--template-dir DIR] [--no-finalize] [--pdf] [--no-update-source] [--open] [--fix-toc-links]
docx_builder validate [DIR] [--strict] [--also FILE...]
docx_builder export pdf [DIR] [--input FILE] [--output FILE] [--no-update-source] [--open] [--fix-toc-links]
docx_builder install skill [--scope local|global]
--version/-vprints the installed version and exits.DIRdefaults to the current working directory.initscaffolds a role-based project (see Quick start).--fulladds every layer plus the embeddedscreenshot_intake.py;--with a,b,cadds named optional layers (deps auto-resolved);--namesets the project name;--coverwrites Shape B;--forceoverwrites existing files instead of skipping them. It is idempotent and never clobbers existing files without--force.--outputoverrides the YAMLcover.outputand the default pattern (forbuild); forexport pdfit overrides the destination.pdfpath.--template-diroverrides the template lookup directory.validateparsescontent.yamland scans only the prose fields (h1/h2/h3,body,bullet,caption,bold_lead,reference), ignoring keys, paths and filenames. It prints each hit with its section, field and position, and exits 1 when any forbidden character is found (0 when clean), so it can gate a pre-commit hook or CI step. Default rules flag em-dash (U+2014), horizontal bar (U+2015) and figure dash (U+2012);--strictadds en-dash (U+2013).--also FILE...scans extra plain-text/markdown files with the same rules.--no-finalize(onbuild) skips the Word pass that populates the TOC, keepingbuildpure and fast (useful for CI or scripting on macOS).--pdf(onbuild) builds then exports to PDF in one shot. Requires macOS + Microsoft Word.export pdfconverts a built.docxto PDF via Microsoft Word (macOS only). Input defaults tobuild's filename resolution;--inputoverrides it. The PDF reports its real page count.--no-update-source(onexport pdf, honoured bybuild --pdf) leaves the source.docxbyte-identical instead of writing back the populated TOC.--openopens the result after the command finishes: with--pdfboth the.pdfand the finalised.docx(in Word), otherwise the.docx. macOS-only; elsewhere it prints a note and skips.--fix-toc-links(onexport pdf, honoured bybuild --pdf) post-processes the PDF to repair ToC hyperlinks that Word's PDF engine shifted. Off by default; needs the optionalpdf-linksextra (uv tool install 'docx_builder[pdf-links]').
After cloning, bootstrap once:
uv sync
pre-commit install # activates dev-quality on every commitThen the standard workflow:
uv run pytest
uv run ruff check docx_builder tests
uv run --with mypy mypy docx_builder
pre-commit run --all-files # run all dev-quality checkers manuallyImportant
The pre-commit install step writes .git/hooks/pre-commit. Without it, the hook config in .pre-commit-config.yaml is inert and git commit bypasses every quality check. Never bypass the hooks with --no-verify.
All features follow Red, Green, Refactor TDD. Write a failing test first, implement the minimum code to pass it, then clean up.