
feat: support page numbers in Markdown page breaks #639

Open
gyx09212214-prog wants to merge 2 commits into docling-project:main from gyx09212214-prog:codex/page-break-placeholder-pages

Conversation

@gyx09212214-prog

Summary

Adds named page-number placeholders for Markdown page_break_placeholder values.

Today MarkdownParams(page_break_placeholder="<!-- page {page_no} -->") emits {page_no} literally. The internal page-break marker already carries both previous and next page numbers, so this change formats the placeholder when the final Markdown text is assembled.

Supported placeholders:

  • {page_no}: the page after the break, matching the common “page marker before page N” use case
  • {next_page}: same value as {page_no}
  • {prev_page}: the page before the break

Existing literal placeholders such as "<!-- page break -->", "", and None keep their current behavior.
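
The substitution rules above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: the helper name `format_page_break` and its signature are hypothetical, and it assumes plain string replacement so that any other braces in a placeholder are left untouched.

```python
from typing import Optional


def format_page_break(
    placeholder: Optional[str], prev_page: int, next_page: int
) -> Optional[str]:
    """Fill page-number fields in a page-break placeholder.

    Hypothetical helper for illustration; not part of docling-core.
    """
    # None and "" pass through unchanged, matching the existing behavior.
    if not placeholder:
        return placeholder
    fields = {
        "{page_no}": next_page,    # page after the break
        "{next_page}": next_page,  # same value as {page_no}
        "{prev_page}": prev_page,  # page before the break
    }
    result = placeholder
    for name, value in fields.items():
        # Literal placeholders contain none of these names and are not altered.
        result = result.replace(name, str(value))
    return result
```

For a break between pages 2 and 3, `format_page_break("<!-- page {page_no} -->", 2, 3)` returns `"<!-- page 3 -->"`, while `"<!-- page break -->"` comes back unchanged.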

This helps document/RAG pipelines preserve page-level provenance in Markdown exports, which is useful for citations and traceable retrieval chunks. It also addresses the page-number placeholder part of docling-project/docling#2036.
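
As a rough sketch of the provenance use case, a downstream pipeline could split the exported Markdown on the page markers and keep the page number with each chunk. The function `split_by_page`, the marker format `<!-- page N -->`, and the `first_page` default are assumptions made for this example; they are not part of this PR.

```python
import re
from typing import List, Tuple

# Assumes the export used page_break_placeholder="<!-- page {page_no} -->".
PAGE_MARKER = re.compile(r"<!-- page (\d+) -->")


def split_by_page(markdown: str, first_page: int = 1) -> List[Tuple[int, str]]:
    """Return (page_number, text) chunks from Markdown with page markers.

    Hypothetical helper for illustration. Text before the first marker is
    attributed to `first_page`, since a break marker only appears between pages.
    """
    # With one capture group, re.split yields [text, page, text, page, text, ...].
    parts = PAGE_MARKER.split(markdown)
    chunks = [(first_page, parts[0].strip())]
    for i in range(1, len(parts), 2):
        chunks.append((int(parts[i]), parts[i + 1].strip()))
    # Drop empty chunks, e.g. when a page has no text.
    return [(page, text) for page, text in chunks if text]
```

Each chunk can then carry its page number into a retrieval index, so that a citation can point back to the source page.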

Tests

  • python -m pytest test/test_serialization.py -q (48 passed)
  • python -m py_compile docling_core/transforms/serializer/markdown.py test/test_serialization.py
  • git diff --check
  • python -m ruff check docling_core/transforms/serializer/markdown.py test/test_serialization.py

@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

DCO Check Passed

Thanks @gyx09212214-prog, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Jun 12, 2026

Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

…og@users.noreply.github.com>

I, gyx09212214-prog <243787584+gyx09212214-prog@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 8d675d4

Signed-off-by: gyx09212214-prog <243787584+gyx09212214-prog@users.noreply.github.com>