XWPFDocument#setParagraph causes inconsistency between bodyElements and paragraphs #980

@ybg163yx

Description

Calling XWPFDocument#setParagraph(XWPFParagraph paragraph, int pos) may cause
an inconsistency between the internal bodyElements and paragraphs lists.

After calling setParagraph, the element stored at the same position in
bodyElements and paragraphs may no longer refer to the same paragraph
instance.

Impact

This inconsistency breaks XWPFDocument#removeBodyElement(int pos).

removeBodyElement relies on getParagraphPos(int bodyPos) to locate the
corresponding paragraph index. When the paragraph was previously replaced via
setParagraph, getParagraphPos can no longer find the bodyElements entry in
paragraphs and may return -1, causing paragraphs.remove(paraPos) to fail with
an IndexOutOfBoundsException.
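
The failure mode can be reproduced without POI. The following is a minimal sketch, assuming the lookup compares elements by identity; BodyModel, paragraphPos and inconsistent are hypothetical names used only for illustration and are not POI code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the two internal lists of XWPFDocument (not POI code).
class BodyModel {
    final List<Object> bodyElements = new ArrayList<>();
    final List<Object> paragraphs = new ArrayList<>();

    // Plays the role of getParagraphPos(int bodyPos); assumed to search by identity.
    int paragraphPos(int bodyPos) {
        Object needle = bodyElements.get(bodyPos);
        for (int i = 0; i < paragraphs.size(); i++) {
            if (paragraphs.get(i) == needle) {
                return i;
            }
        }
        return -1; // the body element is not present in paragraphs
    }

    // Plays the role of removeBodyElement: the lookup result is used unchecked.
    void removeBodyElement(int bodyPos) {
        int paraPos = paragraphPos(bodyPos);
        paragraphs.remove(paraPos); // throws when paraPos is -1
        bodyElements.remove(bodyPos);
    }

    // Builds a model whose lists have drifted apart, as described for setParagraph.
    static BodyModel inconsistent() {
        BodyModel model = new BodyModel();
        Object original = new Object();
        model.bodyElements.add(original);
        model.paragraphs.add(original);
        model.paragraphs.set(0, new Object()); // only one of the two lists is updated
        return model;
    }

    public static void main(String[] args) {
        BodyModel model = inconsistent();
        System.out.println("paragraphPos(0) = " + model.paragraphPos(0));
        try {
            model.removeBodyElement(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("removeBodyElement failed: " + e);
        }
    }
}
```

In this model the exception comes from List#remove(int) being called with a negative index, which matches the symptom reported above.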

Root Cause Analysis

In setParagraph, two different update mechanisms are used:

  • The paragraphs list is updated via ArrayList#set, directly replacing the
    paragraph reference.
  • The underlying XML (CTDocument) is updated via
    ctDocument.getBody().setPArray(...).

During XML processing, the generated XMLBeans code eventually calls
Xobj.copy_contents_from, which copies the XML contents instead of
reusing the existing CTP / XWPFParagraph instance.

As a result, the paragraph object referenced by paragraphs differs from the
one created and stored in bodyElements, leading to inconsistent internal
state.
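
The difference between replacing a reference and copying contents can be shown with a toy model. This is only a sketch under the assumption that the setter copies contents into the node it already holds; Node, Wrapper and setByCopy are hypothetical stand-ins for CTP, XWPFParagraph and the generated setter, not XMLBeans or POI code.

```java
// Toy model of copy-on-set semantics (not XMLBeans or POI code).
class CopySemanticsDemo {
    // Stand-in for an XML node such as CTP.
    static class Node {
        String text;

        Node(String text) {
            this.text = text;
        }
    }

    // Stand-in for a wrapper such as XWPFParagraph that holds on to one node.
    static class Wrapper {
        final Node node;

        Wrapper(Node node) {
            this.node = node;
        }
    }

    // Copies the contents of source into the node already stored at pos.
    // The stored node keeps its identity; source itself is never stored.
    static void setByCopy(Node[] store, int pos, Node source) {
        store[pos].text = source.text;
    }

    public static void main(String[] args) {
        Node[] xmlStore = { new Node("old") };
        Wrapper heldByBodyElements = new Wrapper(xmlStore[0]);
        Wrapper replacement = new Wrapper(new Node("new"));

        setByCopy(xmlStore, 0, replacement.node);

        // The contents now match the replacement...
        System.out.println(xmlStore[0].text.equals(replacement.node.text)); // true
        // ...but the replacement wrapper's node is not the stored node,
        System.out.println(xmlStore[0] == replacement.node);                // false
        // while the old wrapper still points at the stored node.
        System.out.println(xmlStore[0] == heldByBodyElements.node);         // true
    }
}
```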

Steps to Reproduce

A sample DOCX file is attached.

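import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// The class name is arbitrary.
public class SetParagraphRepro {
    public static void main(String[] args) throws IOException {
        // Open both resources in try-with-resources so the stream is closed too.
        try (FileInputStream fis = new FileInputStream("test_1989242873218412545.docx");
             XWPFDocument document = new XWPFDocument(fis)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            // Replace the paragraph at index 6 with the one at index 5.
            document.setParagraph(paragraphs.get(5), 6);

            // For debugging: inspect internal state after setParagraph
            System.out.println("--");
        }
    }
}

Expected Behavior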

After calling setParagraph, the internal bodyElements and paragraphs
collections should remain consistent, and subsequent calls to
removeBodyElement should work correctly.
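
One way to state the expected invariant is as a small check. This is a sketch only, and ConsistencyCheck and isConsistent are hypothetical names: the typed list must contain exactly the matching elements of the body list, in the same order and as the same instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ConsistencyCheck {
    // Returns true when typed holds exactly the elements of body accepted by
    // isType, in the same order and compared by identity (==).
    static <T> boolean isConsistent(List<? extends T> body,
                                    List<? extends T> typed,
                                    Predicate<? super T> isType) {
        int next = 0;
        for (T element : body) {
            if (!isType.test(element)) {
                continue; // a different kind of body element, e.g. a table
            }
            if (next >= typed.size() || typed.get(next) != element) {
                return false;
            }
            next++;
        }
        return next == typed.size();
    }

    public static void main(String[] args) {
        // StringBuilder stands in for a paragraph, Integer for a table.
        StringBuilder p1 = new StringBuilder("p1");
        StringBuilder p2 = new StringBuilder("p2");
        List<Object> body = new ArrayList<>();
        body.add(p1);
        body.add(Integer.valueOf(1));
        body.add(p2);
        List<Object> paragraphs = new ArrayList<>();
        paragraphs.add(p1);
        paragraphs.add(p2);

        System.out.println(isConsistent(body, paragraphs, e -> e instanceof StringBuilder));

        // Replacing an entry in only one list breaks the invariant.
        paragraphs.set(1, new StringBuilder("p2"));
        System.out.println(isConsistent(body, paragraphs, e -> e instanceof StringBuilder));
    }
}
```

With POI on the classpath, the same helper could presumably be called with document.getBodyElements(), document.getParagraphs() and a predicate such as e -> e instanceof XWPFParagraph, in place of the bare println in the reproduction code.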

Actual Behavior

bodyElements and paragraphs become inconsistent, causing
removeBodyElement to fail when removing a paragraph.

Additional Information

I have identified the cause and implemented a local fix.
A Pull Request will be submitted shortly.
