Fix invalid JSON formatting in LLM prompts causing KeyErrors #257

@Himanshuwagh

Description

Several LLM prompts in page_index.py define a JSON structure for the model to follow but are missing commas between keys (specifically after the "thinking" key).

When the LLM follows these instructions literally, it produces invalid JSON. This causes the extract_json function to fail or return an empty dictionary, leading to KeyError exceptions when the code attempts to access keys like toc_detected, completed, or page_index_given_in_toc.
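The failure chain can be reproduced in isolation. The sketch below is a minimal stand-in for `extract_json` (the real function in the repository may differ); it shows how a single missing comma makes `json.loads` fail, so the fallback empty dictionary is returned and the subsequent key lookup raises `KeyError`:

```python
import json

# Illustrative model output that copies the broken template literally:
# note the missing comma after the "thinking" value.
invalid = '{"thinking": "scanning the page" "toc_detected": "yes"}'
valid = '{"thinking": "scanning the page", "toc_detected": "yes"}'

def extract_json_sketch(text):
    """Minimal stand-in for extract_json: return {} when parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}

result = extract_json_sketch(invalid)
print(result)  # {} -- so result["toc_detected"] would raise KeyError
```

With the comma present, `extract_json_sketch(valid)["toc_detected"]` returns `"yes"` as expected.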

The following lines in pageindex/page_index.py are missing a trailing comma after the "thinking" value placeholder:

Line 34: inside check_title_appearance
Line 62: inside check_title_appearance_in_start
Line 112: inside toc_detector_single_page
Line 132: inside check_if_toc_extraction_is_complete
Line 150: inside check_if_toc_transformation_is_complete
Line 213: inside detect_page_index
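In each of these prompts the fix is a single trailing comma. The fragment below is a hedged reconstruction (the exact template wording and keys in `page_index.py` may differ); it shows that the broken template is itself invalid JSON, while the corrected one parses cleanly:

```python
import json

# Broken template fragment as described in the issue: no comma
# after the "thinking" value (placeholder text is illustrative).
prompt_json_broken = """{
    "thinking": "<your reasoning>"
    "toc_detected": "<yes or no>"
}"""

# Fixed template: trailing comma added after the "thinking" value.
prompt_json_fixed = """{
    "thinking": "<your reasoning>",
    "toc_detected": "<yes or no>"
}"""

try:
    json.loads(prompt_json_broken)
except json.JSONDecodeError as e:
    print("broken template rejected:", e.msg)

print(json.loads(prompt_json_fixed))  # parses into a dict with both keys
```

A model instructed with the fixed template has a well-formed example to imitate, which is the point of the proposed change.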
