Fix invalid JSON formatting in LLM prompts causing KeyErrors #257

@Himanshuwagh

Description

Several LLM prompts in page_index.py define a JSON structure for the model to follow but are missing commas between keys (specifically after the "thinking" key).

When the LLM follows these instructions literally, it produces invalid JSON. This causes the extract_json function to fail or return an empty dictionary, leading to KeyError exceptions when the code attempts to access keys like toc_detected, completed, or page_index_given_in_toc.
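The failure chain can be reproduced in isolation. The sketch below is a minimal stand-in for `extract_json` (the real function in the repository may differ); it shows how a single missing comma makes `json.loads` fail, so the fallback empty dictionary is returned and the subsequent key lookup raises `KeyError`:

```python
import json

# Illustrative model output that copies the broken template literally:
# note the missing comma after the "thinking" value.
invalid = '{"thinking": "scanning the page" "toc_detected": "yes"}'
valid = '{"thinking": "scanning the page", "toc_detected": "yes"}'

def extract_json_sketch(text):
    """Minimal stand-in for extract_json: return {} when parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}

result = extract_json_sketch(invalid)
print(result)  # {} -- so result["toc_detected"] would raise KeyError
```

With the comma present, `extract_json_sketch(valid)["toc_detected"]` returns `"yes"` as expected.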

The following lines in pageindex/page_index.py are missing a trailing comma after the "thinking" value placeholder:

Line 34: inside check_title_appearance
Line 62: inside check_title_appearance_in_start
Line 112: inside toc_detector_single_page
Line 132: inside check_if_toc_extraction_is_complete
Line 150: inside check_if_toc_transformation_is_complete
Line 213: inside detect_page_index
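In each of these prompts the fix is a single trailing comma. The fragment below is a hedged reconstruction (the exact template wording and keys in `page_index.py` may differ); it shows that the broken template is itself invalid JSON, while the corrected one parses cleanly:

```python
import json

# Broken template fragment as described in the issue: no comma
# after the "thinking" value (placeholder text is illustrative).
prompt_json_broken = """{
    "thinking": "<your reasoning>"
    "toc_detected": "<yes or no>"
}"""

# Fixed template: trailing comma added after the "thinking" value.
prompt_json_fixed = """{
    "thinking": "<your reasoning>",
    "toc_detected": "<yes or no>"
}"""

try:
    json.loads(prompt_json_broken)
except json.JSONDecodeError as e:
    print("broken template rejected:", e.msg)

print(json.loads(prompt_json_fixed))  # parses into a dict with both keys
```

A model instructed with the fixed template has a well-formed example to imitate, which is the point of the proposed change.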
