WIP: Fix Various JSON-Schema Generation Bugs#88

Open
lapp0 wants to merge 1 commit into
mainfrom
fix-json

@lapp0 lapp0 commented Aug 31, 2024

Overview

The tendency of language models to repeat themselves, combined with patterns that allow fields of unbounded length, results in broken JSON Schema outputs.

This was previously addressed for unbounded whitespace by making a safe whitespace pattern the default. This PR extends the same safety to the Integer and String patterns.

Behavior

json_schema.to_regex now accepts a kwarg, safe_subset, which defaults to True.

safe_subset=False

  • Whitespace: r"[\n\t ]*"
  • Integer: any integer, with no bound on magnitude
  • String: any string, with no bound on length

safe_subset=True (default)

  • Whitespace: r"[ ]?"
  • Integer: any integer in the range (-1e19, 1e19)
  • String: any string with a length in the range (0, 256)
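
To make the difference between the two modes concrete, here is a minimal sketch using only Python's `re` module. The whitespace patterns are the ones listed above; the integer and string patterns, the constant names, and the `accepts` helper are illustrative assumptions, not the regexes this PR generates. The string pattern is simplified (no escape sequences) and treats 256 as an inclusive upper bound, which is also an assumption.

```python
import re

# Whitespace patterns as listed in this PR description.
UNSAFE_WHITESPACE = r"[\n\t ]*"
SAFE_WHITESPACE = r"[ ]?"

# Hypothetical integer patterns (assumption, not taken from the PR).
# The unsafe form allows any number of digits.
UNSAFE_INTEGER = r"-?(0|[1-9][0-9]*)"
# At most 19 digits keeps the magnitude strictly below 1e19.
SAFE_INTEGER = r"-?(0|[1-9][0-9]{0,18})"

# Hypothetical, simplified string patterns (assumption): a quoted run of
# characters excluding quotes, backslashes and control characters.
UNSAFE_STRING = r'"[^"\\\x00-\x1f]*"'
SAFE_STRING = r'"[^"\\\x00-\x1f]{0,256}"'


def accepts(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    long_digits = "1" + "0" * 19          # 1e19 written out, 20 digits
    long_string = '"' + "a" * 257 + '"'   # 257 characters between the quotes

    print(accepts(UNSAFE_INTEGER, long_digits), accepts(SAFE_INTEGER, long_digits))
    print(accepts(UNSAFE_STRING, long_string), accepts(SAFE_STRING, long_string))
    print(accepts(UNSAFE_WHITESPACE, "\n\n  "), accepts(SAFE_WHITESPACE, "\n\n  "))
```

In each pair the unbounded pattern accepts the oversized input and the bounded one rejects it, which is the property that stops a repeating model from extending a field indefinitely.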

Fixes

Safe Integer

Safe String

Further Work

torchss commented Jun 22, 2025

This looks fantastic - are there pieces in this PR that can be merged quickly?
