
Stage 2 checker parse path also needs extract_json_object (companion to #41) #48

@RichardHightower

Description


Symptom

In a real eval run after the PR #44 fix (issue #41), `doer_content_retries` dropped to 0% on Stage 2 and 88% on Stage 3, but `checker_parse_retries` was still 43 on Stage 3 (run #4) and 3 on Stage 3 (run #5). Each of those retries is the LLM checker emitting JSON that fails `parse_verdict`.

Root cause — asymmetric fix

PR #44 added `extract_json_object` and applied it to the doer path (`doer_schema_loop` in loop.py). The checker path (`parse_verdict` in verdict.py) still uses only `_strip_code_fence`, which is the strict whole-string-fence helper that doesn't catch prose preamble or trailing commentary.

`verdict.py:94`:

```python
data = json.loads(_strip_code_fence(raw))
```

Same LLM, same output deviations, different parser tolerance. The checker has the same need for tolerant extraction.
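The gap can be reproduced with a small stand-in. The sketch below is an assumption-heavy illustration, not the project's code: `strip_whole_fence` is a hypothetical approximation of what a strict whole-string-fence helper like `_strip_code_fence` does, and the `{"verdict": "pass"}` payload shape is invented for the example.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the sample stays readable

# Hypothetical stand-in for a strict helper: it only unwraps the input
# when the ENTIRE string is one fenced block. Not the real _strip_code_fence.
_WHOLE_FENCE = re.compile(
    r"\A\s*" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"\s*\Z",
    re.DOTALL,
)


def strip_whole_fence(raw: str) -> str:
    match = _WHOLE_FENCE.match(raw)
    return match.group(1) if match else raw


def strict_parse_ok(raw: str) -> bool:
    """True if the strict path (fence strip, then json.loads) succeeds."""
    try:
        json.loads(strip_whole_fence(raw))
        return True
    except json.JSONDecodeError:
        return False


payload = '{"verdict": "pass"}'  # assumed payload shape, for illustration only
clean = payload
fenced = f"{FENCE}json\n{payload}\n{FENCE}"
preamble = f"Here is my verdict:\n{FENCE}json\n{payload}\n{FENCE}"
trailing = f"{payload}\nLet me know if you need more detail."

results = {
    "clean": strict_parse_ok(clean),
    "fenced": strict_parse_ok(fenced),
    "preamble": strict_parse_ok(preamble),
    "trailing": strict_parse_ok(trailing),
}
```

Under these assumptions, only the clean and fully fenced inputs parse; a prose preamble or trailing commentary defeats the whole-string match and `json.loads` raises.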

Proposed fix

```python
# verdict.py: parse_verdict
data = json.loads(extract_json_object(raw))
```

Symmetric extension of #41's fix. The new helper is non-destructive — clean checker output passes through, fenced/preambled output is rescued.
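For readers without PR #44 at hand, the following is a minimal sketch of what a tolerant, non-destructive extractor could look like. It is not the actual `extract_json_object` from PR #44, whose implementation is not shown in this issue; the name `extract_json_object_sketch` and the fallback-to-input behavior are assumptions made for illustration.

```python
import json


def extract_json_object_sketch(raw: str) -> str:
    """Return the first substring of `raw` that decodes as a JSON object.

    Hypothetical illustration only. If no object is found, the input is
    returned unchanged so the caller's json.loads fails as it did before.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # reports where it ended, ignoring whatever follows.
            value, end = decoder.raw_decode(raw, start)
            if isinstance(value, dict):
                return raw[start:end]
        except json.JSONDecodeError:
            pass  # this brace did not open a valid object; try the next one
        start = raw.find("{", start + 1)
    return raw


# Assumed payload shape, for illustration only.
noisy = 'Here is my verdict:\n{"verdict": "pass"}\nHope that helps.'
data = json.loads(extract_json_object_sketch(noisy))
```

Clean output passes through byte-for-byte because the first `{` already opens a complete object, which is the sense in which such a helper is non-destructive.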

Tests

Add to `tests/unit/test_verdict.py`: `parse_verdict` on fenced JSON, JSON with a prose preamble, JSON with trailing commentary, etc., should produce a real (non-synthetic-fail) verdict.
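One possible shape for these cases is a table-driven test. The sketch below is self-contained and therefore uses `parse_verdict_stub`, a hypothetical placeholder; in the real test file it would be replaced by the project's `parse_verdict`. The verdict payload and the synthetic-fail representation are also assumptions, since neither is specified in this issue.

```python
import json
import unittest

FENCE = "`" * 3
PAYLOAD = '{"verdict": "pass"}'  # assumed verdict shape

# Output deviations named in this issue, keyed by case name.
DEVIANT_OUTPUTS = {
    "clean": PAYLOAD,
    "fenced": f"{FENCE}json\n{PAYLOAD}\n{FENCE}",
    "prose_preamble": f"Here is my verdict:\n{PAYLOAD}",
    "trailing_commentary": f"{PAYLOAD}\nHope that helps.",
    "preamble_and_fence": f"Sure.\n{FENCE}json\n{PAYLOAD}\n{FENCE}",
}


def parse_verdict_stub(raw: str) -> dict:
    """Placeholder parser; swap in the real parse_verdict in the test file."""
    start, end = raw.find("{"), raw.rfind("}")
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        # Assumed representation of a synthetic fail verdict.
        return {"verdict": "fail", "synthetic": True}


class TestParseVerdictTolerance(unittest.TestCase):
    def test_deviant_outputs_yield_real_verdict(self):
        for name, raw in DEVIANT_OUTPUTS.items():
            with self.subTest(case=name):
                self.assertEqual(parse_verdict_stub(raw), {"verdict": "pass"})

    def test_garbage_still_fails(self):
        result = parse_verdict_stub("no JSON at all")
        self.assertTrue(result.get("synthetic"))
```

Keeping a negative case alongside the rescued ones guards against the tolerant path accepting output that contains no verdict at all.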

Closing criteria

Labels: bug, enhancement
