guided generation: `additionalProperties: false` not strictly enforced under adversarial prompts

## Summary

`response_format: json_schema` with `additionalProperties: false` does not strictly prevent the model from emitting extra keys when an adversarial prompt tempts it. Surfaced as Gap #1 during the v0.6.60 PyPI onboarding sweep (2026-05-20).

**Severity**: low. Well-behaved prompts stay within the declared keys (test_11 and 7/10 onboarding scenarios already pass on v0.6.60). The leak only appears when the user prompt *explicitly* asks for fields the schema doesn't declare. Workaround exists.

## Repro (against v0.6.60 PyPI wheel, agent #7 of the onboarding sweep)

```python
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    "required": ["name", "email"],
    "additionalProperties": False,
}
# Adversarial prompt — explicitly asks for extras
prompt = "Describe a user. Include name, email, age, address, phone, occupation, and any extra metadata."
# Result: model emits {"name": ..., "email": ..., "age": 30, "address": ..., "phone": ..., "occupation": ...}
# jsonschema.validate rejects the response.
```

## What we already know

Outlines' regex generation for this schema is **correct**:

```python
>>> from outlines.types.dsl import JsonSchema, to_regex
>>> to_regex(JsonSchema(schema))
'(\\{[ ]?"name"[ ]?:[ ]?"(...)*"[ ]?,[ ]?"email"[ ]?:[ ]?"(...)*"[ ]?\\})'
```

The regex only allows `{"name":"...","email":"..."}` — no room for extras. So if the model emits extras, **either**:

1. The regex is generated correctly but isn't actually constraining sampling (most likely — outlines+mlxlm integration layer bug).
2. The constraint is being applied to only part of the generation (e.g., released after the first `}`).
3. The adversarial prompt manages to flip the JSON to a non-object form the regex didn't pin down (less likely — top-level is `{`).

PR #419 already verified the **schema dict reaches `outlines.types.dsl.JsonSchema` intact** (no lossy projection through `json_schema_to_pydantic`). So the bug is downstream of `JsonSchema(schema)`, somewhere in the outlines→mlxlm sampling path.

## Why this is not urgent

- `additionalProperties: false` is correctly enforced when the model isn't being adversarially pushed off-schema (every passing onboarding scenario and `test_11` confirm this).
- Users hitting this in production would already have validation logic on their side (because schema enforcement was historically best-effort).
- The well-known workaround is to combine `response_format: json_schema` with an explicit "do not emit any other fields" instruction in the system prompt for strict-mode use cases.

## Investigation steps (when picked up)

1. **Live repro against the current server** to confirm the bug still exists post-PR #422 (the streaming fix landed, this is the non-streaming path).
2. **Trace the constraint into outlines**: instrument `_run_guided_generation` to log the regex/grammar object outlines actually uses, then inspect whether the FSM transitions allow extra keys.
3. If the bug is in outlines, file upstream. Add a workaround layer in `vllm_mlx/api/guided.py` (e.g., post-generation `jsonschema.validate` with a single retry attempt, gated on a flag).
4. **Adversarial test gate**: extend `regression_suite.test_11` with the agent #7 prompt so future regressions trip the doctor harness `check` tier.

## Related

- Gap #2 (streaming bypass) — closed by PR #422 (merged 2026-05-20).
- Schema passthrough — closed by PR #419 (merged 2026-05-20).
- See [knowledge/guided_generation_gaps_2026-05-20.md](https://github.com/raullenchai/Rapid-MLX/blob/main/...) (local notes; not in repo).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

guided generation: `additionalProperties: false` not strictly enforced under adversarial prompts #423

Summary

Repro (against v0.6.60 PyPI wheel, agent #7 of the onboarding sweep)

What we already know

Why this is not urgent

Investigation steps (when picked up)

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

guided generation: additionalProperties: false not strictly enforced under adversarial prompts #423

Description

Summary

Repro (against v0.6.60 PyPI wheel, agent #7 of the onboarding sweep)

What we already know

Why this is not urgent

Investigation steps (when picked up)

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

guided generation: `additionalProperties: false` not strictly enforced under adversarial prompts #423