Skip to content

guided generation: additionalProperties: false not strictly enforced under adversarial prompts #423

@raullenchai

Description

@raullenchai

Summary

response_format: json_schema with additionalProperties: false does not strictly prevent the model from emitting extra keys when an adversarial prompt tempts it. Surfaced as Gap #1 during the v0.6.60 PyPI onboarding sweep (2026-05-20).

Severity: low. Well-behaved prompts stay within the declared keys (test_11 and 7/10 onboarding scenarios already pass on v0.6.60). The leak only appears when the user prompt explicitly asks for fields the schema doesn't declare. Workaround exists.

Repro (against v0.6.60 PyPI wheel, agent #7 of the onboarding sweep)

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    "required": ["name", "email"],
    "additionalProperties": False,
}
# Adversarial prompt — explicitly asks for extras
prompt = "Describe a user. Include name, email, age, address, phone, occupation, and any extra metadata."
# Result: model emits {"name": ..., "email": ..., "age": 30, "address": ..., "phone": ..., "occupation": ...}
# jsonschema.validate rejects the response.

What we already know

Outlines' regex generation for this schema is correct:

>>> from outlines.types.dsl import JsonSchema, to_regex
>>> to_regex(JsonSchema(schema))
'(\\{[ ]?"name"[ ]?:[ ]?"(...)*"[ ]?,[ ]?"email"[ ]?:[ ]?"(...)*"[ ]?\\})'

The regex only allows {"name":"...","email":"..."} — no room for extras. So if the model emits extras, either:

  1. The regex is generated correctly but isn't actually constraining sampling (most likely — outlines+mlxlm integration layer bug).
  2. The constraint is being applied to only part of the generation (e.g., released after the first }).
  3. The adversarial prompt manages to flip the JSON to a non-object form the regex didn't pin down (less likely — top-level is {).

PR #419 already verified the schema dict reaches outlines.types.dsl.JsonSchema intact (no lossy projection through json_schema_to_pydantic). So the bug is downstream of JsonSchema(schema), somewhere in the outlines→mlxlm sampling path.

Why this is not urgent

  • additionalProperties: false is correctly enforced when the model isn't being adversarially pushed off-schema (every passing onboarding scenario and test_11 confirm this).
  • Users hitting this in production would already have validation logic on their side (because schema enforcement was historically best-effort).
  • The well-known workaround is to combine response_format: json_schema with an explicit "do not emit any other fields" instruction in the system prompt for strict-mode use cases.

Investigation steps (when picked up)

  1. Live repro against the current server to confirm the bug still exists post-PR fix(chat): route stream=true + json_schema through guided generation #422 (the streaming fix landed, this is the non-streaming path).
  2. Trace the constraint into outlines: instrument _run_guided_generation to log the regex/grammar object outlines actually uses, then inspect whether the FSM transitions allow extra keys.
  3. If the bug is in outlines, file upstream. Add a workaround layer in vllm_mlx/api/guided.py (e.g., post-generation jsonschema.validate with a single retry attempt, gated on a flag).
  4. Adversarial test gate: extend regression_suite.test_11 with the agent feat: TTFT cache fix, MiniMax reasoning parser, logprobs API, tool logits #7 prompt so future regressions trip the doctor harness check tier.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions