Summary
After PR #462 ("forward stop to engine"), the Anthropic /v1/messages endpoint accepts stop_sequences, but the engine does NOT actually truncate output at the requested sequence. Additionally, the response always returns stop_reason: "end_turn" and stop_sequence: null, even when the model spontaneously emits the requested sequence in its output.
Repro
curl -X POST http://127.0.0.1:8501/v1/messages \
-H "Authorization: Bearer SEKRET" -H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"unsloth/Qwen3.6-27B-MLX-8bit","messages":[{"role":"user","content":"List letters A through F, separated by spaces."}],"stop_sequences":["STOP"],"max_tokens":50}'
Expected: response truncated at STOP, stop_reason: "stop_sequence", stop_sequence: "STOP".
Actual: model emits "A B C STOP D E F" (continues PAST the literal STOP it just generated), stop_reason: "end_turn", stop_sequence: null.
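The expected and actual outcomes above can be turned into a small check on the parsed response body. This is a minimal sketch: check_stop_fields is a hypothetical helper, not part of the project, and it assumes the body follows the Anthropic Messages shape (a content list of text blocks plus stop_reason and stop_sequence).

```python
def check_stop_fields(body: dict, stop: str) -> list:
    """Return a list of problems found in a /v1/messages response body.

    Hypothetical helper for the repro; an empty list means the stop
    sequence was both honoured and reported.
    """
    problems = []
    # Concatenate every text block in the response content.
    text = "".join(
        block.get("text", "")
        for block in body.get("content", [])
        if block.get("type") == "text"
    )
    if stop in text:
        problems.append(f"output contains {stop!r}; generation was not truncated")
    if body.get("stop_reason") != "stop_sequence":
        problems.append(
            f"stop_reason is {body.get('stop_reason')!r}, expected 'stop_sequence'"
        )
    if body.get("stop_sequence") != stop:
        problems.append(
            f"stop_sequence is {body.get('stop_sequence')!r}, expected {stop!r}"
        )
    return problems


# Response fields as reported under "Actual" in the repro.
actual = {
    "content": [{"type": "text", "text": "A B C STOP D E F"}],
    "stop_reason": "end_turn",
    "stop_sequence": None,
}
for problem in check_stop_fields(actual, "STOP"):
    print(problem)
```

Run against the body reported under "Actual", the helper flags all three fields; a fixed server should produce an empty list.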
Severity
HIGH — two stacked breakages:
Enforcement: stop_sequences from client are ignored by the engine. Routes/anthropic.py:_resolved_sampling_kwargs forwards stop (fix(anthropic): forward stop_sequences to engine on /v1/messages #462) but the engine's streaming loop / sampler isn't actually checking against the list per-token.
Reporting: the response never returns stop_reason="stop_sequence" / stop_sequence="STOP". A memory note flagged this as a Phase 2 follow-up to fix(anthropic): forward stop_sequences to engine on /v1/messages #462.
Likely scope
Check request.stop against generated text per-token (or per-N-tokens) and terminate generation when a match is found.
Map finish_reason="stop" + matching sequence → stop_reason: "stop_sequence" + stop_sequence: "<matched>".
Same "param declared but not enforced" family as #459/#464/#465/#468 but at the engine layer.
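The two scope items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: StopMatcher, generate_with_stop, and to_anthropic_stop are hypothetical names, the engine is assumed to yield decoded text chunks, and the finish_reason values ("stop", "length") are assumed to follow the OpenAI-style convention.

```python
from typing import Iterable, Optional, Tuple


class StopMatcher:
    """Incrementally scan streamed text for any of several stop sequences."""

    def __init__(self, stop: Iterable[str]):
        # Ignore empty strings, which would match immediately.
        self.stop = [s for s in stop if s]
        # A partial match is at most (longest sequence - 1) characters long,
        # so that many trailing characters are held back on each step.
        self.holdback = max((len(s) for s in self.stop), default=1) - 1
        self.buffer = ""  # text not yet released to the client
        self.matched: Optional[str] = None

    def feed(self, chunk: str) -> str:
        """Add decoded text and return the part that is safe to emit."""
        if self.matched is not None:
            return ""
        self.buffer += chunk
        hits = [(self.buffer.find(s), s) for s in self.stop]
        hits = [(i, s) for i, s in hits if i != -1]
        if hits:
            # The earliest match wins; text from the match onward is dropped.
            index, self.matched = min(hits, key=lambda h: h[0])
            out, self.buffer = self.buffer[:index], ""
            return out
        cut = len(self.buffer) - self.holdback
        if cut <= 0:
            return ""
        out, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return out

    def flush(self) -> str:
        """Release held-back text once generation ends without a match."""
        out, self.buffer = self.buffer, ""
        return out


def generate_with_stop(
    chunks: Iterable[str], stop: Iterable[str]
) -> Tuple[str, str, Optional[str]]:
    """Consume chunks until a stop sequence matches or the stream ends."""
    matcher = StopMatcher(stop)
    parts = []
    for chunk in chunks:
        parts.append(matcher.feed(chunk))
        if matcher.matched is not None:
            # Terminate generation: no further chunks are consumed.
            return "".join(parts), "stop", matcher.matched
    parts.append(matcher.flush())
    # Assumption: an exhausted stream means the model ended on its own.
    return "".join(parts), "stop", None


def to_anthropic_stop(finish_reason: str, matched: Optional[str]) -> dict:
    """Map an engine finish reason to Anthropic-style response fields."""
    if finish_reason == "stop" and matched is not None:
        return {"stop_reason": "stop_sequence", "stop_sequence": matched}
    if finish_reason == "length":  # assumed name for hitting max_tokens
        return {"stop_reason": "max_tokens", "stop_sequence": None}
    return {"stop_reason": "end_turn", "stop_sequence": None}


# The stop sequence is split across two chunks on purpose.
text, finish, matched = generate_with_stop(["A B ", "C ST", "OP D", " E F"], ["STOP"])
print(repr(text), to_anthropic_stop(finish, matched))
```

Matching on a text buffer rather than on individual tokens is a deliberate choice in this sketch: a stop sequence can straddle token boundaries, so a check that only inspects the latest token would miss it.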
Surfaced by iter15 onboarding sweep (hybrid agent).