Legacy /v1/completions + boundary cases: 5 issues surfaced in iter7 onboarding sweep

Found during the recurring babysit onboarding sweep on `qwen3.6-27b-8bit` (PyPI `rapid-mlx 0.6.66`). Filing as one consolidated tracker since they cluster around the legacy `/v1/completions` endpoint + request-boundary validation. Hybrid model with channel-routed chat works fine; legacy/boundary surface is where the gaps live.

## 1. `/v1/completions` `echo:true` silently ignored
```bash
curl -X POST .../v1/completions -d '{"model":"qwen3.6-27b-8bit","prompt":"Once upon a time","max_tokens":15,"echo":true}'
```
Expected: response text begins with `"Once upon a time"`. Actual: starts mid-continuation. `echo` accepted by schema but never honored.
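A minimal sketch of the expected `echo` behavior, assuming the server has the prompt string and the generated continuation available when it assembles `choices[0].text`. The helper name `apply_echo` is hypothetical and is not taken from the rapid-mlx codebase.

```python
def apply_echo(prompt: str, completion_text: str, echo: bool) -> str:
    """Build the text for a legacy completion choice.

    With echo=True the prompt is returned in front of the continuation;
    otherwise only the continuation is returned.
    """
    return prompt + completion_text if echo else completion_text


if __name__ == "__main__":
    # Hypothetical continuation, used only to illustrate the concatenation.
    print(apply_echo("Once upon a time", ", there lived a fox.", echo=True))
```

For streaming responses the same idea would apply to the first chunk only, which is an assumption about how a fix might be structured rather than a statement about the current code.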

## 2. `/v1/completions` `logprobs` schema mismatch with OpenAI
OpenAI spec: `logprobs` is an integer (0-5) or null. rapid-mlx declares `logprobs: bool`. Sending `"logprobs":3` (the canonical OpenAI form) → HTTP 422 `bool_parsing`. Sending `"logprobs":true` is accepted, but no `logprobs` field appears in the response, so the parameter is ignored even in the one form the schema accepts.
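One way the request-side check could look, sketched in plain Python so it runs without dependencies. In the real schema the equivalent would presumably be an integer field with range constraints; `validate_logprobs` is a hypothetical name, and rejecting booleans outright is an assumption (a server that wants to stay compatible with existing `true`/`false` callers might map them instead).

```python
from typing import Optional


def validate_logprobs(value: object) -> Optional[int]:
    """Accept None or an integer in [0, 5]; reject everything else.

    bool is a subclass of int in Python, so it is excluded explicitly.
    """
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("logprobs must be an integer between 0 and 5, or null")
    if not 0 <= value <= 5:
        raise ValueError("logprobs must be between 0 and 5")
    return value


if __name__ == "__main__":
    print(validate_logprobs(3))
```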

## 3. `/v1/completions` streaming per-chunk id rotation
Each SSE `data:` chunk gets a fresh UUID (`cmpl-d62f...`, `cmpl-8f7b...`) instead of sharing one stream id. OpenAI streaming spec requires all chunks of one completion to share `id`. Clients that key on `id` for dedup/aggregation will break.
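The fix amounts to generating the id once per stream rather than once per chunk. A minimal sketch, assuming chunks are emitted from a single generator; the function name `sse_chunks` and the exact payload fields are illustrative, not the rapid-mlx implementation.

```python
import json
import uuid
from typing import Iterable, Iterator


def sse_chunks(pieces: Iterable[str], model: str) -> Iterator[str]:
    """Yield SSE lines for one completion, all sharing a single id."""
    # Created once, outside the loop, so every chunk carries the same id.
    completion_id = f"cmpl-{uuid.uuid4().hex}"
    for piece in pieces:
        payload = {
            "id": completion_id,
            "object": "text_completion",
            "model": model,
            "choices": [{"index": 0, "text": piece, "finish_reason": None}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for line in sse_chunks(["Once", " upon", " a time"], "qwen3.6-27b-8bit"):
        print(line, end="")
```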

## 4. `n=0` accepted as `n=1` silently
Boundary validation gap. `n=0` should be rejected (e.g. a Pydantic `ge=1` constraint), but the route accepts it and returns 1 choice. `n=null` and `n=1` behave correctly; `n>=2` is correctly rejected with HTTP 400.
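The boundary rule described above can be sketched as a small validator. This is plain Python rather than the actual schema; `validate_n` and the `max_n` parameter are hypothetical, with `max_n=1` chosen only because the report says `n=2` and above are rejected.

```python
from typing import Optional


def validate_n(value: Optional[int], max_n: int = 1) -> int:
    """Return the effective number of choices, or raise for invalid input."""
    if value is None:
        return 1  # null falls back to the default of one choice
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("n must be an integer")
    if value < 1:
        raise ValueError("n must be >= 1")  # covers n=0 and negatives
    if value > max_n:
        raise ValueError(f"n must be <= {max_n}")
    return value


if __name__ == "__main__":
    print(validate_n(None), validate_n(1))
```

A route handler would translate the `ValueError` into a 4xx response instead of silently coercing `n=0` to `n=1`.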

## 5. qwen3.6 default thinking-marker leak in `content`
Default request (no `enable_thinking` flag) returns `content` like `"Here's a thinking process:\n\n1. Analyze..."` with no `reasoning_content` populated. Explicit `enable_thinking:true` parses correctly. Either qwen3.6's default differs from qwen3.5's, or our alias `recommended_template_kwargs` is omitting `enable_thinking`. UX hit: with default settings, the user sees raw reasoning bleeding into `content`.
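If the second hypothesis holds, one possible mitigation is to make the flag explicit whenever neither the alias nor the request sets it. The sketch below is an assumption throughout: `resolve_template_kwargs` is a hypothetical helper, and defaulting to `True` is chosen only because the report says the explicit `enable_thinking:true` path parses correctly.

```python
from typing import Any, Dict, Optional


def resolve_template_kwargs(
    recommended: Optional[Dict[str, Any]],
    request_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge template kwargs with an explicit enable_thinking fallback.

    Precedence, lowest to highest: built-in fallback, the alias's
    recommended kwargs, then kwargs supplied on the request.
    """
    merged: Dict[str, Any] = {"enable_thinking": True}  # assumed fallback
    merged.update(recommended or {})
    merged.update(request_kwargs or {})
    return merged


if __name__ == "__main__":
    print(resolve_template_kwargs({}, None))
```

Whether the fallback should be `True` or `False` is a product decision; the point of the sketch is only that the value is never left implicit.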

---

Out of scope for the in-flight fix (#460 is harmony-specific channel-routing); filing as a consolidated tracker for future iterations.

Legacy /v1/completions + boundary cases: 5 issues surfaced in iter7 onboarding sweep #461
