MLLMScheduler silently ignores frequency_penalty / presence_penalty / repetition_penalty on vision models

## Summary

`MLLMScheduler` (the scheduler for vision/multimodal models — gemma-4-12b, qwen-vl, etc.) has **zero penalty wiring**. All three OpenAI-compatible penalty params are silently ignored on requests routed to it.

```bash
$ grep -c "make_logits_processor\|presence_penalty\|frequency_penalty\|repetition_penalty" vllm_mlx/mllm_scheduler.py
0
```

Surfaced by the round-2 codex review of PR #510. Pre-existing — not introduced by the #470 fix.

## Repro

Send `frequency_penalty=2.0` to any vision model (e.g. `gemma-4-12b`) via `/v1/chat/completions`. The penalty has no observable effect on output — same content with or without the param. The text-model scheduler (`vllm_mlx/scheduler.py`) correctly applies penalties since #355 + #510; vision models silently drop them.

## Root cause

`vllm_mlx/mllm_scheduler.py` has three blockers, all visible in current `main`:

1. **Hardcoded sampler** (line 286-287): a TODO comment from the original author already acknowledges the gap:
   ```python
   sampler = make_sampler(temp=0.7, top_p=0.9)
   # Default sampler (can be overridden per-request in future)
   ```
2. **API signature only accepts `temperature` + `top_p`** (lines 309-310, 921-922) — the penalty fields aren't surfaced as scheduler-level kwargs.
3. **No `make_logits_processors` call** anywhere in the file — the text-scheduler's whole penalty-wiring block (`vllm_mlx/scheduler.py:2783-2814`) has no analogue here.

## Proposed fix (sketch — 3 layers)

1. `routes/chat.py` MLLM branch: forward `presence_penalty / frequency_penalty / repetition_penalty` into the scheduler call (analogous to text branch).
2. `MLLMScheduler.__init__` / generate API: accept the three penalty fields, store on `SamplingParams`.
3. Replace the hardcoded `make_sampler(...)` with a per-request sampler + `logits_processors` extended with `make_logits_processors(...)` matching `scheduler.py:2783-2814` (including `presence_context_size` / `frequency_context_size` matching the text-scheduler value).

## Priority

Low. No user reports — vision-model users with `frequency_penalty` are a small subset. Filing for visibility so it gets picked up in a future MLLM refactor, or by anyone hitting the silent no-op.

## Related

- #470 (text-model version of this bug, fixed in v0.6.74 via #510)
- #355 (extended sampling passthrough on text scheduler)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MLLMScheduler silently ignores frequency_penalty / presence_penalty / repetition_penalty on vision models #512

Summary

Repro

Root cause

Proposed fix (sketch — 3 layers)

Priority

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

MLLMScheduler silently ignores frequency_penalty / presence_penalty / repetition_penalty on vision models #512

Description

Summary

Repro

Root cause

Proposed fix (sketch — 3 layers)

Priority

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions