Skip to content

MLLMScheduler silently ignores frequency_penalty / presence_penalty / repetition_penalty on vision models #512

@raullenchai

Description

@raullenchai

Summary

MLLMScheduler (the scheduler for vision/multimodal models — gemma-4-12b, qwen-vl, etc.) has zero penalty wiring. All three OpenAI-compatible penalty params are silently ignored on requests routed to it.

$ grep -c "make_logits_processor\|presence_penalty\|frequency_penalty\|repetition_penalty" vllm_mlx/mllm_scheduler.py
0

Surfaced by the round-2 codex review of PR #510. Pre-existing — not introduced by the #470 fix.

Repro

Send frequency_penalty=2.0 to any vision model (e.g. gemma-4-12b) via /v1/chat/completions. The penalty has no observable effect on output — same content with or without the param. The text-model scheduler (vllm_mlx/scheduler.py) correctly applies penalties since #355 + #510; vision models silently drop them.

Root cause

vllm_mlx/mllm_scheduler.py has three blockers, all visible in current main:

  1. Hardcoded sampler (line 286-287): a TODO comment from the original author already acknowledges the gap:
    sampler = make_sampler(temp=0.7, top_p=0.9)
    # Default sampler (can be overridden per-request in future)
  2. API signature only accepts temperature + top_p (lines 309-310, 921-922) — the penalty fields aren't surfaced as scheduler-level kwargs.
  3. No make_logits_processors call anywhere in the file — the text-scheduler's whole penalty-wiring block (vllm_mlx/scheduler.py:2783-2814) has no analogue here.

Proposed fix (sketch — 3 layers)

  1. routes/chat.py MLLM branch: forward presence_penalty / frequency_penalty / repetition_penalty into the scheduler call (analogous to text branch).
  2. MLLMScheduler.__init__ / generate API: accept the three penalty fields, store on SamplingParams.
  3. Replace the hardcoded make_sampler(...) with a per-request sampler + logits_processors extended with make_logits_processors(...) matching scheduler.py:2783-2814 (including presence_context_size / frequency_context_size matching the text-scheduler value).

Priority

Low. No user reports — vision-model users with frequency_penalty are a small subset. Filing for visibility so it gets picked up in a future MLLM refactor, or by anyone hitting the silent no-op.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinghelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions