
Add Apple Silicon MLX inference backend #158

Open
ailuntx wants to merge 4 commits into k2-fsa:master from ailuntx:add-mlx-backend


ailuntx commented May 15, 2026

Contributor

This PR adds an experimental Apple Silicon / MLX inference backend for OmniVoice.

Changes:

  • Adds omnivoice.mlx.OmniVoiceMLX with an MLX implementation of the OmniVoice Qwen3-style inference path.
  • Adds omnivoice-infer-mlx CLI.
  • Adds optional mlx dependency group.
  • Adds conversion, quantization, staging, and validation scripts under scripts/.
  • Documents local MLX usage in the README.

Validation performed locally on Apple Silicon:

  • Official PyTorch/MPS baseline generated short and normal samples from the official checkpoint.
  • MLX backend generated short and normal samples from the same official checkpoint.
  • Forward-logits alignment against PyTorch/MPS on a prepared input: logits shape (1, 8, 17, 1025), identical argmax tokens over the target tail, mean absolute difference about 0.005.
  • Staged MLX variants tested locally: fp32, bf16, 8bit, 4bit.
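
For reviewers who want to repeat the forward-logits check, the comparison could look roughly like the sketch below. It is not the validation script from this PR. It assumes the logits from both backends have already been exported as NumPy arrays, that the last axis is the vocabulary axis, and that the second-to-last axis indexes sequence positions. The `compare_logits` helper, its `tail` parameter, and the synthetic arrays are hypothetical and only for illustration.

```python
import numpy as np


def compare_logits(ref, test, tail=4):
    """Compare two logit arrays (hypothetical helper, not part of the PR).

    Assumes the last axis is the vocabulary axis and the second-to-last
    axis indexes sequence positions.
    """
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {test.shape}")

    # Greedy token ids over the last `tail` sequence positions.
    ref_tokens = ref[..., -tail:, :].argmax(axis=-1)
    test_tokens = test[..., -tail:, :].argmax(axis=-1)

    return {
        "shape": ref.shape,
        "tail_argmax_match": bool(np.array_equal(ref_tokens, test_tokens)),
        "mean_abs_diff": float(np.mean(np.abs(ref - test))),
    }


# Synthetic stand-in data with the shape reported in this PR. Real logits
# would come from the PyTorch/MPS and MLX forward passes on the same input.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1, 8, 17, 1025)).astype(np.float32)
test = ref + rng.normal(scale=0.005, size=ref.shape).astype(np.float32)

report = compare_logits(ref, test)
print(report)
```

Checking the argmax over the tail separately from the mean absolute difference is deliberate: small numeric drift between backends is expected, while a changed argmax would mean the two backends pick different tokens.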

Related tracking issue: #157

Community runtime/staging mirror: https://github.com/ailuntx/OmniVoice-MLX

