Add Apple Silicon MLX inference backend by ailuntx · Pull Request #158 · k2-fsa/OmniVoice

ailuntx · 2026-05-15T09:48:55Z

This PR adds an experimental Apple Silicon / MLX inference backend for OmniVoice.

Changes:

Adds omnivoice.mlx.OmniVoiceMLX with an MLX implementation of the OmniVoice Qwen3-style inference path.
Adds omnivoice-infer-mlx CLI.
Adds optional mlx dependency group.
Adds conversion, quantization, staging, and validation scripts under scripts/.
Documents local MLX usage in the README.

Validation performed locally on Apple Silicon:

Official PyTorch/MPS baseline generated short and normal samples from the official checkpoint.
MLX backend generated short and normal samples from the same official checkpoint.
Forward-logits alignment vs PyTorch/MPS on a prepared input: logits shape (1, 8, 17, 1025), identical argmax tokens over the target tail, mean absolute difference of about 0.005.
Staged MLX variants tested locally: fp32, bf16, 8bit, 4bit.
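
For reviewers who want to reproduce the forward-logits alignment check, here is a minimal sketch of that kind of comparison. It is not the validation script from this PR. The helper name `compare_logits`, the `tail` parameter, and the synthetic arrays are assumptions for illustration; in practice the two arrays would be logits dumped from the PyTorch/MPS and MLX forward passes on the same prepared input.

```python
import numpy as np


def compare_logits(ref, test, tail=4):
    """Compare two logits arrays shaped (batch, codebooks, seq_len, vocab).

    `tail` is a hypothetical parameter: the number of trailing sequence
    positions whose argmax tokens are compared.
    """
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {test.shape}")

    # Greedy token ids over the last `tail` sequence positions.
    ref_tokens = ref[:, :, -tail:, :].argmax(axis=-1)
    test_tokens = test[:, :, -tail:, :].argmax(axis=-1)

    return {
        "shape": ref.shape,
        "tail_argmax_match": bool(np.array_equal(ref_tokens, test_tokens)),
        "mean_abs_diff": float(np.abs(ref - test).mean()),
    }


# Synthetic stand-in data (assumption): real use would load logits saved
# from each backend instead of generating random arrays.
rng = np.random.default_rng(0)
ref_logits = rng.normal(size=(1, 8, 17, 1025)).astype(np.float32)
noise = rng.uniform(-0.01, 0.01, size=ref_logits.shape).astype(np.float32)
mlx_logits = ref_logits + noise

report = compare_logits(ref_logits, mlx_logits)
self_report = compare_logits(ref_logits, ref_logits)
print(report)
print(self_report)
```

Comparing argmax tokens alongside the mean absolute difference is useful because small numeric drift between backends only matters for greedy decoding when it changes which token wins.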

Related tracking issue: #157

ailuntz added 4 commits May 15, 2026 17:39

Add OmniVoice MLX inference and staging tools

df4d716

Update MLX staging install instructions

bb7c1ad

Use mlx-community repo names without MLX suffix

21de413

Use bfloat16 repo name for MLX staging

93695f5
