A single-binary C++ implementation of Qwen3-TTS running on top of ONNX Runtime.
```bash
leaxer-qwen3-tts -m <model_dir> -p "Hello world" -o output.wav

# With language hint
leaxer-qwen3-tts -m onnx_kv_06b -p "你好世界" --lang zh -o chinese.wav

# Sampling controls
leaxer-qwen3-tts -m onnx_kv_06b -p "Hello" --temp 0.7 --top-k 30 --top-p 0.9
```

| Flag | Description | Default |
|---|---|---|
| `-m, --model` | ONNX model directory | required |
| `-p, --prompt` | Text to synthesize | required |
| `-o, --output` | Output WAV path | `output.wav` |
| `--lang` | Language: `auto`, `en`, `zh`, `ja`, `ko` | `auto` |
| `--temp` | Sampling temperature | 0.8 |
| `--top-k` | Top-k sampling | 50 |
| `--top-p` | Top-p (nucleus) sampling | 0.95 |
| `--max-tokens` | Max generation tokens | 2048 |
- CMake 3.14+
- C++17 compiler
- ONNX Runtime (user-provided)
```bash
# Download ONNX Runtime from https://github.com/microsoft/onnxruntime/releases
# Extract to a directory, then:
cmake -B build -DONNXRUNTIME_DIR=/path/to/onnxruntime
cmake --build build -j
./build/leaxer-qwen3-tts --help
```

On macOS, Homebrew works too:

```bash
brew install onnxruntime cmake
cmake -B build -DONNXRUNTIME_DIR=$(brew --prefix onnxruntime)
cmake --build build -j
```

GPU builds are opt-in:

```bash
# CoreML (macOS, requires ONNX Runtime with CoreML support)
cmake -B build -DONNXRUNTIME_DIR=/path/to/onnxruntime -DLEAXER_COREML=ON

# CUDA (NVIDIA)
cmake -B build -DONNXRUNTIME_DIR=/path/to/onnxruntime -DLEAXER_CUDA=ON

# ROCm (AMD)
cmake -B build -DONNXRUNTIME_DIR=/path/to/onnxruntime -DLEAXER_ROCM=ON
```

Note: GPU acceleration requires ONNX Runtime built with the corresponding execution provider. The default Microsoft releases include CoreML support on macOS.
Download ONNX models from zukky/Qwen3-TTS-ONNX-DLL:

```bash
# Clone model repo (or download manually)
git lfs install
git clone https://huggingface.co/zukky/Qwen3-TTS-ONNX-DLL models

# Use the 0.6B model
./build/leaxer-qwen3-tts -m onnx/onnx_kv_06b -p "Hello"
```

The model directory contains:

- `text_project.onnx` — text token embeddings
- `codec_embed.onnx` — codec token embeddings
- `code_predictor_embed.onnx` — sub-codec embeddings
- `talker_prefill.onnx` — transformer prefill
- `talker_decode.onnx` — transformer decode (with KV cache)
- `code_predictor.onnx` — predicts codebooks 1-15
- `tokenizer12hz_decode.onnx` — vocoder (codes → audio)

Tokenizer files are also needed, in `../models/Qwen3-TTS-12Hz-0.6B-Base/`:

- `vocab.json`
- `merges.txt`
```
Text → BPE Tokenizer → Talker (prefill/decode) → Code Predictor → Vocoder → WAV
                            ↓                          ↓
                         KV Cache               Codebooks 1-15
```
The model generates 16 audio codebooks per frame at 12 Hz, and the vocoder then upsamples the codes to 24 kHz audio.
Clone any voice from a 3-second reference audio:

```bash
leaxer-qwen3-tts -m onnx/onnx_kv_06b -p "Hello world" --ref voice_sample.wav -o output.wav
```

The reference audio should be:

- WAV format (8/16/24/32-bit PCM or float)
- at least 3 seconds of clear speech

It is resampled to 24 kHz internally.
| Feature | Requires | Status |
|---|---|---|
| Voice clone (`--ref`) | 0.6B-Base | Done |
| GPU acceleration | ONNX Runtime + CoreML/CUDA | Done |
| Preset speakers (`--speaker`) | 0.6B-CustomVoice | Planned |
| Voice instructions (`--instruct`) | 1.7B-VoiceDesign | Planned |
| Single static binary | Static ONNX Runtime | Planned |
- Qwen3-TTS by Alibaba
- zukky/Qwen3-TTS-ONNX-DLL for the ONNX exports; big thanks to Mr. Daishi Suzuki (zukky)!
Apache 2.0 — see LICENSE