
MOSS-TTS-Nano


    


MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.


Introduction

Figure: MOSS-TTS-Nano concept

MOSS-TTS-Nano focuses on the part of TTS deployment that matters most in practice: small footprint, low latency, good enough quality for realtime products, and simple local setup. It uses a pure autoregressive Audio Tokenizer + LLM pipeline and keeps the inference workflow friendly for both terminal users and web-demo users.
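The Audio Tokenizer + LLM pipeline can be pictured as a simple loop: the language model autoregressively emits discrete audio tokens, and the tokenizer's decoder turns those tokens back into waveform samples. The toy sketch below illustrates only that control flow; every name, token value, and the trivial stand-in "model" are made up and bear no relation to the real MOSS-TTS-Nano API.

```python
# Toy sketch of an autoregressive Audio-Tokenizer + LM pipeline.
# Everything here is illustrative; the real MOSS-TTS-Nano API differs.

EOS = -1  # sentinel "end of audio" token

def toy_lm(prompt_tokens, max_new_tokens=8):
    """Stand-in LM: deterministically emits token ids, then EOS."""
    generated = []
    state = sum(prompt_tokens)  # fake "context"
    for step in range(max_new_tokens):
        next_token = (state + step) % 1024  # pretend 1024-entry codebook
        generated.append(next_token)
    generated.append(EOS)
    return generated

def toy_detokenize(tokens, frame_size=4):
    """Stand-in tokenizer decoder: map each token id to a waveform frame."""
    frames = []
    for t in tokens:
        if t == EOS:
            break
        frames.extend([t / 1024.0] * frame_size)  # one fake frame per token
    return frames

prompt = [3, 5, 7]          # tokens from the reference ("prompt") audio
tokens = toy_lm(prompt)     # autoregressive generation, stops at EOS
audio = toy_detokenize(tokens)
print(len(tokens), len(audio))
```

Because generation is purely autoregressive, audio frames can be decoded as soon as their tokens are produced, which is what makes streaming output with fast first audio possible.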

Main Features

  • Tiny model size: only 0.1B parameters
  • Native audio format: 48 kHz, 2-channel output
  • Multilingual: supports Chinese, English, and more
  • Pure autoregressive architecture: built on Audio Tokenizer + LLM
  • Streaming inference: low realtime latency and fast first audio
  • CPU friendly: streaming generation can run on a 4-core CPU
  • Long-text capable: supports long input with automatic chunked voice cloning
  • Open-source deployment: direct python infer.py, python app.py, and packaged CLI support
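For the long-text path, input is split into chunks and the voice is cloned chunk by chunk. The sketch below shows one plausible way to do such chunking; the sentence-splitting rule and the `max_chars` budget are assumptions for illustration, not the repository's actual logic.

```python
import re

def chunk_text(text, max_chars=80):
    """Split text at sentence boundaries, packing sentences into chunks
    no longer than max_chars (an oversized sentence becomes its own chunk)."""
    # Split after sentence-ending punctuation (CJK and Latin).
    sentences = [s for s in re.split(r"(?<=[。！？.!?])\s*", text) if s]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks

parts = chunk_text("First sentence. Second one! A third, longer sentence here?",
                   max_chars=30)
print(parts)
```

Each chunk would then be synthesized with the same reference audio, so the cloned voice stays consistent across the whole long input.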

Supported Languages

MOSS-TTS-Nano currently supports the following languages:

| Language | Code | Flag | Language | Code | Flag | Language | Code | Flag |
|----------|------|------|----------|------|------|----------|------|------|
| Chinese | zh | 🇨🇳 | English | en | 🇺🇸 | German | de | 🇩🇪 |
| Spanish | es | 🇪🇸 | French | fr | 🇫🇷 | Japanese | ja | 🇯🇵 |
| Italian | it | 🇮🇹 | Hungarian | hu | 🇭🇺 | Korean | ko | 🇰🇷 |
| Russian | ru | 🇷🇺 | Persian (Farsi) | fa | 🇮🇷 | Arabic | ar | 🇸🇦 |
| Polish | pl | 🇵🇱 | Portuguese | pt | 🇵🇹 | Czech | cs | 🇨🇿 |
| Danish | da | 🇩🇰 | Swedish | sv | 🇸🇪 | Greek | el | 🇬🇷 |
| Turkish | tr | 🇹🇷 | | | | | | |

Quickstart

Environment Setup

We recommend creating a clean Python environment first, then installing the project in editable mode so that the moss-tts-nano command becomes available locally. The examples below intentionally keep arguments minimal and rely on the repository defaults. By default, the code loads OpenMOSS-Team/MOSS-TTS-Nano and OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano.

Using Conda

conda create -n moss-tts-nano python=3.12 -y
conda activate moss-tts-nano

git clone https://github.com/OpenMOSS/MOSS-TTS-Nano.git
cd MOSS-TTS-Nano

pip install -r requirements.txt
pip install -e .

If WeTextProcessing fails to install from requirements.txt, try installing it manually in the same environment:

conda install -c conda-forge pynini=2.1.6.post1 -y
pip install git+https://github.com/WhizZest/WeTextProcessing.git

Voice Clone with infer.py

This repository keeps the direct Python entrypoint for local inference. The example below uses voice clone mode, which is the main recommended workflow for MOSS-TTS-Nano.

python infer.py \
  --prompt-audio-path assets/audio/zh_1.wav \
  --text "欢迎关注模思智能、上海创智学院与复旦大学自然语言处理实验室。"

This writes audio to generated_audio/infer_output.wav by default.

Local Web Demo with app.py

You can launch the local FastAPI demo for browser-based testing:

python app.py

Then open http://127.0.0.1:18083 in your browser.

CLI Command: moss-tts-nano generate

After pip install -e ., you can call the packaged CLI directly:

moss-tts-nano generate \
  --prompt-speech assets/audio/zh_1.wav \
  --text "欢迎关注模思智能、上海创智学院与复旦大学自然语言处理实验室。"

Useful notes:

  • moss-tts-nano generate writes to generated_audio/moss_tts_nano_output.wav by default.
  • --prompt-speech is the friendly alias for the reference audio path used by voice cloning.
  • --text-file is supported for long-form synthesis.

CLI Command: moss-tts-nano serve

You can also launch the web demo through the packaged CLI:

moss-tts-nano serve

This command forwards to app.py, keeps the model loaded in memory, and serves the local browser demo plus HTTP generation endpoints.

MOSS-Audio-Tokenizer-Nano

Introduction

MOSS-Audio-Tokenizer is the unified discrete audio interface for the entire MOSS-TTS family. It is built on the Cat (Causal Audio Tokenizer with Transformer) architecture, a CNN-free audio tokenizer composed entirely of causal Transformer blocks. It serves as the shared audio backbone for MOSS-TTS, MOSS-TTS-Nano, MOSS-TTSD, MOSS-VoiceGenerator, MOSS-SoundEffect, and MOSS-TTS-Realtime, providing a consistent audio representation across the full product family.

To further improve perceptual quality while reducing inference cost, we trained MOSS-Audio-Tokenizer-Nano, a lightweight tokenizer with approximately 20 million parameters designed for high-fidelity audio compression. It supports 48 kHz input and output as well as stereo audio, which helps reduce compression loss and improve listening quality. It can compress 48 kHz stereo audio into a 12.5 Hz token stream and uses RVQ with 16 codebooks, enabling high-fidelity reconstruction across variable bitrates from 0.125 kbps to 4 kbps.
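The RVQ design is what makes the bitrate variable: each of the 16 codebooks quantizes the residual left by the previous one, so decoding with only the first k codebooks trades reconstruction quality for bitrate. The scalar toy below illustrates that coarse-to-fine principle; real RVQ uses learned vector codebooks, and all values here are made up.

```python
def rvq_encode(x, codebooks):
    """Residual quantization: each stage quantizes what the previous left over."""
    codes, residual = [], x
    for cb in codebooks:
        # pick the nearest entry in this stage's codebook
        idx = min(range(len(cb)), key=lambda i: abs(cb[i] - residual))
        codes.append(idx)
        residual -= cb[idx]
    return codes

def rvq_decode(codes, codebooks, k=None):
    """Reconstruct from the first k stages only (variable bitrate)."""
    k = len(codes) if k is None else k
    return sum(codebooks[i][codes[i]] for i in range(k))

# Coarse-to-fine scalar codebooks (made up for illustration).
codebooks = [
    [-1.0, 0.0, 1.0],                 # stage 1: coarse
    [-0.4, -0.2, 0.0, 0.2, 0.4],      # stage 2: medium
    [-0.1, -0.05, 0.0, 0.05, 0.1],    # stage 3: fine
]

x = 0.67
codes = rvq_encode(x, codebooks)
for k in range(1, 4):
    approx = rvq_decode(codes, codebooks, k)
    print(k, round(abs(x - approx), 3))  # error shrinks as more stages are used
```

Dropping trailing codebooks at decode time needs no re-encoding, which is how a single token stream can serve the whole bitrate range.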

To learn more about setup, advanced usage, and evaluation metrics, please visit the MOSS-Audio-Tokenizer repository.

Figure: Architecture of MOSS-Audio-Tokenizer-Nano

Model Weights

| Model | Hugging Face | ModelScope |
|-------|--------------|------------|
| MOSS-Audio-Tokenizer-Nano | Hugging Face | ModelScope |

License

This repository will follow the license specified in the root LICENSE file. If you are reading this before that file is published, please treat the repository as not yet licensed for redistribution.

Citation

If you use the MOSS-TTS work in your research or product, please cite:

@misc{openmoss2026mossttsnano,
  title={MOSS-TTS-Nano},
  author={OpenMOSS Team},
  year={2026},
  howpublished={GitHub repository},
  url={https://github.com/OpenMOSS/MOSS-TTS-Nano}
}
@misc{gong2026mossttstechnicalreport,
  title={MOSS-TTS Technical Report},
  author={Yitian Gong and Botian Jiang and Yiwei Zhao and Yucheng Yuan and Kuangwei Chen and Yaozhou Jiang and Cheng Chang and Dong Hong and Mingshu Chen and Ruixiao Li and Yiyang Zhang and Yang Gao and Hanfu Chen and Ke Chen and Songlin Wang and Xiaogui Yang and Yuqian Zhang and Kexin Huang and ZhengYuan Lin and Kang Yu and Ziqi Chen and Jin Wang and Zhaoye Fei and Qinyuan Cheng and Shimin Li and Xipeng Qiu},
  year={2026},
  eprint={2603.18090},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.18090}
}
@misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
  title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models}, 
  author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
  year={2026},
  eprint={2602.10934},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2602.10934}, 
}
