Industrial speech recognition. Up to 170x realtime, 13x faster than Whisper-large-v3. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call
Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute
No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.
```bash
pip install funasr
```

```python
from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")
```

Output is structured text with speaker labels, timestamps, and punctuation:

```text
[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.
```
That's it. One model, one call: VAD segmentation, speech recognition, punctuation, and speaker diarization all happen automatically.
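To illustrate how segment-level results could be turned into lines like the sample output, here is a minimal sketch. It assumes each segment is a dict with `start` and `end` in milliseconds, an integer `spk`, and a `text` string. These field names are assumptions for illustration, not a documented FunASR schema, so inspect the actual structure of `result` before relying on them.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as MM:SS.d, rounded to the nearest tenth of a second."""
    tenths = (ms + 50) // 100
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"


def format_segments(segments):
    """Turn segment dicts into '[start → end] Speaker N: text' lines."""
    return [
        f"[{format_timestamp(s['start'])} → {format_timestamp(s['end'])}] "
        f"Speaker {s['spk']}: {s['text']}"
        for s in segments
    ]


# Hypothetical segments mirroring the sample output; the keys are assumed.
segments = [
    {"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."},
    {"start": 4200, "end": 7100, "spk": 1, "text": "Sounds good. I have three points."},
]
lines = format_segments(segments)
print("\n".join(lines))
```

Rounding to tenths before splitting into minutes and seconds avoids edge cases such as 59.96 s being printed as `00:60.0`.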
Deploy as API server:

```bash
funasr-server --device cuda
```

→ OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen
| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | 170x realtime | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | Extra cost |
| Emotion | ✅ Happy/Sad/Angry | ❌ | ❌ |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | ❌ | ✅ |
| vLLM Acceleration | ✅ 2-3x faster | ❌ | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |
Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.
184 long-form audio files (192 min). Full report →
| Model | GPU Speed | CPU Speed | vs Whisper-large-v3 |
|---|---|---|---|
| SenseVoice-Small | 170x realtime | 17x realtime | 13x faster |
| Paraformer-Large | 120x realtime | 15x realtime | 9x faster |
| Whisper-large-v3-turbo | 46x realtime | - | 3.4x faster |
| Fun-ASR-Nano | 17x realtime | 3.6x realtime | 1.3x faster |
| Whisper-large-v3 | 13x realtime | - | baseline |
Key takeaway: SenseVoice-Small and Paraformer-Large run faster on CPU (17x and 15x realtime) than Whisper-large-v3 runs on GPU (13x realtime).
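As a rough worked example of what "Nx realtime" implies, the sketch below assumes the speed figure means audio duration divided by processing time (the usual definition; the full report may measure it differently) and applies it to the 192-minute test set. The GPU figures are copied from the table; the function names are illustrative.

```python
AUDIO_MINUTES = 192  # total duration of the benchmark set, from the text above

# GPU speeds in "x realtime", copied from the benchmark table.
GPU_SPEED = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13,
}


def processing_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Estimated wall-clock time, assuming speed = audio duration / processing time."""
    return audio_minutes * 60 / realtime_factor


def speedup(model: str, baseline: str = "Whisper-large-v3") -> float:
    """Ratio of a model's realtime factor to the baseline's."""
    return GPU_SPEED[model] / GPU_SPEED[baseline]


for name, factor in GPU_SPEED.items():
    secs = processing_seconds(AUDIO_MINUTES, factor)
    print(f"{name}: ~{secs:.0f} s for {AUDIO_MINUTES} min, {speedup(name):.1f}x vs baseline")
```

Under that assumption, SenseVoice-Small would need roughly 68 seconds for the whole set and Whisper-large-v3 roughly 886 seconds. Ratios computed from the rounded table values can differ slightly from the table's last column.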
- 2026/05/24: vLLM Inference Engine, with 2-3x faster LLM decoding for Fun-ASR-Nano. Streaming WebSocket service with VAD + Speaker Diarization. Guide →
- 2026/05/24: Dynamic VAD, an adaptive silence threshold (default on). Short sentences stay intact, long segments get auto-split. Details →
- 2026/05/24: v1.3.3, adding the `funasr-server` CLI, OpenAI-compatible API, and MCP Server for AI agents. `pip install --upgrade funasr`
- 2026/05/20: Added Qwen3-ASR (0.6B/1.7B), with 52 languages and auto detection. usage
- 2026/05/20: Added GLM-ASR-Nano (1.5B), with 17 languages and dialect support. usage
- 2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
- 2025/12/15: Fun-ASR-Nano-2512, covering 31 languages and trained on tens of millions of hours of data.

Older
- 2024/10/10: Whisper-large-v3-turbo support added.
- 2024/07/04: SenseVoice, combining ASR + emotion + audio events.
- 2024/01/30: FunASR 1.0 released.
```bash
pip install funasr
```

From source / Requirements

```bash
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./
```

Requirements: Python ≥ 3.8, PyTorch ≥ 1.13, torchaudio
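One way to confirm that an environment meets these minimums is sketched below. It uses only the standard library. The helper names `meets_minimum` and `check_environment` are hypothetical, and the comparison looks only at the leading numeric components of a version string, ignoring local suffixes such as `+cu118`.

```python
import re
import sys
from importlib import metadata


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare dotted version strings by their leading numeric components."""
    def parse(v):
        return tuple(int(n) for n in re.findall(r"\d+", v.split("+")[0])[:3])
    return parse(version) >= parse(minimum)


def check_environment():
    """Return requirement name -> True/False, or None if the package is missing."""
    report = {"python": sys.version_info >= (3, 8)}
    # Minimums taken from the requirements line; torchaudio has no stated minimum.
    for package, minimum in {"torch": "1.13", "torchaudio": None}.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None
            continue
        report[package] = True if minimum is None else meets_minimum(installed, minimum)
    return report


print(check_environment())
```

A `None` entry means the package is not installed; `False` means it is installed but older than the stated minimum.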
| Model | Task | Languages | Params | Links |
|---|---|---|---|---|
| Fun-ASR-Nano | ASR + timestamps | 31 languages | 800M | ⭐ 🤗 |
| SenseVoiceSmall | ASR + emotion + events | zh/en/ja/ko/yue | 234M | ⭐ 🤗 |
| Paraformer-zh | ASR + timestamps | zh/en | 220M | ⭐ 🤗 |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | ⭐ 🤗 |
| Qwen3-ASR | ASR, 52 languages | multilingual | 1.7B | usage |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | usage |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | usage |
| Whisper-large-v3-turbo | ASR + translation | multilingual | 809M | usage |
| ct-punc | Punctuation | zh/en | 290M | ⭐ 🤗 |
| fsmn-vad | VAD | zh/en | 0.4M | ⭐ 🤗 |
| cam++ | Speaker diarization | - | 7.2M | ⭐ 🤗 |
| emotion2vec+large | Emotion recognition | - | 300M | ⭐ 🤗 |
Full examples with parameter docs: Tutorial →
```python
from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
# "关键词" is a placeholder meaning "keyword"; replace it with your own hotword.
result = model.generate(input="meeting.wav", hotword="关键词 20")

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", hub="hf", trust_remote_code=True,
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

# Streaming real-time
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
```

OpenAI-compatible API (recommended):
```bash
pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda
# → POST /v1/audio/transcriptions at localhost:8000
```

Verify it with a public sample:

```bash
curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
```

Docker streaming service:

```bash
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12
```

OpenAI API example → · Gradio demo → · Client recipes → · JavaScript/TypeScript recipes → · Kubernetes template → · Workflow recipes → · Postman collection → · OpenAPI spec → · Security guide → · Deployment matrix → · Deployment docs → · Agent integration →
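The curl request above can also be issued from Python. The following is a minimal standard-library sketch, not an official client: it assumes the server is running at `localhost:8000` and accepts the same `file`, `model`, and `response_format` form fields as the curl example. The helper names `build_multipart` and `transcribe` are hypothetical, and no particular response schema is assumed beyond the body being JSON.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", model="sensevoice"):
    """POST an audio file to the transcription endpoint and return parsed JSON."""
    body, content_type = build_multipart(
        {"model": model, "response_format": "verbose_json"},
        Path(path).name,
        Path(path).read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a running server and a local sample.wav):
# print(transcribe("sample.wav"))
```

Because the endpoint is described as OpenAI-compatible, an OpenAI SDK client pointed at this base URL may work as well; the sketch avoids that dependency so it runs with a stock Python install.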
Documentation · Issues · Discussions · HuggingFace · Contributing · 20k growth plan
@inproceedings{gao2023funasr,
author={Zhifu Gao and others},
title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
booktitle={INTERSPEECH},
year={2023}
}