
modelscope/FunASR

(Simplified Chinese | English | Japanese | Korean)

FunASR

Industrial-grade speech recognition. 170x realtime on GPU, 13x faster than Whisper-large-v3. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call



Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute


Quick Start


No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

pip install funasr
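from funasr import AutoModel

# ASR (SenseVoiceSmall) + voice activity detection (fsmn-vad) + speaker diarization (cam++)
# Use device="cpu" if no GPU is available
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")  # list with one result dict per input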

Example output, rendered as text with speaker labels, timestamps, and punctuation:

[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.

That's it. One AutoModel, one call: VAD segmentation, speech recognition, punctuation, and speaker diarization all run automatically.
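The sample output above is a formatted rendering; generate() itself returns Python data. The sketch below shows one way to produce those lines. It assumes each diarized sentence is a dict with start and end times in milliseconds, an integer spk index, and text, modelled on the sentence_info entries FunASR attaches when a speaker model is used; check the keys your model version actually returns. format_timestamp and render_transcript are hypothetical helpers, not part of FunASR.

```python
from typing import Iterable, Mapping

# ASSUMPTION: sentence dicts look like {"start": ms, "end": ms, "spk": int, "text": str}.


def format_timestamp(ms: int) -> str:
    """Format milliseconds as mm:ss.s (rounded to tenths of a second)."""
    tenths = round(ms / 100)
    minutes, rem = divmod(tenths, 600)  # 600 tenths per minute
    seconds, tenth = divmod(rem, 10)
    return f"{minutes:02d}:{seconds:02d}.{tenth}"


def render_transcript(sentences: Iterable[Mapping]) -> str:
    """Render one '[start → end] Speaker N: text' line per sentence."""
    lines = []
    for s in sentences:
        start, end = format_timestamp(s["start"]), format_timestamp(s["end"])
        lines.append(f"[{start} → {end}] Speaker {s['spk']}: {s['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."}]
    print(render_transcript(sample))
```

With a real result you would pass something like result[0].get("sentence_info", []), which is again an assumption about the result layout.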

Deploy as an API server: funasr-server --device cuda serves an OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen

Why FunASR?

| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | 170x realtime (SenseVoice-Small, GPU) | 13x realtime (large-v3, GPU) | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | ❌ | ❌ |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | ❌ | ✅ |
| vLLM acceleration | ✅ 2-3x faster | ❌ | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.


Benchmark

Tested on 184 long-form audio files (192 minutes in total). Full report →

| Model | GPU speed | CPU speed | GPU speed vs Whisper-large-v3 |
|---|---|---|---|
| SenseVoice-Small | 170x realtime | 17x realtime | 🚀 13x faster |
| Paraformer-Large | 120x realtime | 15x realtime | 🚀 9x faster |
| Whisper-large-v3-turbo | 46x realtime | ❌ | 3.4x faster |
| Fun-ASR-Nano | 17x realtime | 3.6x realtime | 1.3x faster |
| Whisper-large-v3 | 13x realtime | ❌ | baseline |

Key takeaway: SenseVoice-Small and Paraformer-Large run faster on CPU (17x and 15x realtime) than Whisper-large-v3 runs on GPU (13x realtime).
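"Nx realtime" means N seconds of audio are processed per second of wall-clock time. The small sketch below converts the table's GPU figures into processing time for the 192-minute benchmark set. The helper names are illustrative, not part of FunASR. Ratios computed from these rounded figures can differ slightly from the reported speed-ups: 46 / 13 gives about 3.5x for Whisper-large-v3-turbo versus the reported 3.4x.

```python
AUDIO_MINUTES = 192  # total benchmark audio, from the text above

# GPU realtime factors copied from the benchmark table
GPU_SPEEDS = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13,
}


def processing_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to process audio_minutes at realtime_factor x realtime."""
    return audio_minutes * 60 / realtime_factor


def speedup(model: str, baseline: str = "Whisper-large-v3") -> float:
    """Ratio of realtime factors, i.e. how many times faster model is than baseline."""
    return GPU_SPEEDS[model] / GPU_SPEEDS[baseline]


if __name__ == "__main__":
    for name, rtf in GPU_SPEEDS.items():
        secs = processing_seconds(AUDIO_MINUTES, rtf)
        print(f"{name:24s} {secs:7.1f} s  ({speedup(name):.1f}x vs Whisper-large-v3)")
```

For example, at 170x realtime the 192 minutes (11,520 s) of audio take about 68 seconds, while at 13x they take roughly 15 minutes.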


What's new

  • 2026/05/24: New vLLM inference engine with 2-3x faster LLM decoding for Fun-ASR-Nano, plus a streaming WebSocket service with VAD and speaker diarization. Guide →
  • 2026/05/24: Dynamic VAD with an adaptive silence threshold (on by default). Short sentences stay intact; long segments are split automatically. Details →
  • 2026/05/24: v1.3.3 adds the funasr-server CLI, an OpenAI-compatible API, and an MCP Server for AI agents. Upgrade with pip install --upgrade funasr
  • 2026/05/20: Added Qwen3-ASR (0.6B/1.7B), covering 52 languages with automatic language detection. usage
  • 2026/05/20: Added GLM-ASR-Nano (1.5B), covering 17 languages with dialect support. usage
  • 2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
  • 2025/12/15: Fun-ASR-Nano-2512 released: 31 languages, trained on tens of millions of hours of data.
Older
  • 2024/10/10: Whisper-large-v3-turbo support added.
  • 2024/07/04: SenseVoice (ASR + emotion + audio events).
  • 2024/01/30: FunASR 1.0 released.
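The Dynamic VAD entry says long segments are split automatically, and the Fun-ASR-Nano usage example caps segment length with vad_kwargs={"max_single_segment_time": 30000} (milliseconds). The sketch below only illustrates the capping idea with a fixed limit. cap_segments is a hypothetical helper, not FunASR's adaptive algorithm, which also adjusts the silence threshold.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def cap_segments(segments: List[Segment], max_ms: int) -> List[Segment]:
    """Split any segment longer than max_ms into equal-length pieces of at most max_ms.

    Segments already within the limit are returned unchanged, so short
    sentences stay intact.
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    out: List[Segment] = []
    for start, end in segments:
        length = end - start
        if length <= max_ms:
            out.append((start, end))
            continue
        pieces = -(-length // max_ms)  # ceiling division: fewest pieces that fit the limit
        for i in range(pieces):
            out.append((start + length * i // pieces, start + length * (i + 1) // pieces))
    return out


if __name__ == "__main__":
    print(cap_segments([(0, 2000), (5000, 75000)], 30000))
```

Splitting into equal pieces, rather than cutting at max_ms and leaving a short remainder, avoids producing tiny trailing segments.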

Installation

pip install funasr
From source / Requirements
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./

Requirements: Python ≥ 3.8, PyTorch ≥ 1.13, torchaudio
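You can check an environment against these requirements before installing with a minimal sketch like the one below. check_requirements and version_at_least are hypothetical helpers, not part of FunASR. The version comparison only looks at the leading numeric components, so pre-release suffixes are ignored.

```python
import importlib
import importlib.util
import re
import sys
from typing import List, Tuple


def version_at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare the leading numeric parts of a version string such as '2.1.0+cu118'."""
    nums = tuple(int(x) for x in re.findall(r"\d+", version.split("+")[0])[: len(minimum)])
    return nums >= minimum


def check_requirements() -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python {sys.version.split()[0]} < 3.8")
    for pkg in ("torch", "torchaudio"):
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"{pkg} is not installed")
    if importlib.util.find_spec("torch") is not None:
        torch = importlib.import_module("torch")
        if not version_at_least(str(torch.__version__), (1, 13)):
            problems.append(f"PyTorch {torch.__version__} < 1.13")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "All requirements met")
```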


Model Zoo

| Model | Task | Languages | Params | Links |
|---|---|---|---|---|
| Fun-ASR-Nano | ASR + timestamps | 31 languages | 800M | ⭐ 🤗 |
| SenseVoiceSmall | ASR + emotion + events | zh/en/ja/ko/yue | 234M | ⭐ 🤗 |
| Paraformer-zh | ASR + timestamps | zh/en | 220M | ⭐ 🤗 |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | ⭐ 🤗 |
| Qwen3-ASR | ASR, 52 languages | multilingual | 0.6B / 1.7B | usage |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | usage |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | usage |
| Whisper-large-v3-turbo | ASR (not trained for translation) | multilingual | 809M | usage |
| ct-punc | Punctuation | zh/en | 290M | ⭐ 🤗 |
| fsmn-vad | VAD | zh/en | 0.4M | ⭐ 🤗 |
| cam++ | Speaker diarization | n/a | 7.2M | ⭐ 🤗 |
| emotion2vec+large | Emotion recognition | n/a | 300M | ⭐ 🤗 |

Usage

Full examples with parameter docs: Tutorial →

from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav", hotword="关键词 20")  # "关键词" means "keyword"; replace with your own hotwords

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", hub="hf", trust_remote_code=True,
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

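# Streaming real-time: call generate() once per audio chunk, reusing the same cache dict
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
cache = {}  # streaming state carried between calls; create a new one for each stream
# chunk_size=[0, 10, 5]: 600 ms chunks (10 x 60 ms) with 300 ms lookahead; pass is_final=True on the last chunk
result = model.generate(input="chunk.wav", cache=cache, is_final=False, chunk_size=[0, 10, 5])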

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
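The streaming example above processes one chunk per call. The sketch below shows how a whole 16 kHz mono signal could be cut into those chunks, assuming each chunk_size unit is 60 ms (960 samples at 16 kHz), so [0, 10, 5] gives 600 ms strides. iter_chunks and SAMPLES_PER_UNIT are hypothetical names; the commented FunASR call reuses only the parameters from the example above.

```python
from typing import Iterator, List, Sequence, Tuple

SAMPLES_PER_UNIT = 960  # ASSUMPTION: 60 ms per chunk_size unit at a 16 kHz sample rate


def iter_chunks(samples: Sequence[float], chunk_size: List[int]) -> Iterator[Tuple[Sequence[float], bool]]:
    """Yield (chunk, is_final) pairs; the last chunk may be shorter than the stride."""
    stride = chunk_size[1] * SAMPLES_PER_UNIT  # e.g. 10 * 960 = 9600 samples = 600 ms
    total = max(1, -(-len(samples) // stride))  # ceiling division, at least one chunk
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


# Usage with FunASR (not executed here); one cache dict per stream:
# cache = {}
# for chunk, is_final in iter_chunks(speech, [0, 10, 5]):
#     res = model.generate(input=chunk, cache=cache, is_final=is_final, chunk_size=[0, 10, 5])
```

Flagging the last chunk matters because a streaming decoder typically holds back output for the lookahead window until it knows no more audio is coming.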

Deploy

# OpenAI-compatible API (recommended)
pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda
# → POST /v1/audio/transcriptions at localhost:8000

Verify it with a public sample:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
# Docker streaming service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12
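The curl call above can also be made from Python without extra dependencies. The sketch below uses only the standard library and sends the same form fields as the curl example (file, model, response_format). The URL, the sensevoice model alias, and the helper names build_multipart and transcribe are assumptions for illustration, and the shape of the JSON response is not specified here.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n')
    parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:8000/v1/audio/transcriptions",
               model: str = "sensevoice") -> dict:
    """POST an audio file to the transcription endpoint and return the parsed JSON reply."""
    with open(path, "rb") as f:
        data = f.read()
    body, content_type = build_multipart(
        {"model": model, "response_format": "verbose_json"}, "file", os.path.basename(path), data)
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(transcribe("sample.wav"))
```

Because the endpoint path follows the OpenAI convention, an existing OpenAI-compatible client pointed at http://localhost:8000/v1 should also work, though that is an assumption worth verifying against your server.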

OpenAI API example → · Gradio demo → · Client recipes → · JavaScript/TypeScript recipes → · Kubernetes template → · Workflow recipes → · Postman collection → · OpenAPI spec → · Security guide → · Deployment matrix → · Deployment docs → · Agent integration →


Community

📖 Documentation 🐛 Issues
💬 Discussions 🤗 HuggingFace
🤝 Contributing 📈 20k growth plan


License

MIT License

Citations

@inproceedings{gao2023funasr,
  author={Zhifu Gao and others},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle={INTERSPEECH},
  year={2023}
}