Benchmark: FunASR vs Whisper — Real-World Performance on Chinese Meeting Audio #2947

LauraGPT (Maintainer) · 2026-05-27T15:35:28Z

TL;DR

Model	Speed (GPU)	CER (Chinese)	Speaker ID	Streaming	CPU-viable
FunASR SenseVoice	170x realtime	4.2%	✅ Built-in	❌	✅ 17x
FunASR Fun-ASR-Nano	340x (vLLM)	8.2%	✅ Built-in	✅ WebSocket	✅ 3.6x
Whisper-large-v3	13x realtime	9.8%	❌ Needs pyannote	❌	❌
Whisper-large-v3-turbo	46x realtime	10.1%	❌	❌	❌

Test Setup

Audio: 184 files, 192 minutes total, Chinese meeting recordings
Hardware: NVIDIA A100 80GB
Metrics: Real-Time Factor (RTFx), Character Error Rate (CER)
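
For readers who want to reproduce the two metrics, here is a minimal sketch of how CER and RTFx are commonly computed. It assumes CER is the character-level Levenshtein distance divided by the reference length, with no text normalization. The post does not say whether punctuation or whitespace was stripped before scoring, so treat that as an open assumption. The function names are illustrative and not part of FunASR.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

For example, `cer("今天开会讨论预算", "今天开会讨论运算")` gives 0.125: one substituted character out of eight. For scale, 170x realtime on the 192-minute test set corresponds to roughly 68 seconds of processing.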

Key Findings

1. Speed: FunASR is 13x faster than Whisper on GPU

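On the A100, SenseVoice runs at 170x realtime against 13x for Whisper-large-v3, a ratio of about 13x (about 3.7x against Whisper-large-v3-turbo at 46x). SenseVoice-Small processes 10 seconds of audio in just 70 ms. On CPU, SenseVoice still runs at 17x realtime, faster than Whisper-large-v3 runs on GPU.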

2. Chinese Accuracy: FunASR wins on dialects and accents

On this test set, SenseVoice reaches 4.2% CER against 9.8% for Whisper-large-v3. FunASR also supports 7 Chinese dialects (Wu, Cantonese, Min, Hakka, Gan, Xiang, Jin) and 26 regional accents, whereas Whisper struggles with non-Mandarin Chinese.

3. All-in-one Pipeline: No assembly required

With Whisper, you need to combine 5 separate projects for a production pipeline:

Whisper (ASR)
pyannote (speaker diarization)
silero-vad (voice activity detection)
deepmultilingualpunctuation (punctuation)
Custom code (emotion detection — not available)

FunASR handles recognition, VAD, punctuation, and speaker diarization in one API call:

from funasr import AutoModel

# Load the ASR, VAD, punctuation, and speaker models behind a single AutoModel
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++")
result = model.generate(input="meeting.wav")
# Returns: text + timestamps + speaker IDs + punctuation
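
To turn that result into a readable meeting transcript, something like the sketch below may help. It assumes that `result[0]["sentence_info"]` is a list of dicts with `text`, `start`, `end` (milliseconds), and `spk` keys. That is how the speaker-aware pipeline output is commonly shaped, but verify the field names against your installed FunASR version. The helpers `format_ms` and `to_transcript` and the sample data are hypothetical.

```python
from typing import Any


def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def to_transcript(sentences: list[dict[str, Any]]) -> list[str]:
    """Merge consecutive sentences from the same speaker into labeled lines."""
    lines: list[str] = []
    last_spk = None
    for s in sentences:
        if lines and s["spk"] == last_spk:
            # Same speaker keeps talking: extend the current line
            lines[-1] += s["text"]
        else:
            # Speaker change: start a new line stamped with the sentence start
            lines.append(f"[{format_ms(s['start'])}] Speaker {s['spk']}: {s['text']}")
            last_spk = s["spk"]
    return lines


# Hypothetical sample shaped like the assumed result[0]["sentence_info"]
sample = [
    {"text": "大家好，", "start": 0, "end": 900, "spk": 0},
    {"text": "我们开始吧。", "start": 950, "end": 2100, "spk": 0},
    {"text": "好的。", "start": 65000, "end": 65800, "spk": 1},
]

for line in to_transcript(sample):
    print(line)
```

With a real run, you would pass `result[0]["sentence_info"]` in place of `sample`.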

4. OpenAI-Compatible API: Zero-code migration

If you already use the OpenAI Whisper API or a compatible client, you can switch to FunASR by pointing that client at the local server, with no other code changes:

pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda
# Now POST to http://localhost:8000/v1/audio/transcriptions
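
If you do not have an OpenAI client at hand, the request can be sketched with the Python standard library alone. The example below assumes the server accepts the same multipart form fields as OpenAI's transcription endpoint (`file` and `model`). The model name `placeholder-model` is hypothetical, so substitute whatever your server expects. The function only builds the request, so you can inspect it before sending.

```python
import urllib.request
import uuid


def build_transcription_request(
    audio_name: str,
    audio_bytes: bytes,
    base_url: str = "http://localhost:8000/v1",
    model: str = "placeholder-model",  # hypothetical name, not confirmed by the post
) -> urllib.request.Request:
    """Build a multipart/form-data POST shaped like OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode(), b"\r\n",
        # Form field: file (the raw audio bytes)
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{audio_name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes, b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Sending it requires a running server, so this part is left commented out:
# import json
# from pathlib import Path
# req = build_transcription_request("meeting.wav", Path("meeting.wav").read_bytes())
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # OpenAI-style responses carry the transcript under "text"
```

An existing OpenAI SDK client should achieve the same by setting its base URL to `http://localhost:8000/v1`.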

5. Self-hosted & Free

MIT license, no API keys needed
All data stays on your machine
No per-minute charges

How to Try

pip install torch torchaudio
pip install funasr

from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda")
result = model.generate(input="your_audio.wav")
print(result[0]["text"])
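
The raw SenseVoice text typically starts with inline tags for language, emotion, and audio event, for example `<|zh|><|NEUTRAL|><|Speech|><|woitn|>`. The sketch below separates those tags from the transcript using a plain regular expression. It assumes that tag format, and `split_tags` is an illustrative helper rather than a FunASR function. FunASR also provides `rich_transcription_postprocess` in `funasr.utils.postprocess_utils` for cleaning this output, which is likely the better choice in practice.

```python
import re

# Matches tags of the form <|...|>, such as <|zh|> or <|NEUTRAL|>
TAG = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw: str) -> tuple[list[str], str]:
    """Return (tags, plain_text) from a tagged transcript string."""
    tags = TAG.findall(raw)
    text = TAG.sub("", raw).strip()
    return tags, text


# Hypothetical raw output in the assumed tag format
tags, text = split_tags("<|zh|><|NEUTRAL|><|Speech|><|woitn|>今天开会")
print(tags)
print(text)
```

Here `tags` holds the tag names in order and `text` holds the transcript with the tags removed.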
