FunASR Models
Choose the right model for your use case — from ultra-fast multilingual recognition to the highest Chinese accuracy.
Quick Comparison
| Model | Speed | Languages | Params | Best For |
|---|---|---|---|---|
| SenseVoice Small | 170x realtime | 50+ | 234M | Fast multilingual, emotion detection |
| Paraformer-zh | 13x realtime | zh, yue | 220M | Best Chinese accuracy |
| Fun-ASR-Nano | vLLM-accelerated | 31 | 800M | Timestamps, LLM-quality output |
| cam++ | realtime | any | 7.2M | Speaker diarization & verification |
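The Speed column can be read as a throughput multiplier: a model running at N x realtime handles N seconds of audio per second of compute. The sketch below works through that arithmetic for a 10-hour archive. It assumes the quoted factors hold for your hardware and audio, which the table does not guarantee, and the function name is illustrative rather than part of FunASR.

```python
def estimated_processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock seconds needed to transcribe audio at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Realtime factors copied from the comparison table; actual speed depends on hardware.
SPEEDS = {"SenseVoice Small": 170.0, "Paraformer-zh": 13.0}

ten_hours = 10 * 3600  # 36,000 seconds of audio
for name, factor in SPEEDS.items():
    seconds = estimated_processing_seconds(ten_hours, factor)
    print(f"{name}: about {seconds / 60:.1f} minutes")
```

Under these assumptions, 10 hours of audio takes roughly 3.5 minutes with SenseVoice Small and roughly 46 minutes with Paraformer-zh.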
ASR Models
SenseVoice Small
Ultra-fast speech recognition with built-in emotion and audio event detection. Supports 50+ languages including Chinese, English, Japanese, Korean, French, German, and more. Non-autoregressive architecture delivers 170x realtime speed on GPU.
170x realtime
50+ languages
Emotion detection
Audio events
CPU-viable
When to use
Best for: multilingual applications, real-time streaming, batch processing large audio collections, applications needing emotion or audio event detection.
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
print(result[0]["text"])
Paraformer-zh Large
Highest-accuracy Chinese speech recognition model. Non-autoregressive with CTC-guided attention, trained on 60,000+ hours of Mandarin speech. Includes built-in punctuation restoration and timestamp prediction.
13x realtime
Chinese + Cantonese
Best accuracy
Timestamps
Punctuation
When to use
Best for: Chinese-only applications where accuracy is the top priority — meeting transcription, subtitle generation, voice input, training data annotation.
from funasr import AutoModel
model = AutoModel(
    model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="iic/punc_ct-transformer_cn-en-common-vocab471067-large",
)
result = model.generate(input="audio.wav")
print(result[0]["text"])
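Subtitle generation, one of the use cases named above, comes down to turning timestamped sentences into a subtitle format. The following is a minimal sketch of that step for SRT output. The segment structure (`start_ms`, `end_ms`, `text`) is a hypothetical shape chosen for illustration. This page does not show how the model returns its timestamps, so adapt the field names to what you actually get back.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT text.

    `segments` is a list of dicts with 'start_ms', 'end_ms' and 'text' keys.
    This shape is an assumption for the example, not FunASR's output format.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = ms_to_srt_time(seg["start_ms"])
        end = ms_to_srt_time(seg["end_ms"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"


# Made-up sample data standing in for recognized sentences.
example = [
    {"start_ms": 0, "end_ms": 1850, "text": "大家好。"},
    {"start_ms": 2100, "end_ms": 4320, "text": "今天我们开始开会。"},
]
print(segments_to_srt(example))
```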
Fun-ASR-Nano
Next-generation LLM-based ASR model. Combines SenseVoice's audio encoder with the Qwen3-0.6B language model for better context understanding. Supports vLLM acceleration for high-throughput batch inference and produces word-level timestamps.
vLLM accelerated
31 languages
Timestamps
LLM-quality
When to use
Best for: applications requiring precise timestamps, high-throughput batch processing, scenarios where LLM-quality context understanding improves output (e.g., proper nouns, code-switching).
# With vLLM acceleration
from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda", backend="vllm")
result = model.generate(input="audio.wav")
print(result[0]["text"])
Supporting Models
cam++ (Speaker Diarization)
Lightweight speaker embedding model for speaker diarization (who spoke when) and speaker verification (is this the same person). Only 7.2M parameters — runs on CPU in realtime.
7.2M params
Diarization
Verification
CPU realtime
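Speaker verification with an embedding model usually means comparing two embeddings and applying a threshold. The sketch below uses cosine similarity, a common choice for speaker embeddings. This page does not state which scoring cam++ pipelines use, and the 0.5 threshold and toy vectors are placeholders, so treat it as an illustration of the idea only.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold: float = 0.5) -> bool:
    """Decide 'same person' when similarity reaches the threshold.

    The default threshold is a placeholder; tune it on your own data.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 3-dimensional vectors standing in for real speaker embeddings.
enrolled = [1.0, 0.0, 1.0]
same_voice = [2.0, 0.0, 2.0]   # same direction as `enrolled`
other_voice = [0.0, 1.0, 0.0]  # orthogonal to `enrolled`

print(same_speaker(enrolled, same_voice))
print(same_speaker(enrolled, other_voice))
```

Diarization builds on the same comparison: embeddings from short speech segments are grouped so that segments from the same speaker end up together.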
FSMN-VAD
Feedforward Sequential Memory Network for Voice Activity Detection. Detects speech segments in audio and separates them from silence, noise, and music. Typically used as a preprocessing step before ASR, for example through the vad_model argument in the Paraformer-zh example.
VAD
Lightweight
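Downstream code typically consumes VAD output as a list of speech intervals. The sketch below assumes each interval is a `[start_ms, end_ms]` pair, which this page does not confirm. It shows two common post-processing steps: summing the speech time and merging segments separated by short pauses. The 300 ms gap is an arbitrary example value, and neither helper is part of FunASR.

```python
def total_speech_ms(segments) -> int:
    """Sum the duration of all speech segments, in milliseconds."""
    return sum(end - start for start, end in segments)


def merge_close_segments(segments, max_gap_ms: int = 300):
    """Merge segments whose silent gap is at most max_gap_ms.

    `segments` is a list of [start_ms, end_ms] pairs (assumed format).
    Returns a new list and leaves the input untouched.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_ms:
            # Short pause: extend the previous segment instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# Made-up detections: a 150 ms pause, then a 2.1 s silence.
segments = [[0, 1200], [1350, 2900], [5000, 6100]]
merged = merge_close_segments(segments)
print(merged)
print(total_speech_ms(segments), "ms of detected speech")
```

With these sample values the first two segments are joined because their gap is 150 ms, while the third stays separate.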
CT-Transformer (Punctuation)
Automatically adds punctuation to ASR output — commas, periods, question marks, etc. Supports Chinese and English. Dramatically improves readability of transcription output.
Punctuation
zh + en
OpenAI-Compatible API
All models are available through funasr-server, which exposes an OpenAI-compatible /v1/audio/transcriptions endpoint:
# Start the server
pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda --port 8000
# Use with any OpenAI-compatible client
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=SenseVoiceSmall
Drop-in replacement: Any application using OpenAI's Whisper API can switch to FunASR by changing the base URL (and, as in the example above, the model name). The request format and response structure are the same, so no other code changes are needed.
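For Python code that does not use an OpenAI SDK, the curl call above can be reproduced with the standard library. This is a sketch under assumptions: the server is running at http://localhost:8000 as in the example, and the JSON response carries a `text` field as OpenAI's transcription response does. The helper names are illustrative and not part of FunASR.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url: str, audio_bytes: bytes,
                                filename: str, model: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for the /audio/transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    # Plain form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    # File field carrying the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + b"\r\n" + closing
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, base_url: str = "http://localhost:8000/v1",
               model: str = "SenseVoiceSmall") -> str:
    """Send an audio file to the server and return the transcript text."""
    with open(path, "rb") as f:
        request = build_transcription_request(base_url, f.read(), "audio.wav", model)
    with urllib.request.urlopen(request) as response:
        # Assumes an OpenAI-style JSON body such as {"text": "..."}.
        return json.load(response)["text"]
```

Calling `transcribe("audio.wav")` requires the server to be running. `build_transcription_request` can be inspected on its own without a network connection.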
Deployment Options
| Method | Command | Best For |
|---|---|---|
| pip | pip install funasr && funasr-server | Development, quick testing |
| Docker | docker run -d --gpus all -p 8000:8000 ... | Production deployment |
| Python API | from funasr import AutoModel | Embedding in applications |
| ONNX | Via Sherpa-ONNX | Mobile, edge, browser |