FunASR Models

Choose the right model for your use case — from ultra-fast multilingual recognition to the highest Chinese accuracy.

Quick Comparison

| Model | Speed | Languages | Params | Best For |
| --- | --- | --- | --- | --- |
| SenseVoice Small | 170x realtime | 50+ | 234M | Fast multilingual, emotion detection |
| Paraformer-zh | 13x realtime | zh, yue | 220M | Best Chinese accuracy |
| Fun-ASR-Nano | vLLM-accelerated | 31 | 800M | Timestamps, LLM-quality output |
| cam++ | realtime | any | 7.2M | Speaker diarization & verification |
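To make the Speed column concrete: "Nx realtime" means roughly N seconds of audio are processed per second of compute, so processing time is about the audio duration divided by N. The sketch below uses the two factors quoted in the table; the helper name `estimated_seconds` is only illustrative, and actual throughput depends on your hardware and batch settings.

```python
# Realtime factors as quoted in the comparison table; treat them as
# rough guides, not guarantees for your hardware.
REALTIME_FACTOR = {
    "SenseVoice Small": 170,
    "Paraformer-zh": 13,
}


def estimated_seconds(audio_seconds: float, model: str) -> float:
    """Rough processing time: audio duration divided by the realtime factor."""
    return audio_seconds / REALTIME_FACTOR[model]


if __name__ == "__main__":
    one_hour = 3600.0
    for name in REALTIME_FACTOR:
        print(f"{name}: ~{estimated_seconds(one_hour, name):.0f} s per hour of audio")
```

Under these assumptions, one hour of audio takes about 21 seconds with SenseVoice Small and about 277 seconds with Paraformer-zh.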

ASR Models

SenseVoice Small
234M params · Non-autoregressive · GitHub (8.3K stars) · HuggingFace
Ultra-fast speech recognition with built-in emotion and audio event detection. Supports 50+ languages including Chinese, English, Japanese, Korean, French, German, and more. Non-autoregressive architecture delivers 170x realtime speed on GPU.
170x realtime · 50+ languages · Emotion detection · Audio events · CPU-viable

When to use

Best for: multilingual applications, real-time streaming, batch processing of large audio collections, and applications needing emotion or audio event detection.

from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
print(result[0]["text"])
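Because SenseVoice reports language, emotion, and audio events alongside the transcript, the raw text may carry inline tags before the words. The sketch below assumes tags of the form `<|label|>`; that format, the helper name `split_tags`, and the sample string are all assumptions for illustration, so check what your installed version actually returns.

```python
import re

# Assumed tag format: "<|label|>" tokens embedded in the transcript text.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw_text: str) -> tuple[list[str], str]:
    """Return (tags, plain_text) for a tagged transcript string."""
    tags = TAG_PATTERN.findall(raw_text)
    plain = TAG_PATTERN.sub("", raw_text).strip()
    return tags, plain


if __name__ == "__main__":
    # Hypothetical example string, not real model output.
    sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Thanks for calling."
    tags, text = split_tags(sample)
    print(tags)
    print(text)
```

Keeping the tags separate from the text lets an application route on emotion or event labels while storing a clean transcript.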
Paraformer-zh Large
220M params · Non-autoregressive · HuggingFace
Highest-accuracy Chinese speech recognition model. Non-autoregressive with CTC-guided attention, trained on 60,000+ hours of Mandarin speech. Includes built-in punctuation restoration and timestamp prediction.
13x realtime · Chinese + Cantonese · Best accuracy · Timestamps · Punctuation

When to use

Best for: Chinese-only applications where accuracy is the top priority — meeting transcription, subtitle generation, voice input, training data annotation.

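from funasr import AutoModel

# Main ASR model plus two helper models: FSMN-VAD splits the audio into
# speech segments, and CT-Transformer restores punctuation in the output.
model = AutoModel(
    model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="iic/punc_ct-transformer_cn-en-common-vocab471067-large",
)
result = model.generate(input="audio.wav")
print(result[0]["text"])  # punctuated transcript of the input file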
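For the subtitle-generation use case, predicted timestamps can be turned into an SRT file. The sketch below assumes you already have a list of segments with `start` and `end` in milliseconds and a `text` field; those field names and the helper names are assumptions for illustration, not a documented FunASR output schema.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments with 'start'/'end' (ms) and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = ms_to_srt_time(seg["start"])
        end = ms_to_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical segments for illustration only.
    demo = [
        {"start": 0, "end": 1500, "text": "你好。"},
        {"start": 1500, "end": 3725, "text": "欢迎使用。"},
    ]
    print(to_srt(demo))
```

Integer arithmetic with `divmod` avoids the rounding drift that floating-point seconds can introduce in long recordings.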
Fun-ASR-Nano
800M params · LLM-based (SenseVoice encoder + Qwen3-0.6B) · GitHub (1.2K stars) · HuggingFace
Next-generation LLM-based ASR model. Combines SenseVoice's audio encoder with the Qwen3-0.6B language model for better context understanding. Supports vLLM acceleration for high-throughput batch inference with word-level timestamps.
vLLM accelerated · 31 languages · Timestamps · LLM-quality

When to use

Best for: applications requiring precise timestamps, high-throughput batch processing, and scenarios where LLM-quality context understanding improves output (e.g., proper nouns, code-switching).

# With vLLM acceleration
from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda", backend="vllm")
result = model.generate(input="audio.wav")
print(result[0]["text"])

Supporting Models

cam++ (Speaker Diarization)
7.2M params · HuggingFace
Lightweight speaker embedding model for speaker diarization (who spoke when) and speaker verification (is this the same person). Only 7.2M parameters — runs on CPU in realtime.
7.2M params · Diarization · Verification · CPU realtime
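Speaker verification with an embedding model usually comes down to comparing two embedding vectors. A common approach, sketched here, is cosine similarity against a threshold. How the embeddings are obtained from cam++ is not shown, and the default threshold of 0.5 is a hypothetical placeholder that would need tuning on your own data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: list[float], b: list[float], threshold: float = 0.5) -> bool:
    """Decide 'same person' when similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real speaker embeddings.
    print(same_speaker([1.0, 2.0], [2.0, 4.0]))
    print(same_speaker([1.0, 0.0], [0.0, 1.0]))
```

Diarization builds on the same comparison: embeddings from short windows are clustered so that windows from one speaker end up together.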
FSMN-VAD
Built-in · Voice Activity Detection
Feedforward Sequential Memory Network for Voice Activity Detection. Detects speech segments in audio and separates them from silence, noise, and music. It can serve as a preprocessing step for any of the ASR models, as with the vad_model argument in the Paraformer-zh example.
VAD · Lightweight
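As a preprocessing step, VAD output is typically a list of speech spans that are then cut from the waveform and passed to the recognizer. The sketch below assumes spans arrive as `[start_ms, end_ms]` pairs and that the audio is sampled at 16 kHz (the model names above include "16k"). The `max_gap_ms` parameter and both helper names are hypothetical.

```python
def merge_segments(segments: list[list[int]], max_gap_ms: int = 300) -> list[list[int]]:
    """Merge speech segments separated by at most max_gap_ms of silence."""
    merged: list[list[int]] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_ms:
            # Close enough to the previous segment: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def to_sample_ranges(segments: list[list[int]], sample_rate: int = 16_000) -> list[tuple[int, int]]:
    """Convert millisecond spans to (start, end) sample indices."""
    return [(start * sample_rate // 1000, end * sample_rate // 1000)
            for start, end in segments]


if __name__ == "__main__":
    # Hypothetical VAD output in milliseconds.
    spans = [[0, 1000], [1200, 2000], [5000, 6000]]
    merged = merge_segments(spans)
    print(merged)
    print(to_sample_ranges(merged))
```

Merging short gaps keeps a sentence with brief pauses in one piece, which tends to give the recognizer more context per chunk.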
CT-Transformer (Punctuation)
Built-in · Punctuation Restoration
Automatically adds punctuation to ASR output — commas, periods, question marks, etc. Supports Chinese and English. Dramatically improves readability of transcription output.
Punctuation · zh + en

OpenAI-Compatible API

All models are available through funasr-server, which exposes an OpenAI-compatible /v1/audio/transcriptions endpoint:

# Start the server
pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda --port 8000

# Use with any OpenAI-compatible client
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=SenseVoiceSmall
Drop-in replacement: Any application using OpenAI's Whisper API can switch to FunASR by changing the base URL. No other code changes are needed, because the request format and response structure are the same.
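The curl command above can also be reproduced from Python with only the standard library. This is a minimal sketch, assuming the server is running on localhost:8000 as started above and that the JSON response has a `text` field, as OpenAI's transcription responses do. The helper names `build_multipart` and `transcribe` are illustrative, not part of FunASR.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], filename: str,
                    file_bytes: bytes, boundary: str) -> bytes:
    """Encode text fields plus one 'file' part as multipart/form-data."""
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines)


def transcribe(path: str, model: str = "SenseVoiceSmall",
               base_url: str = "http://localhost:8000/v1") -> dict:
    """POST an audio file to /audio/transcriptions and return the parsed JSON."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as audio:
        body = build_multipart({"model": model}, os.path.basename(path),
                               audio.read(), boundary)
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running funasr-server and a local audio.wav.
    print(transcribe("audio.wav").get("text"))
```

An existing OpenAI client should need nothing beyond pointing its base URL at the same `/v1` prefix.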

Deployment Options

| Method | Command | Best For |
| --- | --- | --- |
| pip | pip install funasr && funasr-server | Development, quick testing |
| Docker | docker run -d --gpus all -p 8000:8000 ... | Production deployment |
| Python API | from funasr import AutoModel | Embedding in applications |
| ONNX | Via Sherpa-ONNX | Mobile, edge, browser |