Run FunASR on CPU — the llama.cpp / GGUF Runtime

Single binary · No Python · No GPU · Quantized weights · whisper.cpp-style on-device ASR, strong on Chinese

FunASR on llama.cpp is to FunASR what whisper.cpp is to Whisper: it runs SenseVoice / Paraformer / Fun-ASR-Nano on the ggml stack, so the models work where there is no GPU and no Python (laptops, edge boxes, embedded C/C++ apps), complementing the PyTorch / vLLM paths for GPU serving. FSMN-VAD is built into the binaries.

Download prebuilt binaries (download & run)

Linux (x64/arm64), macOS (arm64), Windows (x64) — static, self-contained, zero dependencies. Available in all three repos' Releases:

⬇ modelscope/FunASR: all three models
⬇ FunAudioLLM/SenseVoice: multilingual + emotion/events
⬇ FunAudioLLM/Fun-ASR: Fun-ASR-Nano (LLM-ASR)

Quick start

# 1) Unpack the binaries, fetch a model (downloads GGUF + VAD)
bash download-funasr-model.sh sensevoice ./gguf

# 2) Get text directly (in-binary detok, no Python)
llama-funasr-sensevoice -m ./gguf/SenseVoiceSmall-f16.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
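
To transcribe a whole folder rather than one file, the command above can be wrapped in a small loop. This is a minimal sketch, not part of FunASR: transcribe_dir and the FUNASR_* variables are hypothetical names, and it assumes the binary writes the transcript to stdout (the docs only say it prints text). It uses only the -m, --vad and -a flags shown above.

```shell
#!/usr/bin/env bash
# Sketch: batch-transcribe every .wav in a directory with the SenseVoice binary.
# ASSUMPTION: the transcript is written to stdout, so it can be redirected to a file.

# Defaults match the quick-start paths; override via the environment if needed.
FUNASR_BIN="${FUNASR_BIN:-llama-funasr-sensevoice}"
FUNASR_MODEL="${FUNASR_MODEL:-./gguf/SenseVoiceSmall-f16.gguf}"
FUNASR_VAD="${FUNASR_VAD:-./gguf/fsmn-vad.gguf}"

# transcribe_dir IN_DIR OUT_DIR  (hypothetical helper, not shipped with FunASR)
# Writes OUT_DIR/<name>.txt for each IN_DIR/<name>.wav.
transcribe_dir() {
  local in_dir="$1" out_dir="$2" wav name
  mkdir -p "$out_dir" || return 1
  for wav in "$in_dir"/*.wav; do
    [ -e "$wav" ] || continue   # no .wav files: the glob stays literal, skip it
    name="$(basename "$wav" .wav)"
    # Same flags as the single-file command: -m model, --vad VAD model, -a audio
    "$FUNASR_BIN" -m "$FUNASR_MODEL" --vad "$FUNASR_VAD" -a "$wav" \
      > "$out_dir/$name.txt" || echo "failed: $wav" >&2
  done
}
```

Example call: transcribe_dir ./recordings ./transcripts. A file that fails is reported on stderr and the loop moves on to the next one.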

Other models: run download-funasr-model.sh paraformer, then transcribe with llama-funasr-paraformer; run download-funasr-model.sh nano, then transcribe with llama-funasr-cli (Fun-ASR-Nano, 31 languages).
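
In a script that handles more than one model, the pairing between the download key and the binary can be kept in one place. The function below is a hypothetical helper (funasr_binary_for is not part of FunASR); it encodes only the three pairings named in this section.

```shell
#!/usr/bin/env bash
# funasr_binary_for KEY  (hypothetical helper, not shipped with FunASR)
# Prints the binary that goes with a download-funasr-model.sh model key.
funasr_binary_for() {
  case "$1" in
    sensevoice) echo "llama-funasr-sensevoice" ;;
    paraformer) echo "llama-funasr-paraformer" ;;
    nano)       echo "llama-funasr-cli" ;;        # Fun-ASR-Nano
    *)          echo "unknown model key: $1" >&2; return 1 ;;
  esac
}
```

For example, bin="$(funasr_binary_for paraformer)" sets bin to llama-funasr-paraformer; an unrecognized key returns a non-zero status so the caller can stop early.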

Accuracy: far ahead of whisper.cpp on Chinese

All models were scored on the same 184-clip Chinese test set. The metric is character error rate (CER, micro-averaged, lower is better):

Model	CER ↓	Notes
FunASR SenseVoice	8.01%	multilingual + emotion/events
FunASR Paraformer	9.85%	non-autoregressive, industrial Chinese
FunASR Fun-ASR-Nano	8.30%	LLM-ASR, 31 languages
whisper.cpp small	22.12%
whisper.cpp large-v3-turbo	23.15%
whisper.cpp base	31.33%

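On this test set, SenseVoice (8.01%) and Fun-ASR-Nano (8.30%) come in at a bit over a third of the best whisper.cpp CER (small, 22.12%), and Paraformer (9.85%) at under half. Full methodology is in each repo's runtime/llama.cpp/BENCHMARKS.md.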
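
For readers new to the metric: a micro-averaged CER pools character edit errors and reference characters over all clips before dividing, so long clips weigh more than short ones. The sketch below shows only that pooling step. It is not the benchmark's scoring script; micro_cer is a hypothetical helper, the input format (one "errors reference_chars" pair per clip) is an assumption, and the numbers are toy values rather than benchmark data.

```shell
#!/usr/bin/env bash
# micro_cer: read one "edit_errors reference_chars" pair per clip on stdin and
# print the micro-averaged CER in percent. (Hypothetical helper for illustration.)
micro_cer() {
  awk '{ err += $1; ref += $2 }                           # pool counts over all clips
       END { if (ref > 0) printf "%.2f\n", 100 * err / ref }'
}

# Toy input, NOT benchmark data: a 10-char clip with 0 errors and
# a 90-char clip with 20 errors.
printf '%s\n' '0 10' '20 90' | micro_cer   # prints 20.00
```

A per-clip (macro) average of the same toy input would be (0% + 22.22%) / 2 = 11.11%, which is why the averaging mode matters when comparing CER figures.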

What's included

6 binaries: llama-funasr-{cli,encoder,embd,sensevoice,paraformer,vad}, static, no .so dependencies.
Built-in FSMN-VAD (--vad), in-binary detokenization (prints text), kaldi-compatible fbank front end.
GGUF models on Hugging Face: FunAudioLLM / funasr (f16/f32 with embedded vocab).
Source & docs: runtime/llama.cpp/ (README / DESIGN / BENCHMARKS).

If this helps you, a GitHub star really supports the project 👇 It is fully open-source and commercial-friendly.

⭐ Star FunASR

Also star: SenseVoice · Fun-ASR · FunClip

Further reading: FunASR on llama.cpp (a whisper.cpp alternative) — deep dive