Run FunASR on CPU — the llama.cpp / GGUF Runtime

Single binary · No Python · No GPU · Quantized weights · whisper.cpp-style on-device ASR, strong on Chinese

FunASR on llama.cpp is to FunASR what whisper.cpp is to Whisper: it runs SenseVoice / Paraformer / Fun-ASR-Nano on the ggml stack, so the models work where there is no GPU and no Python (laptops, edge boxes, embedded C/C++ apps), complementing the PyTorch / vLLM paths for GPU serving. FSMN-VAD is built into the binaries.

Download prebuilt binaries (download and run)

Linux (x64/arm64), macOS (arm64), Windows (x64): static, self-contained, zero dependencies. Available on the Releases page of all three repos.

Three lines to run

# 1) Unpack the binaries, fetch a model (downloads GGUF + VAD)
bash download-funasr-model.sh sensevoice ./gguf

# 2) Get text directly (in-binary detok, no Python)
llama-funasr-sensevoice -m ./gguf/SenseVoiceSmall-f16.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav

Other models: for Paraformer, run download-funasr-model.sh paraformer and transcribe with llama-funasr-paraformer; for Fun-ASR-Nano (31 languages), run download-funasr-model.sh nano and transcribe with llama-funasr-cli.
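To transcribe a whole folder instead of a single file, the single-file SenseVoice command can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function name transcribe_dir, the FUNASR_BIN / FUNASR_MODEL / FUNASR_VAD overrides and the ./audio and ./transcripts defaults are hypothetical, and it assumes the binary is on PATH and prints the transcript to stdout. It reuses only the -m, --vad and -a flags from the command above.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper around llama-funasr-sensevoice (not part of FunASR).
# Assumption: the transcript is written to stdout, one run per .wav file.
set -euo pipefail

# transcribe_dir AUDIO_DIR OUT_DIR
transcribe_dir() {
  local audio_dir="$1" out_dir="$2"
  # Overridable via environment; defaults match the paths from step 1.
  local bin="${FUNASR_BIN:-llama-funasr-sensevoice}"
  local model="${FUNASR_MODEL:-./gguf/SenseVoiceSmall-f16.gguf}"
  local vad="${FUNASR_VAD:-./gguf/fsmn-vad.gguf}"

  mkdir -p "$out_dir"
  local wav name
  for wav in "$audio_dir"/*.wav; do
    # With no matches the glob stays literal; skip it.
    [[ -e "$wav" ]] || continue
    name="$(basename "$wav" .wav)"
    # Same flags as the single-file command; stdout becomes the transcript file.
    "$bin" -m "$model" --vad "$vad" -a "$wav" > "$out_dir/$name.txt"
  done
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  transcribe_dir "${1:-./audio}" "${2:-./transcripts}"
fi
```

Saved under any file name, it runs as bash <script> ./audio ./transcripts. Each file starts a fresh process, so the GGUF model is reloaded per clip; that keeps the sketch simple at the cost of load time on large batches.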

Accuracy: far ahead of whisper.cpp on Chinese

Same 184-clip Chinese test set for every model; character error rate (CER, micro-averaged over all clips; lower is better):

| Model | CER ↓ | Notes |
|---|---|---|
| FunASR SenseVoice | 8.01% | multilingual + emotion/events |
| FunASR Paraformer | 9.85% | non-autoregressive, industrial Chinese |
| FunASR Fun-ASR-Nano | 8.30% | LLM-ASR, 31 languages |
| whisper.cpp small | 22.12% | |
| whisper.cpp large-v3-turbo | 23.15% | |
| whisper.cpp base | 31.33% | |

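On Chinese, the best FunASR CER (SenseVoice, 8.01%) is roughly a third of the best whisper.cpp CER (small, 22.12%), and even Paraformer (9.85%) stays under half. Full methodology is in runtime/llama.cpp/BENCHMARKS.md in each repo.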
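For reference, here are the metric and the ratios behind these comparisons. This assumes the usual edit-distance definition of CER; BENCHMARKS.md remains authoritative on details such as text normalization. Micro-averaging pools errors and reference characters over all 184 clips before dividing, so longer clips carry more weight than in a macro (per-clip mean) average. Here S_i, D_i and I_i are the character substitutions, deletions and insertions in clip i, and N_i is the number of reference characters in that clip.

$$
\mathrm{CER}_{\text{micro}} = \frac{\sum_{i=1}^{184} (S_i + D_i + I_i)}{\sum_{i=1}^{184} N_i},
\qquad
\mathrm{CER}_{\text{macro}} = \frac{1}{184}\sum_{i=1}^{184} \frac{S_i + D_i + I_i}{N_i}
$$

$$
\frac{8.01}{22.12} \approx 0.36, \qquad \frac{8.30}{22.12} \approx 0.38, \qquad \frac{9.85}{22.12} \approx 0.45
$$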

What's included

If this project helps you, a GitHub star really supports it 👇 FunASR is fully open source and commercially friendly.

⭐ Star FunASR

Also star: SenseVoice · Fun-ASR · FunClip

Further reading: FunASR on llama.cpp (a whisper.cpp alternative) — deep dive