Lightweight speech-to-text: Chinese ASR in ~250 MB on CPU (no GPU, no Python)
Most speech recognition wants a GPU, a Python environment, and multi-GB models. If you just need to turn Chinese speech into text on a laptop, a small server, or an edge device, that's a lot of overhead.
FunASR's llama.cpp / GGUF runtime strips it down to the minimum: **one self-contained binary + one quantized model file**, pure CPU, zero Python. The q8 (8-bit) build of SenseVoice is just **254 MB**, with accuracy virtually unchanged from the unquantized model.
Tested: a 254 MB model, 0.16 s on CPU (real output)
Grab a prebuilt binary (Linux/macOS/Windows) and a q8 model, then run:
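```bash
# Prebuilt binaries are on GitHub Releases; then fetch a model:
bash download-funasr-model.sh sensevoice ./gguf

llama-funasr-sensevoice -m ./gguf/sensevoice-small-q8.gguf \
  --vad ./gguf/fsmn-vad.gguf -a audio.wav
# → 欢迎大家来体验达摩院推出的语音识别模型。 (CPU, 0.16 s)
#   ("Welcome, everyone, to try the speech recognition model released by DAMO Academy.")
```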
The model is 243–254 MB, the VAD just 1.7 MB, and detokenization is built into the binary (no Python).
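To transcribe a whole folder instead of one file, a thin wrapper around the same command is enough. This is a minimal sketch, assuming the binary writes the transcript to stdout (check your build, since log lines may land there too). It uses only the `-m`, `--vad`, and `-a` flags shown above; `transcribe_dir` and the `FUNASR_*` override variables are hypothetical names for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: transcribe every .wav in a directory into a sibling .txt.
# Uses only the -m / --vad / -a flags from the quickstart and ASSUMES the
# transcript is printed to stdout. FUNASR_* overrides are illustrative only.
BIN="${FUNASR_BIN:-llama-funasr-sensevoice}"
MODEL="${FUNASR_MODEL:-./gguf/sensevoice-small-q8.gguf}"
VAD="${FUNASR_VAD:-./gguf/fsmn-vad.gguf}"

transcribe_dir() {
  local dir="${1:-.}" wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue            # glob matched nothing: skip
    echo "transcribing $wav" >&2         # progress goes to stderr, not the .txt
    "$BIN" -m "$MODEL" --vad "$VAD" -a "$wav" > "${wav%.wav}.txt" || return 1
  done
}
```

Usage: `source transcribe.sh && transcribe_dir ./recordings` writes `./recordings/foo.txt` next to each `foo.wav`. Because the binary and model paths are read at call time, you can point the same function at the Paraformer build by changing `BIN` and `MODEL`.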
Small and accurate: vs whisper.cpp
Both run on CPU. Metric: micro-averaged character error rate (CER) over 184 Chinese audio files; lower is better.
| Model | Size | Chinese CER ↓ |
|---|---|---|
| FunASR SenseVoice q8 | 254 MB | 7.99% |
| FunASR Paraformer q8 | 237 MB | 9.78% |
| whisper.cpp small | 466 MB | 22.12% |
| whisper.cpp large-v3-turbo | 1.6 GB | 23.15% |
FunASR SenseVoice q8 is just over half the size of whisper.cpp small (254 MB vs 466 MB), yet its Chinese CER is about 2.8× lower (7.99% vs 22.12%).
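The "micro" in micro-CER means errors are pooled across all files before dividing, so long files carry more weight than short ones; a macro average would instead take the mean of per-file CERs. A sketch of both, assuming a hypothetical tab-separated input with one line per file: file name, error count (substitutions + deletions + insertions), and reference character count. The function names are illustrative, not part of FunASR.

```bash
# Micro-CER: sum errors and reference chars over all files, divide once.
# Input (hypothetical format): file<TAB>errors<TAB>ref_chars, from files or stdin.
micro_cer() {
  awk -F'\t' '{ errors += $2; chars += $3 }
    END { if (chars > 0) printf "%.2f%%\n", 100 * errors / chars }' "$@"
}

# Macro-CER for comparison: plain mean of the per-file CERs.
macro_cer() {
  awk -F'\t' '$3 > 0 { sum += $2 / $3; n++ }
    END { if (n > 0) printf "%.2f%%\n", 100 * sum / n }' "$@"
}
```

With one short file (5 errors in 10 chars) and one long file (5 errors in 90 chars), `printf 'a.wav\t5\t10\nb.wav\t5\t90\n' | micro_cer` gives `10.00%`, while `macro_cer` on the same input gives `27.78%`, because the short file's 50% error rate counts as much as the long file's.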
Even smaller: the quant matrix
| Model | Quant | Size | CER |
|---|---|---|---|
| SenseVoice encoder | q8 | 254 MB | 7.99% |
| Paraformer encoder | q8 | 237 MB | 9.78% |
| Fun-ASR-Nano LLM (encoder adds 470 MB) | q4_K_M | 484 MB | 8.35% |
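The Fun-ASR-Nano row counts only the quantized LLM; its encoder adds another 470 MB, so the pair needs roughly 484 + 470 = 954 MB on disk. To check the real footprint of whichever GGUF files you downloaded, summing their byte counts is enough. A minimal sketch: `footprint_mb` is a hypothetical helper, and it assumes the table's MB are decimal (10^6 bytes).

```bash
# Hypothetical helper: total size of the given files in decimal MB (10^6 bytes).
# wc -c reports exact byte counts portably on Linux and macOS.
footprint_mb() {
  local total=0 f bytes
  for f in "$@"; do
    bytes=$(wc -c < "$f") || return 1    # missing file: fail instead of guessing
    total=$((total + bytes))
  done
  awk -v b="$total" 'BEGIN { printf "%.1f MB\n", b / 1000000 }'
}
```

Usage: `footprint_mb ./gguf/*.gguf` reports everything in the model directory, including the 1.7 MB VAD.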
Prebuilt binaries are available for Linux (x64 and arm64), macOS (arm64), and Windows (x64); the Linux arm64 build suits Raspberry Pi and other edge boxes.
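If you are scripting installs, a `uname` check tells you which of the four builds applies. A sketch only: the `linux-x64`-style labels are hypothetical, so match them against the actual asset names on the `runtime-llamacpp-v*` release. A 32-bit ARM system reports `armv7l` and is not covered by the arm64 build; Intel Macs are not in the list either.

```bash
# Map `uname -s` / `uname -m` output to one of the four prebuilt targets.
# The printed labels are HYPOTHETICAL; real release asset names may differ.
funasr_platform() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    Linux/x86_64|Linux/amd64)   echo linux-x64 ;;
    Linux/aarch64|Linux/arm64)  echo linux-arm64 ;;
    Darwin/arm64)               echo macos-arm64 ;;
    MINGW*/x86_64|MSYS*/x86_64|CYGWIN*/x86_64) echo windows-x64 ;;  # Git Bash etc.
    *) echo "no prebuilt binary for $os/$arch" >&2; return 1 ;;
  esac
}
```

Usage: `funasr_platform "$(uname -s)" "$(uname -m)"` prints `linux-arm64` on a 64-bit Raspberry Pi OS. Taking the values as arguments rather than calling `uname` inside keeps the mapping easy to check for other machines.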
Where to get it
- Prebuilt binaries: GitHub Releases (tag `runtime-llamacpp-v*`)
- One-page quickstart: funasr.com/llama-cpp
- GGUF models: Hugging Face / ModelScope
FunASR is fully open source and commercial-friendly. If it's useful to you, a GitHub star really helps.
Also star: SenseVoice · Fun-ASR · FunClip
Related posts
- FunASR vs Whisper Benchmark
- SenseVoice Deployment Guide
- Fun-ASR-Nano Guide
- Speaker Diarization: Who Spoke When
- Emotion & Language Detection
- Real-Time Streaming Speech-to-Text
- Transcribe Long Audio (Hours in One Call)
- Transcribe from the Command Line
- Self-Hosted OpenAI Whisper API Alternative
- Auto-Generate Subtitles (SRT / VTT)
- Speech to Text in Python
- FunASR on llama.cpp (whisper.cpp Alternative)
- FunASR vs faster-whisper (Chinese/Cantonese)
- Cantonese Speech Recognition (SenseVoice)