Self-Hosted Alternative to Deepgram & AssemblyAI: Open-Source, Free, Data Stays Local
Cloud speech-to-text APIs such as Deepgram, AssemblyAI, and Google/Azure Speech bill per minute and require uploading your audio to a third party. If you want to self-host, pay no per-minute fees, and keep audio on your own machines (especially for Chinese speech), FunASR is a direct open-source alternative. It ships a built-in OpenAI-compatible transcription API, so existing clients only need a new base URL.
Comparison
| | FunASR (self-hosted) | Deepgram / AssemblyAI |
|---|---|---|
| Price | Free (MIT), no per-minute billing | Pay per minute |
| Deployment | Self-hosted (local / private cloud / edge) | Cloud API (managed) |
| Data privacy | Audio never leaves your machine | Uploaded to vendor |
| Chinese / Cantonese | Industry-leading | English-first |
| Interface | OpenAI-compatible (drop-in) | Proprietary SDKs |
| Offline / CPU | ✅ (incl. llama.cpp single binary) | ❌ cloud only |
Cloud APIs win on zero operations work, elastic scaling, and mature English models with add-on features. FunASR wins on cost, self-hosting, privacy, and Chinese recognition. Choose based on your requirements.
Run an OpenAI-compatible local STT server
```bash
pip install funasr
funasr-server --model sensevoice --device cuda   # or --device cpu
# → POST http://localhost:8000/v1/audio/transcriptions
```
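If you would rather not depend on the OpenAI SDK, the endpoint can be called over plain HTTP. The sketch below uses only the Python standard library and assumes funasr-server accepts the same multipart/form-data fields as OpenAI's transcription endpoint (`model` and `file`) and returns JSON with a `text` field. The helper names `build_transcription_request` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(
    audio: bytes,
    filename: str,
    model: str = "sensevoice",
    base_url: str = "http://localhost:8000/v1",
) -> urllib.request.Request:
    """Build a multipart/form-data POST to <base_url>/audio/transcriptions (hypothetical helper)."""
    boundary = uuid.uuid4().hex  # random, so very unlikely to appear inside the audio bytes
    body = b"".join([
        # Text field "model", mirroring the OpenAI transcription API (assumption).
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # File field "file" carrying the raw audio bytes.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        # Closing boundary ends the multipart body.
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, **kwargs) -> str:
    """Upload a local audio file and return the transcript text (needs a running server)."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), **kwargs)
    with urllib.request.urlopen(req) as resp:
        # Assumes an OpenAI-style JSON response such as {"text": "..."}.
        return json.loads(resp.read())["text"]
```

With the server running, `print(transcribe("audio.wav"))` prints the transcript. Building the request separately from sending it keeps the payload easy to inspect or log.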
Any OpenAI client then only needs a different base_url; the rest of your application code stays the same:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
with open("audio.wav", "rb") as f:  # close the file once the upload is done
    text = client.audio.transcriptions.create(model="sensevoice", file=f).text
print(text)
```
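Because only the client construction differs, transcription logic can be written once against the client object. A minimal sketch, assuming the OpenAI Python SDK call `client.audio.transcriptions.create(model=..., file=...)` shown above; `transcribe_files` is a hypothetical helper, and the default model name `sensevoice` matches the server example.

```python
from typing import Any, Dict, Iterable


def transcribe_files(client: Any, paths: Iterable[str], model: str = "sensevoice") -> Dict[str, str]:
    """Transcribe each file with any OpenAI-compatible client; returns {path: text}."""
    results: Dict[str, str] = {}
    for path in paths:
        # Open each file only for the duration of its upload.
        with open(path, "rb") as f:
            response = client.audio.transcriptions.create(model=model, file=f)
        results[path] = response.text
    return results
```

For example, `transcribe_files(OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed"), ["a.wav", "b.wav"])`. Pointing the same function at the OpenAI Whisper API only means passing a client configured for OpenAI and `model="whisper-1"`.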
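Migrating from the OpenAI Whisper API only requires pointing your requests at your own funasr-server; from Deepgram or AssemblyAI, replace their proprietary SDK calls with an OpenAI client as above. Per-minute usage bills drop to zero (you pay only for your own hardware), and audio stays local.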
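In practice the switch can live in configuration rather than code. The sketch below reads the endpoint from environment variables; the names `STT_BASE_URL`, `STT_API_KEY`, and `STT_MODEL` and the helper `stt_settings` are hypothetical, and the defaults point at a local funasr-server as in the client example above.

```python
import os
from typing import Dict, Mapping, Optional


def stt_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve transcription settings; defaults target a local funasr-server (hypothetical env var names)."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("STT_BASE_URL", "http://localhost:8000/v1"),
        # The local server example uses a placeholder key; the OpenAI SDK still requires some value.
        "api_key": env.get("STT_API_KEY", "not-needed"),
        "model": env.get("STT_MODEL", "sensevoice"),
    }
```

Usage: `s = stt_settings()`, then `OpenAI(base_url=s["base_url"], api_key=s["api_key"])` and pass `model=s["model"]` to `audio.transcriptions.create`. Switching back to the OpenAI Whisper API would then be `STT_BASE_URL=https://api.openai.com/v1`, your OpenAI key in `STT_API_KEY`, and `STT_MODEL=whisper-1`, with no code change.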
Which model?
Pick by requirement:
- SenseVoice: multilingual recognition plus emotion detection
- Paraformer: Chinese, with timestamps and hotwords
- Fun-ASR-Nano: highest accuracy, 31 languages

See the model selection guide.
FunASR is open-source, commercial-friendly, and self-hostable.
Related projects: SenseVoice · Fun-ASR · FunClip
Related posts
- FunASR vs Whisper Benchmark
- SenseVoice Deployment Guide
- Fun-ASR-Nano Guide
- Speaker Diarization: Who Spoke When
- Emotion & Language Detection
- Real-Time Streaming Speech-to-Text
- Transcribe Long Audio (Hours in One Call)
- Transcribe from the Command Line
- Self-Hosted OpenAI Whisper API Alternative
- Auto-Generate Subtitles (SRT / VTT)
- Speech to Text in Python
- FunASR on llama.cpp (whisper.cpp Alternative)
- FunASR vs faster-whisper (Chinese/Cantonese)
- Lightweight Speech Recognition on CPU
- Which FunASR Model?
- Cantonese Speech Recognition (SenseVoice)