Self-Hosted Alternative to Deepgram & AssemblyAI: Open-Source, Free, Data Stays Local

Cloud speech-to-text APIs such as Deepgram, AssemblyAI, and Google/Azure Speech bill per minute and require uploading your audio to a third party. If you want to self-host, avoid per-minute fees, and keep audio on your own machines, FunASR is a direct open-source alternative, and it is especially strong for Chinese. It ships with a built-in OpenAI-compatible transcription API, so existing clients only need to change the base URL.

Comparison

Feature | FunASR (self-hosted) | Deepgram / AssemblyAI
Price | Free (MIT license), no per-minute billing | Pay per minute
Deployment | Self-hosted (local / private cloud / edge) | Managed cloud API
Data privacy | Audio never leaves your machine | Audio uploaded to the vendor
Chinese / Cantonese | Industry-leading | English-first
Interface | OpenAI-compatible (drop-in) | Proprietary SDKs
Offline / CPU | ✅ (incl. llama.cpp single binary) | ❌ cloud only

Cloud APIs win on zero operations work, elastic scaling, and mature English and value-added features; FunASR wins on cost, self-hosting, privacy, and Chinese support. Pick based on which of these matters most to you.

Run an OpenAI-compatible local STT server

pip install funasr
funasr-server --model sensevoice --device cuda      # or --device cpu
# → POST http://localhost:8000/v1/audio/transcriptions
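To see what actually goes over the wire, here is a minimal standard-library sketch of that POST, with no SDK involved. It assumes the server follows OpenAI's transcription request format (a multipart/form-data upload with a `file` part and a `model` field) and replies with JSON containing a `text` key. `build_transcription_request` and `transcribe` are hypothetical helper names, not part of FunASR.

```python
import json
import os
import urllib.request
import uuid

DEFAULT_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, model: str,
                                url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style multipart/form-data request.

    Assumption: the server expects the same fields as OpenAI's endpoint
    ("model" and "file"). The filename must not contain double quotes.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field: which model the server should run.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        # Closing boundary terminates the multipart body.
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str, model: str = "sensevoice") -> str:
    """Send one file to a running server and return the transcript text."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(req) as resp:
        # Assumes an OpenAI-style JSON response such as {"text": "..."}.
        return json.loads(resp.read())["text"]
```

Building the request separately from sending it keeps the wire format easy to inspect or log; with the server running, `transcribe("audio.wav")` performs the round trip.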

Any OpenAI client then only needs a different base_url; the rest of your application code stays the same:

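from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # placeholder key for the local server
with open("audio.wav", "rb") as f:  # context manager closes the file after upload
    text = client.audio.transcriptions.create(model="sensevoice", file=f).text
print(text)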
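Because the server speaks the OpenAI protocol, ordinary client code built on that `client` carries over unchanged. As one illustration, here is a sketch that transcribes a whole folder. `transcribe_folder` is a hypothetical helper; it accepts any object exposing `audio.transcriptions.create(model=..., file=...)`, such as the `OpenAI` client pointed at funasr-server in the snippet above.

```python
from pathlib import Path
from typing import Any, Dict


def transcribe_folder(client: Any, folder: str, model: str = "sensevoice",
                      pattern: str = "*.wav") -> Dict[str, str]:
    """Transcribe every file in `folder` matching `pattern`; return {filename: text}."""
    results: Dict[str, str] = {}
    # Sorted so the processing order is deterministic.
    for path in sorted(Path(folder).glob(pattern)):
        # Open each file in a context manager so handles are closed after upload.
        with path.open("rb") as f:
            response = client.audio.transcriptions.create(model=model, file=f)
        results[path.name] = response.text
    return results
```

Usage: `transcripts = transcribe_folder(client, "recordings/")`. Taking the client as a parameter also lets you swap in a fake object in tests.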

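Migrating from the OpenAI Whisper API only means pointing requests at your own funasr-server. Coming from Deepgram or AssemblyAI, you replace their proprietary SDK calls with OpenAI-compatible calls like the snippet above. Either way, per-minute usage fees disappear (you pay only for your own hardware) and audio stays local.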
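One way to keep the switch a configuration change rather than a code change is to read the endpoint from the environment. A minimal sketch, assuming a hypothetical `STT_BASE_URL` variable of your own choosing (it is not a FunASR setting); when it is unset or blank, the code falls back to the local server address used earlier in this post.

```python
import os
from typing import Mapping, Optional

# Address of the local funasr-server from the example above.
LOCAL_BASE_URL = "http://localhost:8000/v1"


def stt_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI-compatible base URL without a trailing slash.

    Reads the hypothetical STT_BASE_URL variable; blank or unset means local.
    """
    env = os.environ if env is None else env
    url = env.get("STT_BASE_URL", "").strip() or LOCAL_BASE_URL
    return url.rstrip("/")


def transcription_url(base_url: str) -> str:
    """Full endpoint for OpenAI-style transcription requests."""
    return f"{base_url}/audio/transcriptions"
```

Usage: `client = OpenAI(base_url=stt_base_url(), api_key="not-needed")`, then set or unset `STT_BASE_URL` per environment.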

Which model?

Use SenseVoice for multilingual transcription with emotion recognition, Paraformer for Chinese with timestamps and hotwords, and Fun-ASR-Nano for the highest accuracy across 31 languages. See the model selection guide for details.
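If you pick the `--model` value in deployment scripts, the rule of thumb above can be encoded as a small lookup. This is a sketch only: `"sensevoice"` is the identifier used earlier in this post, while `"paraformer"` and `"fun-asr-nano"` are placeholder names, and `Needs` and `pick_model` are hypothetical. Check the model selection guide for the identifiers funasr-server actually accepts.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Requirements of a transcription job (hypothetical fields for illustration)."""
    chinese_only: bool = False
    timestamps_or_hotwords: bool = False
    emotion: bool = False
    max_accuracy: bool = False


def pick_model(needs: Needs) -> str:
    """Map requirements to a model name following the rule of thumb above.

    Identifiers other than "sensevoice" are placeholders.
    """
    if needs.emotion:
        return "sensevoice"      # multilingual + emotion recognition
    if needs.chinese_only and needs.timestamps_or_hotwords:
        return "paraformer"      # Chinese + timestamps/hotwords (placeholder id)
    if needs.max_accuracy:
        return "fun-asr-nano"    # top accuracy, 31 languages (placeholder id)
    return "sensevoice"          # general multilingual default
```

The check order is a design choice: emotion recognition is attributed only to SenseVoice in the guidance, so it takes precedence over the other criteria.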

FunASR is open-source, commercial-friendly, and self-hostable. If it is useful to you, starring FunASR on GitHub really helps.

Related projects: SenseVoice · Fun-ASR · FunClip
