Self-Hosted Speech-to-Text — A Free, Open-Source Alternative to Google / AWS / Azure Cloud Speech APIs

2026-06-22 · FunASR Team

Cloud speech APIs such as Google Cloud Speech-to-Text, AWS Transcribe, and Azure Speech are convenient: fully managed, auto-scaling, zero ops. But as volume grows or compliance requirements tighten, their downsides show: per-minute billing that gets expensive at scale, audio uploaded to the vendor's cloud (a privacy, compliance, and data-residency concern), rate limits, a dependency on internet connectivity, and limited customization.

FunASR (open-sourced by Tongyi Lab) is a mature self-hosted alternative: it is open-source and free (MIT, commercial-friendly), runs on your own machine (fully offline-capable), has no per-minute billing, and keeps your audio inside your network. It's especially strong on Chinese and other Asian languages, and it ships an OpenAI-compatible API, so migrating often takes little more than a base_url change. Here's runnable code.

Transcribe locally in 4 lines (real output)

pip install funasr

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", disable_update=True)  # uses CPU if no GPU
res = model.generate(input="audio.wav", language="auto", use_itn=True)

print(rich_transcription_postprocess(res[0]["text"]))
# 欢迎大家来体验达摩院推出的语音识别模型。  (output for a Chinese sample)
# English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy."

The model downloads on first run, then everything is local inference — no API key, no per-minute bill, no audio uploaded anywhere.
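For more than one file, the same calls can be wrapped in a small loop. The following is a minimal sketch, not part of FunASR itself: `transcribe_dir` and `make_sensevoice_transcriber` are hypothetical helper names, and the transcriber simply reuses the `AutoModel` / `generate` / `rich_transcription_postprocess` calls shown above, so the model is loaded once and reused for every file.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_dir(directory: str, transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Run `transcribe` on every .wav file in `directory` (sorted by name).

    Returns a dict mapping file name to transcript text.
    """
    results: Dict[str, str] = {}
    for wav in sorted(Path(directory).glob("*.wav")):
        results[wav.name] = transcribe(str(wav))
    return results


def make_sensevoice_transcriber() -> Callable[[str], str]:
    """Build a path -> text function from the same calls used in the snippet above."""
    # Imported lazily so transcribe_dir can be used or tested without funasr installed.
    from funasr import AutoModel
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    # Load the model once; every file reuses it.
    model = AutoModel(model="iic/SenseVoiceSmall", disable_update=True)

    def transcribe(path: str) -> str:
        res = model.generate(input=path, language="auto", use_itn=True)
        return rich_transcription_postprocess(res[0]["text"])

    return transcribe


# Usage ("recordings" is a hypothetical folder name):
# for name, text in transcribe_dir("recordings", make_sensevoice_transcriber()).items():
#     print(f"{name}: {text}")
```

Passing the transcriber in as a function keeps the folder walk independent of the model, so a different model or a remote server could be swapped in without touching the loop.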

Already on a cloud API? Migrate by changing the base_url

FunASR ships an OpenAI-compatible transcription server, so if your app already talks to an OpenAI-style transcription endpoint, you often only need to point base_url at your own service:

# start a local server (OpenAI-compatible /v1/audio/transcriptions)
funasr-server --model sensevoice --device cuda

# client: the OpenAI SDK, just change base_url to your server
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
# open the audio in a with-block so the file handle is closed after the upload
with open("audio.wav", "rb") as f:
    text = client.audio.transcriptions.create(model="sensevoice", file=f).text
print(text)
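If your client is not written against the OpenAI SDK, it can help to see what that call looks like on the wire. The sketch below uses only the Python standard library and assumes the local server accepts the same request shape as OpenAI's transcription endpoint: a multipart/form-data POST with a `model` field and a `file` part, answered with JSON containing a `text` key. That request shape is an assumption about the server, and `build_multipart` and `transcribe_http` are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_name: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_http(path: str,
                    base_url: str = "http://localhost:8000/v1",
                    model: str = "sensevoice") -> str:
    """POST one audio file to <base_url>/audio/transcriptions and return the text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart({"model": model}, os.path.basename(path), f.read())
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Assumes an OpenAI-style JSON response: {"text": "..."}
        return json.load(resp)["text"]
```

With the server from above running, `transcribe_http("audio.wav")` should be equivalent to the SDK call, and the same request can be reproduced from any language with an HTTP client.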

FunASR vs cloud speech APIs

	FunASR (self-hosted)	Cloud STT (Google/AWS/Azure)
Pricing	open-source & free; fixed machine cost only	per-minute, scales linearly with usage
Data	stays on your servers (offline-capable)	audio uploaded to the vendor cloud
Deployment	you run it (pip / Docker)	fully managed, zero ops
Languages	50+, very strong Chinese/Asian	many (varies by vendor)
Speaker diarization	✅ built-in (cam++)	✅ (extra cost/config)
Emotion / audio events	✅ (SenseVoice)	mostly ❌
Customization / fine-tune	✅ full model access	limited
Offline / air-gapped	✅	❌
License	open-source, commercial-friendly	proprietary

To be fair: if what you want is zero ops, elastic scale, and no machines to manage, a managed cloud API is still the easy choice. FunASR's trade-off is that you run one machine of your own in exchange for controllable cost, private data, offline capability, and full customization.

When self-hosting pays off

Cloud STT typically bills per audio minute, which works out to roughly $0.60–$1.50 per audio hour depending on vendor and tier. At 1,000 hours/month that is about $600–$1,500/month (1,000 h × $0.60–$1.50/h), and the bill grows linearly with volume, whereas a self-hosted FunASR instance is a fixed machine cost, largely independent of volume. Self-hosting usually wins when any of these hold:

High volume: lots of audio per month, where per-minute bills dwarf one machine's cost;
Sensitive data: healthcare/finance/government, where audio can't leave your network;
Offline / air-gapped: deployment environments with no internet access;
Chinese / dialects / Asian languages as the primary need, where you want stronger local accuracy and customization.
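The high-volume case can be turned into a quick break-even estimate. This is a minimal sketch using the per-hour price range quoted above; the machine cost of $400/month is a purely hypothetical placeholder, so substitute your own hardware or rental figure.

```python
def cloud_cost(hours_per_month: float, price_per_hour: float) -> float:
    """Monthly cloud STT bill: grows linearly with audio volume."""
    return hours_per_month * price_per_hour


def break_even_hours(machine_cost_per_month: float, price_per_hour: float) -> float:
    """Audio hours per month above which a fixed-cost machine is cheaper."""
    return machine_cost_per_month / price_per_hour


MACHINE_COST = 400.0  # hypothetical monthly cost of one self-hosted machine (USD)

for price in (0.60, 1.50):  # low and high ends of the range quoted above
    print(
        f"${price:.2f}/h: 1,000 h costs ${cloud_cost(1000, price):,.0f}/month; "
        f"break-even at {break_even_hours(MACHINE_COST, price):.0f} h/month"
    )
```

With that placeholder machine cost, break-even lands between roughly 267 and 667 audio hours per month. The estimate ignores the engineering time of running the service yourself, which is the main cost on the self-hosted side.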

The whole FunASR stack is open-source (MIT) — industrial-grade ASR / VAD / punctuation / speaker / emotion & events / LLM-ASR, self-hosted and commercial-friendly. If it helps, a GitHub Star really supports the project 👇

⭐ Star FunASR

Also star: SenseVoice · Fun-ASR · FunClip
