Self-Hosted OpenAI Whisper API Alternative: Drop-in /v1/audio/transcriptions with FunASR
OpenAI's /v1/audio/transcriptions (the Whisper API) charges per minute, uploads your audio to the cloud, and is rate-limited. If you just need audio turned into text, you can run a server with an identical interface on your own machine — existing code switches over by changing a single base_url. FunASR ships exactly that as funasr-server. Every snippet below is tested.
Start an OpenAI-compatible server in one line
pip install -U funasr
funasr-server --model sensevoice --device cuda # listens on localhost:8000
You now have the same two endpoints OpenAI exposes: POST /v1/audio/transcriptions and GET /v1/models.
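If the server needs a moment to load the model before it accepts connections, a script can poll GET /v1/models before sending audio. The following is a minimal sketch using only the standard library; the helper name `wait_for_server` and the timeout values are assumptions made for illustration, not part of FunASR.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8000/v1",
                    timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Poll GET {base_url}/models until it answers 200 or `timeout` seconds pass.

    Hypothetical helper for this article; returns True once the server responds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, timed out, or HTTP error: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` right after launching funasr-server lets the rest of a script assume the endpoints are reachable, and a `False` return is a cue to check the server logs.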
Call it with the official OpenAI SDK (just change base_url)
Your existing OpenAI code barely changes. Point base_url at the local server and pass any non-empty api_key:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
with open("audio.wav", "rb") as f:
    r = client.audio.transcriptions.create(model="sensevoice", file=f)
print(r.text)
# -> 欢迎大家来体验达摩院推出的语音识别模型 (for a Chinese sample clip; roughly "Welcome, everyone, to try the speech recognition model released by DAMO Academy")
Listing models works the same way:
print([m.id for m in client.models.list().data])
# -> ['fun-asr-nano', 'sensevoice', 'paraformer']
Or use curl / any HTTP client
curl http://localhost:8000/v1/audio/transcriptions \
-F "file=@audio.wav" \
-F "model=sensevoice"
# -> {"text": "..."}
Because the API is OpenAI-compatible, the Node openai package, LangChain, and most other OpenAI-compatible clients can connect to it once their base URL points at the server.
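For environments where neither curl nor an OpenAI SDK is available, the curl call above can be reproduced with Python's standard library. This is a sketch under assumptions: the helper names `build_multipart` and `transcribe` are invented here, the file part is sent as `application/octet-stream`, and the response is assumed to be the `{"text": "..."}` JSON shown above.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one part named "file" as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "sensevoice",
               base_url: str = "http://localhost:8000/v1") -> str:
    """POST the file to /audio/transcriptions and return the "text" field."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": model}, os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

With the server running, `print(transcribe("audio.wav"))` should behave like the curl command. In practice a library such as requests or the OpenAI SDK handles the multipart encoding for you; the manual version only shows what goes over the wire.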
OpenAI Whisper API vs self-hosted FunASR
| | OpenAI Whisper API | FunASR self-hosted |
|---|---|---|
| Cost | $0.006 / min | Free (your hardware) |
| Privacy | Audio leaves your machine | Stays local |
| Rate limits | Yes | None |
| Chinese accuracy | Mediocre | Higher (benchmark) |
| Speed | — | SenseVoice is non-autoregressive, far faster than Whisper |
| Interface | /v1/audio/transcriptions | Identical |
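To put the cost row in perspective, the per-minute price can be multiplied out for a given archive size. The sketch below uses the $0.006 / min figure from the table; the function name is made up for illustration, and it only computes the hosted API side, not the hardware or electricity cost of self-hosting.

```python
OPENAI_PRICE_PER_MINUTE = 0.006  # USD per audio minute, taken from the table above


def openai_cost_usd(audio_minutes: float) -> float:
    """Estimated hosted-API cost for the given number of audio minutes."""
    return audio_minutes * OPENAI_PRICE_PER_MINUTE


for hours in (10, 100, 1000):
    print(f"{hours} h of audio -> ${openai_cost_usd(hours * 60):,.2f}")
# 10 h of audio -> $3.60
# 100 h of audio -> $36.00
# 1000 h of audio -> $360.00
```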
Which model to pick
- sensevoice: default; non-autoregressive, very fast, 50+ languages, with emotion/events. Best everyday choice.
- fun-asr-nano: LLM decoder, highest accuracy, 31 languages incl. Chinese dialects (needs vLLM).
- paraformer: classic, production-grade Chinese.
Just set model= in the request — no server restart needed.
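Since the model is chosen per request, one way to decide between models is to send the same file to each and compare the transcripts. The helper below is a hypothetical sketch: `transcribe_with_each` is not part of FunASR or the OpenAI SDK, and it simply expects a client object shaped like the `OpenAI(...)` client used earlier.

```python
def transcribe_with_each(client, path, models=("sensevoice", "paraformer")):
    """Return {model_name: transcript} for one audio file.

    `client` is expected to be an OpenAI SDK client whose base_url
    points at the local funasr-server.
    """
    results = {}
    for model in models:
        # Reopen the file for every request so each upload starts at byte 0.
        with open(path, "rb") as f:
            response = client.audio.transcriptions.create(model=model, file=f)
        results[model] = response.text
    return results


# Usage, assuming the server from this article is running:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
#   for name, text in transcribe_with_each(client, "audio.wav").items():
#       print(name, "->", text)
```

The default tuple leaves out fun-asr-nano because, as noted above, it needs vLLM; add it to `models` if your server has it available.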
What changes when migrating from OpenAI
- base_url: point it at your funasr-server.
- api_key: not validated locally; any non-empty string works.
- Everything else: the client.audio.transcriptions.create(...) call is the same.
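One way to keep a single codebase that can target either backend is to build the client arguments from the environment. The sketch below is an assumption-laden illustration: `ASR_BASE_URL` is a variable name invented for this example, and `client_kwargs` is a hypothetical helper, not something FunASR or the OpenAI SDK provides.

```python
import os


def client_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from environment variables.

    ASR_BASE_URL is a hypothetical variable chosen for this sketch.
    """
    base_url = env.get("ASR_BASE_URL")
    if base_url:
        # Self-hosted: the key is not validated, but it must be non-empty.
        return {
            "base_url": base_url,
            "api_key": env.get("OPENAI_API_KEY") or "not-needed",
        }
    # Hosted OpenAI: pass nothing and let the SDK use its own defaults.
    return {}


# Usage:
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs())
```

Setting `ASR_BASE_URL=http://localhost:8000/v1` then switches the application to the local server without touching the transcription code itself.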
FunASR is Tongyi Lab's open-source, industrial-grade speech recognition toolkit.