Self-Hosted OpenAI Whisper API Alternative: Drop-in /v1/audio/transcriptions with FunASR

OpenAI's /v1/audio/transcriptions endpoint (the Whisper API) bills per minute, uploads your audio to the cloud, and is rate-limited. If you only need audio turned into text, you can run a server with the same interface on your own machine, and existing code switches over by changing a single base_url. FunASR ships exactly that as funasr-server. Every snippet below is tested.

Start an OpenAI-compatible server in one line

pip install -U funasr
funasr-server --model sensevoice --device cuda     # listens on localhost:8000

You now have two endpoints that mirror OpenAI's: POST /v1/audio/transcriptions and GET /v1/models.
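Before wiring up an SDK, it can help to confirm that the server is reachable. The following is a minimal sketch using only the Python standard library. It assumes the server is on the default localhost:8000 address shown above and that GET /v1/models returns the OpenAI-style shape, a JSON object with a "data" list of entries that each carry an "id". The function names are illustrative and not part of FunASR.

```python
import json
import urllib.request


def parse_model_ids(payload):
    """Extract ids from an OpenAI-style model list: {"data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url="http://localhost:8000/v1", timeout=5.0):
    """GET {base_url}/models and return the advertised model ids."""
    # Assumption: no authentication header is required by the local server.
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the server running, calling list_model_ids() should return the model names the server advertises; a connection error usually means the server is not up yet or is listening on a different port.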

Call it with the official OpenAI SDK (just change base_url)

Your existing OpenAI code stays almost unchanged. Point base_url at the local server and pass any non-empty api_key:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("audio.wav", "rb") as f:
    r = client.audio.transcriptions.create(model="sensevoice", file=f)
print(r.text)
# -> 欢迎大家来体验达摩院推出的语音识别模型   (output for a Chinese sample; in English: "Welcome, everyone, to try the speech recognition model released by DAMO Academy")

Listing models works the same way:

print([m.id for m in client.models.list().data])
# -> ['fun-asr-nano', 'sensevoice', 'paraformer']
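Since the only difference between the hosted and self-hosted setups is the client configuration, one option is to choose it from the environment so the same code runs against either backend. This is a sketch under assumptions: ASR_BASE_URL is a hypothetical variable name invented for this example, and client_kwargs is an illustrative helper, not part of the OpenAI SDK or FunASR.

```python
import os


def client_kwargs(env=None):
    """Return keyword arguments for OpenAI(...), chosen from environment variables.

    ASR_BASE_URL is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    base_url = env.get("ASR_BASE_URL")
    if base_url:
        # Self-hosted server: any non-empty api_key works, as noted above.
        return {"base_url": base_url, "api_key": "not-needed"}
    # No override set: use OpenAI's hosted API with the real key.
    return {"api_key": env["OPENAI_API_KEY"]}
```

The client is then built with OpenAI(**client_kwargs()), and switching to the local server becomes a matter of setting ASR_BASE_URL=http://localhost:8000/v1 before starting the application.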

Or use curl / any HTTP client

curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=sensevoice"
# -> {"text": "..."}
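The curl call is a plain multipart/form-data POST, so it can be reproduced without any SDK. The sketch below builds the same request with the Python standard library only. It assumes the form field names from the curl example (file and model) and the {"text": ...} response shape shown above; encode_multipart and transcribe are illustrative names, not FunASR functions.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes, boundary=None):
    """Build a multipart/form-data body: text fields plus one part named "file"."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(f"{header}{value}\r\n".encode())
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode())
    # File bytes go in verbatim, followed by the closing boundary.
    body = b"".join(parts) + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000/v1", model="sensevoice"):
    """POST an audio file to /audio/transcriptions and return the "text" field."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"model": model}, os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

Calling transcribe("audio.wav") against a running server should return the same text as the curl command. In practice a library such as requests handles the multipart encoding for you; the manual version is shown only to make the wire format explicit.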

Because the API is OpenAI-compatible, the Node openai package, LangChain, and most existing SDKs can connect to it once they are pointed at the local base URL.

OpenAI Whisper API vs self-hosted FunASR

                  | OpenAI Whisper API        | FunASR self-hosted
Cost              | $0.006 / min              | Free (your hardware)
Privacy           | Audio leaves your machine | Stays local
Rate limits       | Yes                       | None
Chinese accuracy  | Mediocre                  | Higher (benchmark)
Speed             | -                         | SenseVoice is non-autoregressive, far faster than Whisper
Interface         | /v1/audio/transcriptions  | Identical
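To put the cost row in concrete terms, the arithmetic can be sketched as follows. The only figure taken from the comparison is the $0.006 per minute rate; the helper names are illustrative, and the break-even estimate deliberately ignores electricity and maintenance.

```python
WHISPER_API_USD_PER_MIN = 0.006  # per-minute rate quoted in the comparison


def api_cost_usd(minutes, rate=WHISPER_API_USD_PER_MIN):
    """Hosted API bill in USD for a given number of audio minutes."""
    return round(minutes * rate, 2)


def break_even_minutes(hardware_usd, rate=WHISPER_API_USD_PER_MIN):
    """Audio minutes at which a hardware purchase equals the API bill.

    Simplification: power, hosting, and operations costs are ignored.
    """
    return hardware_usd / rate


print(api_cost_usd(10_000))  # 10,000 minutes of audio -> 60.0
```

As an example with a hypothetical $1,500 GPU, break_even_minutes(1500) comes out at roughly 250,000 minutes of audio, which is about 4,167 hours.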

Which model to pick

Set model= in each request to choose a model. No server restart is needed.
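Because the model is chosen per request, comparing models on the same file takes only a loop. The sketch below assumes client is an OpenAI SDK client configured as in the earlier example and that the model names match those returned by GET /v1/models; transcribe_with_each is an illustrative helper, not a FunASR or OpenAI function.

```python
def transcribe_with_each(client, path, models):
    """Transcribe one audio file with several models and return {model: text}."""
    results = {}
    for model in models:
        # Reopen the file for every request so each upload starts at byte 0.
        with open(path, "rb") as f:
            response = client.audio.transcriptions.create(model=model, file=f)
        results[model] = response.text
    return results
```

For instance, transcribe_with_each(client, "audio.wav", ["sensevoice", "paraformer"]) would return one transcript per model, which makes it straightforward to compare output quality on your own audio before settling on one.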

What changes when migrating from OpenAI

FunASR is Tongyi Lab's open-source, industrial-grade speech recognition toolkit.
