Transcribe Audio from the Command Line with FunASR (text / JSON / SRT)

FunASR Blog · 2026-06-18 · Tutorial

Don't want to write Python? FunASR ships a command-line tool that turns audio into text, JSON, or SRT subtitles right in your terminal. Local, free, and especially strong on Chinese. Every command below is tested.

Install

pip install -U torch torchaudio
pip install -U funasr   # funasr >= 1.3.26 recommended

Simplest: print text

funasr audio.wav

Prints the transcript to the terminal. Defaults to SenseVoice (non-autoregressive, very fast, 50+ languages).

Generate SRT subtitles

funasr audio.wav -f srt -o ./subs

Writes ./subs/audio.srt. Add --spk for speaker-split cues with real timestamps:

funasr meeting.wav --spk -f srt -o ./subs

1
00:00:02,919 --> 00:00:08,169
Hi everyone, let's talk about the Q3 plan.

2
00:00:10,029 --> 00:00:18,550
Quick status update - core features are 80% done.
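
If you want to post-process the subtitles (search them, shift timings, merge cues), the SRT layout shown above is easy to read back into Python. The following is a minimal sketch using only the standard library; the names Cue and parse_srt are illustrative and not part of FunASR, and it assumes each cue has an index line, a timing line, and at least one text line, as in the sample.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:02,919 --> 00:00:08,169".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start=_seconds(*fields[:4]),
                end=_seconds(*fields[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


# The two cues from the example above.
SAMPLE = """1
00:00:02,919 --> 00:00:08,169
Hi everyone, let's talk about the Q3 plan.

2
00:00:10,029 --> 00:00:18,550
Quick status update - core features are 80% done.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(f"{cue.index}: {cue.start:.3f}s -> {cue.end:.3f}s  {cue.text}")
```

To use it on a real file, read ./subs/audio.srt with open(..., encoding="utf-8") and pass the contents to parse_srt.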

Structured JSON

funasr audio.wav -f json

{
  "text": "Hi everyone let's talk about the Q3 plan...",
  "file": "audio.wav",
  "model": "sensevoice",
  "language": "auto",
  "audio_duration_s": 59.52,
  "processing_s": 2.17
}
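
The two timing fields make it easy to check how fast a run was. Here is a small sketch that loads the JSON above and computes the real-time factor (processing time divided by audio duration). The helper name real_time_factor is illustrative, and the field names are taken from the sample output, so adjust them if your version prints different keys.

```python
import json

# The JSON from the example above, as printed by `funasr audio.wav -f json`.
raw = """{
  "text": "Hi everyone let's talk about the Q3 plan...",
  "file": "audio.wav",
  "model": "sensevoice",
  "language": "auto",
  "audio_duration_s": 59.52,
  "processing_s": 2.17
}"""

result = json.loads(raw)


def real_time_factor(result):
    """Processing time / audio duration; below 1.0 means faster than real time."""
    return result["processing_s"] / result["audio_duration_s"]


rtf = real_time_factor(result)
print(f"RTF = {rtf:.3f}, speed-up = {1 / rtf:.1f}x")
# prints: RTF = 0.036, speed-up = 27.4x
```

For this sample, about 60 seconds of audio took a little over 2 seconds to transcribe.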

Common options

Command	What it does
`-f text/json/srt/tsv`	Output format (default text)
`--spk`	Speaker diarization (who spoke when)
`--model sensevoice/paraformer/fun-asr-nano`	Pick a model (fun-asr-nano = highest accuracy)
`--hotwords "term,jargon"`	Hotwords to boost rare terms
`-o ./out`	Output directory
`funasr a.wav b.wav`	Transcribe multiple files at once
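
If you drive the CLI from a script, it can help to assemble the arguments in one place. Below is a rough sketch that builds an argument list from the options in the table; build_funasr_command is a hypothetical helper, not something FunASR provides, and it assumes the flags can be combined freely with multiple input files.

```python
import shlex

FORMATS = {"text", "json", "srt", "tsv"}


def build_funasr_command(files, fmt="text", out_dir=None, model=None,
                         spk=False, hotwords=None):
    """Assemble an argv list for the funasr CLI using only the flags listed above."""
    if not files:
        raise ValueError("at least one audio file is required")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["funasr", *files, "-f", fmt]
    if spk:
        cmd.append("--spk")
    if model:
        cmd += ["--model", model]
    if hotwords:
        # The table shows hotwords as one comma-separated string.
        cmd += ["--hotwords", ",".join(hotwords)]
    if out_dir:
        cmd += ["-o", out_dir]
    return cmd


cmd = build_funasr_command(["a.wav", "b.wav"], fmt="srt", out_dir="./subs", spk=True)
print(shlex.join(cmd))
# prints: funasr a.wav b.wav -f srt --spk -o ./subs

# To actually run it (requires funasr to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run rather than a single shell string avoids quoting problems with file names that contain spaces.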

Deploy as an API server (OpenAI-compatible)

funasr-server --device cuda     # localhost:8000, POST /v1/audio/transcriptions

Call it with any OpenAI SDK:

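from openai import OpenAI
# api_key is a placeholder; base_url points the SDK at the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
# Open the audio in binary mode; the with block closes it after the request.
with open("audio.wav", "rb") as audio:
    r = client.audio.transcriptions.create(model="sensevoice", file=audio)
print(r.text)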

Why the FunASR CLI

Local, free, private - no internet, no API key.
Fast: SenseVoice is non-autoregressive, far faster than Whisper (benchmark) - real-time even on CPU.
Stronger on Chinese, with 50+ languages supported; subtitles, speaker diarization, hotwords, and JSON output out of the box.

FunASR is Tongyi Lab's open-source, industrial-grade speech recognition toolkit.
