Quickstart Guide

Master the three core workflows of FunASR in under 3 minutes: file transcription, real-time streaming, and API server deployment. Every code snippet is ready to copy and run.

Requirements: Python 3.8+ | GPU with 8GB+ memory (optional, CPU works too) | Linux / macOS / Windows

0
Installation

One command to install everything. GPU users should add vllm for a 16x speed boost.

# Basic install (works on both CPU and GPU)
$ pip install funasr

# Full install (API server + vLLM acceleration, recommended for GPU)
$ pip install funasr vllm fastapi uvicorn python-multipart

Tip: If you just need a one-off transcription, pip install funasr is enough. For deploying an API server or for maximum throughput, go with the full install.
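
To see which of the two installs an environment currently satisfies, a small check along these lines may help. It is a minimal sketch that uses only the standard library; the helper name missing_packages is hypothetical, and python-multipart is left out because its import name differs between releases.

```python
from importlib.util import find_spec

# Import names for the packages named in the install commands above.
BASIC = ["funasr"]
FULL_EXTRAS = ["vllm", "fastapi", "uvicorn"]


def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


print("missing for basic install:", missing_packages(BASIC))
print("missing for full install: ", missing_packages(BASIC + FULL_EXTRAS))
```

An empty list on the second line suggests the environment is ready for the API server section further down.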

1
File Transcription

Pass in an audio file, get back the full transcript. Supports 50+ languages with built-in emotion and audio event detection.

from funasr import AutoModel

# SenseVoice: 50+ languages, emotion detection, audio events
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
result = model.generate(input="meeting.wav")
print(result[0]["text"])

Output: Let's discuss the three topics today. Sounds good. First one is the Q3 plan. Go ahead, we have 30 minutes.

# Batch-process multiple files at once
results = model.generate(input=["file1.wav", "file2.mp3", "file3.flac"])
for r in results:
    print(r["text"])

Supported Formats: WAV, MP3, FLAC, AAC, OGG, and MP4 video. Virtually every common audio and video format works out of the box, with no manual conversion needed.
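
For batch runs over a whole folder, the input list can be built from the directory contents instead of being typed by hand. The sketch below is one possible approach: collect_audio_files and the folder name "recordings" are hypothetical, and the extension set simply mirrors the formats listed above.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".aac", ".ogg", ".mp4"}


def collect_audio_files(folder):
    """Return sorted paths (as strings) of supported files directly inside `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


# Usage (commented out because it needs funasr and a loaded model):
# files = collect_audio_files("recordings")
# results = model.generate(input=files)
# for r in results:
#     print(r["text"])
```

Matching on the lowercased suffix means files such as TALK.MP3 are picked up too; subfolders are not searched.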

2
Real-time Streaming ASR

Feed audio from a microphone or stream and get results as you speak. Perfect for live captions, broadcast transcription, and voice assistants.

from funasr import AutoModel

# Paraformer streaming model
model = AutoModel(model="paraformer-zh-streaming", vad_model="fsmn-vad")

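# Streaming inference: with chunk_size = [5, 10, 5], each chunk covers
# 10 * 60 ms = 600 ms of audio, and the last value adds 5 * 60 ms = 300 ms of lookahead
chunk_size = [5, 10, 5]
cache = {}

# Simulate streaming input (replace with mic capture in production)
import soundfile as sf
speech, sr = sf.read("meeting.wav")  # the stride below assumes 16 kHz audio
chunk_stride = chunk_size[1] * 960  # samples per chunk (960 samples = 60 ms at 16 kHz)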

for i in range(0, len(speech), chunk_stride):
    chunk = speech[i:i+chunk_stride]
    is_final = (i + chunk_stride >= len(speech))
    result = model.generate(
        input=chunk, cache=cache,
        is_final=is_final, chunk_size=chunk_size
    )
    print(result[0]["text"], end="", flush=True)

Live output (updates progressively): Let's discuss the...three topics today. Sounds good. First one is...the Q3 plan.
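
The chunk arithmetic in the loop above can be checked without a model or an audio file. The following sketch assumes 16 kHz audio, so that the factor 960 corresponds to 60 ms; iter_chunks is a hypothetical helper, not part of FunASR.

```python
SAMPLES_PER_UNIT = 960  # 60 ms at 16 kHz, the factor used in the loop above


def iter_chunks(samples, chunk_size, samples_per_unit=SAMPLES_PER_UNIT):
    """Yield (chunk, is_final) pairs; the middle chunk_size entry sets the chunk length."""
    stride = chunk_size[1] * samples_per_unit
    for start in range(0, len(samples), stride):
        yield samples[start:start + stride], start + stride >= len(samples)


# One second of silence at 16 kHz: one full 9600-sample chunk plus a 6400-sample tail.
demo = [0.0] * 16000
sizes = [(len(chunk), is_final) for chunk, is_final in iter_chunks(demo, [5, 10, 5])]
print(sizes)  # [(9600, False), (6400, True)]
```

Only the last chunk carries is_final=True, which matches the flag the streaming loop passes to model.generate.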

WebSocket Real-time Service: Need browser-based live transcription? FunASR ships with a complete WebSocket server solution. Check the GitHub repository under examples/ for the WebSocket server and browser client code.

3
OpenAI-compatible API Server

Deploy an OpenAI-compatible speech recognition API with a single command. Drop-in replacement for any app already using the Whisper API.

# Install and start the server
$ pip install funasr vllm fastapi uvicorn python-multipart
$ funasr-server --device cuda

# Server starts on port 8899 by default
# Option 1: curl
$ curl -X POST http://localhost:8899/v1/audio/transcriptions \
    -F "file=@audio.wav" \
    -F "model=iic/SenseVoiceSmall"

# Option 2: OpenAI Python SDK (identical to Whisper API)
from openai import OpenAI

# The SDK requires an api_key value; "x" is a placeholder
client = OpenAI(base_url="http://localhost:8899/v1", api_key="x")
with open("audio.wav", "rb") as audio_file:  # close the file once the upload finishes
    result = client.audio.transcriptions.create(
        model="fun-asr-nano",
        file=audio_file,
        response_format="verbose_json"
    )
print(result.text)

JSON Response: {"text": "Let's discuss the three topics today.", "segments": [{"start": 1.7, "end": 5.5, "text": "Let's discuss the three topics today."}], "duration": 12.1}
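
Because the response carries start and end times per segment, it can be turned into captions with a few lines of code. The sketch below converts the sample response above into SRT text; srt_timestamp and segments_to_srt are hypothetical helpers, and the code assumes every segment has the start, end, and text fields shown in that sample.

```python
import json


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response):
    """Build SRT caption text from a dict that has a "segments" list."""
    blocks = []
    for index, seg in enumerate(response["segments"], start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{times}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


# The sample response body shown above, parsed from its JSON text.
raw = """{"text": "Let's discuss the three topics today.",
 "segments": [{"start": 1.7, "end": 5.5, "text": "Let's discuss the three topics today."}],
 "duration": 12.1}"""
response = json.loads(raw)
print(segments_to_srt(response))
```

Working in integer milliseconds keeps the formatting free of floating-point rounding surprises at segment boundaries.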

+
MCP Server: AI Assistant Integration

With MCP enabled, AI assistants like Claude, Cursor, and Windsurf can call FunASR directly for speech recognition.

# Start the server with MCP support
$ funasr-server --device cuda --enable-mcp

# AI assistants can now invoke speech recognition via MCP protocol
# Works with Claude Desktop, Cursor, Windsurf, and more
What is MCP? MCP (Model Context Protocol) is a standard protocol for AI assistants to invoke external tools. Once enabled, your AI coding assistant can directly "listen to" audio files, unlocking voice-driven development workflows.

Next Steps

Explore more of what FunASR can do