0
Installation

One command to install everything. GPU users should add vllm for a 16x speed boost.

# Basic install (works on both CPU and GPU)

$ pip install funasr

# Full install (API server + vLLM acceleration, recommended for GPU)

$ pip install funasr vllm fastapi uvicorn python-multipart

Tip: If you just need a one-off transcription, pip install funasr is enough. For deploying an API server or maximum throughput, go with the full install.
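To see which of the packages named above are already present before choosing between the basic and full install, a small check like the following may help. It is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of FunASR.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
BASIC = ["funasr"]
FULL = BASIC + ["vllm", "fastapi", "uvicorn", "python-multipart"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(FULL).items():
        print(f"{name}: {version or 'not installed'}")
```

If only `funasr` reports a version, the basic install is in place; the remaining entries show what the full install would still add.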

1
File Transcription

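Pass in an audio file, get back the full transcript. Language coverage depends on the model you load (see the comments in the example below): across its models, FunASR supports 50+ languages with built-in emotion and audio event detection.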

from funasr import AutoModel

# Fun-ASR-Nano: flagship LLM-ASR for zh/en/ja + Chinese dialects/accents (GPU)

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda")

result = model.generate(input="meeting.wav")

print(result[0]["text"])

# For 31 languages use the separate checkpoint: FunAudioLLM/Fun-ASR-MLT-Nano-2512

# On CPU, use SenseVoice: AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")

Output: Let's discuss the three topics today. Sounds good. First one is the Q3 plan. Go ahead, we have 30 minutes.

# Batch-process multiple files at once

results = model.generate(input=["file1.wav", "file2.mp3", "file3.flac"])

for r in results:

    print(r["text"])

Supported Formats: WAV, MP3, FLAC, AAC, OGG, MP4 video. Virtually every common audio and video format works out of the box, with no manual conversion needed.
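For batch jobs, the input list can be built from a folder instead of being typed by hand. The following is a minimal sketch, assuming files are recognised by extension only; the helper `collect_audio_files` and the folder name `recordings/` are hypothetical and not part of FunASR.

```python
from pathlib import Path

# Extensions for the formats listed above. Assumption: files are matched by
# extension only; their content is not inspected.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".aac", ".ogg", ".mp4"}


def collect_audio_files(folder):
    """Return sorted paths (as strings) of supported media files under folder."""
    root = Path(folder)
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Usage (commented out because it needs a loaded model and a real folder):
# results = model.generate(input=collect_audio_files("recordings/"))
```

Sorting keeps the order of `results` predictable, so each transcript can be matched back to its file by position.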

2
Real-time Streaming ASR

Feed audio from a microphone or stream and get results as you speak. Perfect for live captions, broadcast transcription, and voice assistants.

Which model: The flagship Fun-ASR-Nano (LLM-ASR) is an offline model and the default for file transcription and the OpenAI-compatible API (sections 1 and 3). For true low-latency, speak-as-you-go streaming, FunASR ships a dedicated streaming model, paraformer-zh-streaming, shown below. It is all one industrial-grade FunASR stack; pick the model per use case.

from funasr import AutoModel

# Paraformer streaming model

model = AutoModel(model="paraformer-zh-streaming")  # no vad_model: chunks are fed to the streaming model directly

# Streaming inference: chunk_size = [5, 10, 5] means 10 x 60 ms = 600 ms per chunk, plus 5 x 60 ms = 300 ms of lookahead

chunk_size = [5, 10, 5]

cache = {}

# Simulate streaming input (replace with mic capture in production)

import soundfile as sf

speech, sr = sf.read("meeting.wav")  # expected: 16 kHz mono

chunk_stride = chunk_size[1] * 960  # 960 samples = 60 ms at 16 kHz, so 9600 samples (600 ms) per chunk

for i in range(0, len(speech), chunk_stride):

    chunk = speech[i:i+chunk_stride]

    is_final = (i + chunk_stride >= len(speech))

    result = model.generate(

        input=chunk, cache=cache,

        is_final=is_final, chunk_size=chunk_size

    )

    print(result[0]["text"], end="", flush=True)

Live output (updates progressively): Let's discuss the...three topics today. Sounds good. First one is...the Q3 plan.
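The arithmetic behind chunk_stride and the is_final flag can be made explicit. The sketch below assumes 16 kHz audio and that each chunk_size unit corresponds to 60 ms (which is where the factor 960 = 0.06 s x 16000 samples/s comes from); the helper names are illustrative, not FunASR API.

```python
FRAME_MS = 60  # assumed duration of one chunk_size unit, in milliseconds


def samples_per_chunk(chunk_size, sample_rate=16000):
    """Number of audio samples fed to the model per streaming step."""
    return chunk_size[1] * FRAME_MS * sample_rate // 1000


def iter_chunks(num_samples, stride):
    """Yield (start, end, is_final) index triples covering num_samples."""
    for start in range(0, num_samples, stride):
        end = min(start + stride, num_samples)
        yield start, end, end == num_samples


stride = samples_per_chunk([5, 10, 5])
print(stride)  # 9600 samples, i.e. 600 ms at 16 kHz
print(list(iter_chunks(20000, stride)))
```

Only the last triple carries is_final=True, which matches the condition used in the loop above; the final chunk may be shorter than the stride.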

WebSocket Real-time Service: Need browser-based live transcription? FunASR ships with a complete WebSocket server solution. Check the GitHub repository under examples/ for the WebSocket server and browser client code.

3
OpenAI-compatible API Server

Deploy an OpenAI-compatible speech recognition API with a single command. Drop-in replacement for any app already using the Whisper API.

# Install and start the server

$ pip install funasr vllm fastapi uvicorn python-multipart

$ funasr-server --device cuda

# Server starts on port 8899 by default

# Option 1: curl

$ curl -X POST http://localhost:8899/v1/audio/transcriptions \

    -F "file=@audio.wav" \

    -F "model=iic/SenseVoiceSmall"

# Option 2: OpenAI Python SDK (identical to Whisper API)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8899/v1", api_key="x")

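# Open the file in a with-block so the handle is closed after the upload

with open("audio.wav", "rb") as audio_file:

    result = client.audio.transcriptions.create(

        model="fun-asr-nano",

        file=audio_file,

        response_format="verbose_json"

    )

print(result.text)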

JSON Response: {"text": "Let's discuss the three topics today.", "segments": [{"start": 1.7, "end": 5.5, "text": "Let's discuss the three topics today."}], "duration": 12.1}
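Since the verbose_json response carries per-segment start and end times, it can be turned into subtitles with a few lines of standard-library code. The following is a sketch that assumes the response has the shape of the sample above (a "segments" list with "start", "end" and "text" in seconds); the helper names are illustrative.

```python
import json

# Sample payload copied from the JSON response shown above.
payload = """{"text": "Let's discuss the three topics today.", "segments": [{"start": 1.7, "end": 5.5, "text": "Let's discuss the three topics today."}], "duration": 12.1}"""


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(response):
    """Turn the 'segments' list of a verbose_json response into SRT text."""
    blocks = []
    for index, seg in enumerate(response["segments"], start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


print(segments_to_srt(json.loads(payload)))
```

With the OpenAI SDK the same data should be reachable from the returned object rather than from raw JSON; the dictionary form is used here to keep the example independent of a running server.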

+
MCP Server: AI Assistant Integration

With MCP enabled, AI assistants like Claude, Cursor, and Windsurf can call FunASR directly for speech recognition.

# Start the server with MCP support

$ funasr-server --device cuda --enable-mcp

# AI assistants can now invoke speech recognition via MCP protocol

# Works with Claude Desktop, Cursor, Windsurf, and more

What is MCP? MCP (Model Context Protocol) is a standard protocol for AI assistants to invoke external tools. Once enabled, your AI coding assistant can directly "listen to" audio files, unlocking voice-driven development workflows.
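On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below only builds such a request to show its shape. The tool name "transcribe" and its "path" argument are hypothetical: the tools actually exposed by funasr-server are not listed here, and an MCP client would normally discover them with a tools/list request first.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call(1, "transcribe", {"path": "meeting.wav"})
print(json.dumps(request, indent=2))
```

In practice the assistant's MCP client constructs and sends these messages itself; the example is only meant to make "invoke external tools" concrete.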


Next Steps

GitHub Repository

Model Gallery

Ecosystem

vs Whisper
