FunASR Blog

Speech recognition deployment, tutorials and best practices

2026-06-23

Speech-to-Text with Word/Character-Level Timestamps in Python

FunASR Paraformer gives native character timestamps: every char has [start_ms, end_ms] in one call. Build word highlighting, click-to-seek, subtitle alignment. Real output + pairing code.
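The pairing step mentioned above can be sketched in plain Python. This is a minimal illustration, not the post's own code: it assumes the recognizer result has already been unpacked into a text string and a parallel list of `[start_ms, end_ms]` pairs, and that punctuation carries no timestamp of its own. The helper names and sample values are hypothetical.

```python
import unicodedata


def pair_chars_with_timestamps(text, timestamps):
    """Pair each spoken character with its [start_ms, end_ms] span."""
    # Assumption: whitespace and punctuation have no timing entry,
    # so they are dropped before pairing.
    chars = [
        c for c in text
        if not c.isspace() and not unicodedata.category(c).startswith("P")
    ]
    if len(chars) != len(timestamps):
        raise ValueError(
            f"{len(chars)} characters but {len(timestamps)} timestamps"
        )
    return [
        {"char": c, "start_ms": start, "end_ms": end}
        for c, (start, end) in zip(chars, timestamps)
    ]


def char_at(pairs, position_ms):
    """Return the character spoken at position_ms, or None during a pause."""
    for p in pairs:
        if p["start_ms"] <= position_ms < p["end_ms"]:
            return p["char"]
    return None


# Hypothetical values, not real model output.
sample_text = "你好，世界。"
sample_timestamps = [[0, 200], [200, 420], [600, 810], [810, 1000]]
pairs = pair_chars_with_timestamps(sample_text, sample_timestamps)
```

A player could call `char_at(pairs, position_ms)` on each time update to highlight the current character, or use a pair's `start_ms` as the click-to-seek target.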

2026-06-23

Punctuation Restoration in Python — Add Punctuation to Text / ASR Output

FunASR ct-punc: open-source bilingual (zh+en) punctuation restoration. 3 lines of Python to add ，。？ (and capitalize English), or attach to ASR in one line. Real output.

2026-06-22

Chinese (Mandarin) Speech Recognition in Python — Fast & Accurate with FunASR

Purpose-built for Chinese: default flagship Fun-ASR-Nano (CER 8.06%), with SenseVoice (7.81%) / Paraformer (10.18%, timestamps/hotwords) for CPU. All are far better than Whisper (~20% CER).

2026-06-22

Self-Hosted Speech-to-Text — Free Alternative to Google / AWS / Azure Cloud Speech APIs

Open-source (MIT), local inference, no per-minute billing, audio stays on your network, strong Chinese, OpenAI-compatible (migrate via base_url). Runnable code + FunASR vs cloud comparison + cost analysis.

2026-06-22

Voice Activity Detection in Python — Detect Speech, Remove Silence, Split Audio

FunASR fsmn-vad returns millisecond speech spans in 3 lines. Measured: a 13s clip split into 2 segments in 0.12s, 18% silence removed. Trim silence, split long audio, or preprocess audio for Whisper to cut hallucinations.
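Once the speech spans are available, the silence figure and the trimmed audio follow from simple arithmetic. The sketch below assumes the spans arrive as a list of `[start_ms, end_ms]` pairs and that the audio is a flat sequence of samples at 16 kHz. The function names and the sample numbers are hypothetical and are not the measured clip from the post.

```python
def speech_stats(spans_ms, total_ms):
    """Summarize how much of a clip is speech versus silence."""
    speech_ms = sum(end - start for start, end in spans_ms)
    silence_ms = total_ms - speech_ms
    return {
        "speech_ms": speech_ms,
        "silence_ms": silence_ms,
        "silence_ratio": silence_ms / total_ms,
    }


def trim_silence(samples, spans_ms, sample_rate=16000):
    """Keep only the samples that fall inside the speech spans."""
    kept = []
    for start, end in spans_ms:
        first = start * sample_rate // 1000  # span start as a sample index
        last = end * sample_rate // 1000     # span end as a sample index
        kept.extend(samples[first:last])
    return kept


# Hypothetical 10 s clip with two speech spans.
spans = [[500, 4000], [5000, 9000]]
stats = speech_stats(spans, total_ms=10_000)
trimmed = trim_silence([0] * 160_000, spans)
```

Splitting long audio works the same way: slice once per span instead of concatenating the slices.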

2026-06-22

Japanese Speech Recognition in Python — SenseVoice: Transcription + Punctuation + Emotion in One Pass

Real side-by-side on one Japanese clip: SenseVoice writes 転売 correctly and adds punctuation; Whisper-small gives 天売 with none. Native ja support + auto language ID + emotion/events, 3 lines of Python.

2026-06-23

Self-Hosted Deepgram / AssemblyAI Alternative

Open-source, free, self-hosted, audio stays local, Chinese-leading, with an OpenAI-compatible API — replace per-minute cloud STT by changing base_url.

2026-06-23

Which FunASR Model? Nano vs MLT-Nano vs SenseVoice vs Paraformer

A decision table with code: flagship Nano for zh/en/ja plus Chinese dialects/accents, separate MLT-Nano for 31 languages, and CPU choices SenseVoice or Paraformer.

2026-06-22

Lightweight Speech Recognition: Chinese ASR in ~250MB on CPU

One binary + a 254MB q8 model, no GPU/Python, 0.16s on CPU, 7.99% CER — smaller than whisper.cpp small and ~3x more accurate.

2026-06-21

Cantonese Speech Recognition in Python — SenseVoice Keeps Real Cantonese (Whisper Doesn't)

Real side-by-side on one Cantonese clip: SenseVoice keeps 呢/唔/嘅, Whisper rewrites to Mandarin. Native yue support + auto language ID, 3 lines of Python.

2026-06-21

FunASR vs faster-whisper: Chinese & Cantonese Compared

faster-whisper mislabels Cantonese as Mandarin and makes Japanese homophone errors; SenseVoice handles Cantonese natively, with ~2.7x lower CER on Chinese.

2026-06-20

FunASR on llama.cpp — a whisper.cpp Alternative for Chinese ASR (CPU, no Python)

One self-contained binary, built-in VAD, any audio — download and transcribe Chinese; ~2.7x more accurate than whisper.cpp on CPU.

2026-06-18

Speech to Text in Python with FunASR

Transcribe audio in a few lines of Python — timestamps, speakers, batching. Local, free, no API key.

2026-06-18

Auto-Generate Subtitles (SRT & VTT) from Audio or Video with FunASR

One command for SRT, Python for VTT too, with speaker labels and real timestamps. Local, free, strong on Chinese.
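For readers who want to see what the subtitle output involves, here is a minimal sketch of SRT formatting from millisecond timestamps. It assumes segments have already been reduced to dictionaries with `start_ms`, `end_ms`, `text`, and an optional `speaker`. That shape and the helper names are illustrative assumptions, not the tool's actual output schema.

```python
def ms_to_srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as SRT cues, numbered from 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Prefix the speaker label only when the segment has one.
        label = f"[{seg['speaker']}] " if seg.get("speaker") else ""
        start = ms_to_srt_time(seg["start_ms"])
        end = ms_to_srt_time(seg["end_ms"])
        blocks.append(f"{index}\n{start} --> {end}\n{label}{seg['text']}\n")
    return "\n".join(blocks)


# Hypothetical segments, not real model output.
segments = [
    {"start_ms": 0, "end_ms": 1500, "text": "你好。", "speaker": "spk0"},
    {"start_ms": 1800, "end_ms": 3200, "text": "Hello."},
]
srt_text = to_srt(segments)
```

WebVTT differs mainly in its `WEBVTT` header line and in using `.` rather than `,` before the milliseconds.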

2026-06-18

Self-Hosted OpenAI Whisper API Alternative with FunASR

funasr-server exposes an OpenAI-compatible /v1/audio/transcriptions; the OpenAI SDK works by changing only base_url. Local, free, private.

2026-06-18

Transcribe Audio from the Command Line with FunASR

One command -> text/SRT/JSON, --spk for speakers; plus funasr-server for an OpenAI-compatible API.

2026-06-17

Transcribe Long Audio with FunASR: Hours in One Call

Whisper caps at 30s; FunASR ingests any length via built-in VAD: 13 min in 4.3s (186x).

2026-06-17

Real-Time Streaming Speech-to-Text with FunASR

~600ms low-latency streaming ASR with chunks + cache, plus the 2-pass (streaming + offline) best practice.
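The chunks-plus-cache pattern can be sketched without any model. The example below cuts audio into 600 ms chunks and threads one mutable cache through successive recognizer calls. `recognize_chunk` is a hypothetical callback standing in for a real streaming model call, and the fake recognizer exists only to show the control flow.

```python
def iter_chunks(samples, sample_rate=16000, chunk_ms=600):
    """Yield (chunk, is_final) pairs of roughly chunk_ms each."""
    chunk_len = sample_rate * chunk_ms // 1000
    total = len(samples)
    for start in range(0, total, chunk_len):
        is_final = start + chunk_len >= total  # last chunk may be shorter
        yield samples[start:start + chunk_len], is_final


def stream(samples, recognize_chunk):
    """Feed chunks to a recognizer, sharing one cache across calls."""
    cache = {}  # carries decoder state from one chunk to the next
    pieces = []
    for chunk, is_final in iter_chunks(samples):
        pieces.append(recognize_chunk(chunk, cache, is_final))
    return "".join(pieces)


def fake_recognizer(chunk, cache, is_final):
    """Stand-in for a model: reports how many samples it has seen so far."""
    cache["seen"] = cache.get("seen", 0) + len(chunk)
    return f"<{cache['seen']}>" + ("!" if is_final else "")


# 1.5 s of silence at 16 kHz gives two full chunks and one short final chunk.
result = stream([0] * 24_000, fake_recognizer)
```

In a 2-pass setup, the streaming text would be shown immediately and then replaced by an offline pass over the completed utterance.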

2026-06-17

Beyond Transcription: Language, Emotion & Audio Events with SenseVoice

Transcription + language + emotion + audio events in one pass — the 4-in-1 Whisper cannot do.

2026-06-17

Speaker Diarization with FunASR: Who Spoke When

Transcription + speaker labels + timestamps in one generate() call. A pyannote+Whisper alternative, no HF gated access.

2026-06-16

FunASR vs Whisper: Real Chinese ASR Benchmark

Measured on 184 Chinese files (H100): SenseVoice 169.6x, 7.81% CER — full speed & accuracy data.
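For context on the CER figures, character error rate is conventionally the character-level edit distance between reference and hypothesis, divided by the reference length. The sketch below computes it over a set of files. It assumes no text normalization, whereas a real benchmark may strip punctuation or normalize numerals first, and the sample strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, character by character."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(pairs):
    """Corpus-level CER: total edits divided by total reference characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total


# Hypothetical (reference, hypothesis) pairs.
score = cer([
    ("今天天气很好", "今天天汽很好"),  # one substitution
    ("你好世界", "你好世"),            # one deletion
])
```

Pooling the edits before dividing weights long files more heavily than averaging per-file CER would, so the two methods can give different numbers on the same data.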

2026-06-16

Fun-ASR-Nano Guide: 800M End-to-End ASR LLM

Flagship for zh/en/ja plus 7 Chinese dialect groups and 26 accents; choose MLT-Nano for 31 languages.

2026-06-16

SenseVoice Deployment Guide: 15x Faster Than Whisper

Multilingual ASR in 3 lines, with language/emotion/event tags, VAD, GPU/CPU.
