Speech-to-Text with Word/Character-Level Timestamps in Python — Millisecond-Accurate

Many use cases need more than "what was said" — they need to know when each word was spoken: karaoke-style highlighting, click-a-word-to-seek transcripts, subtitle alignment, video-editing search. Whisper's word timestamps are bolted on (DTW-based, variable accuracy). FunASR's Paraformer emits character-level timestamps natively: every character comes with [start_ms, end_ms] in a single call. Here is real measured output.

Transcribe with timestamps (real output)

pip install funasr

from funasr import AutoModel

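# Load Paraformer (Chinese) plus FSMN-VAD for splitting long audio;
# disable_update=True skips FunASR's update check at startup
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", disable_update=True)
res = model.generate(input="audio.wav")  # list with one result dict per input

text = res[0]["text"]            # 欢 迎 大 家 来 体 验 ... ("Welcome, everyone, to try ...")
timestamp = res[0]["timestamp"]  # [[880, 1120], [1120, 1360], ...]  per-character start/end ms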

timestamp is a list of [start_ms, end_ms] pairs, one per character in the text.

Pair characters with timestamps (real)

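chars = text.split()          # Paraformer's Chinese output is space-separated characters
# zip() silently stops at the shorter sequence, so confirm the counts match first
assert len(chars) == len(timestamp), "character/timestamp count mismatch"
for ch, (start, end) in zip(chars, timestamp):
    print(f"{ch}  {start}-{end}ms")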

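Actual output (first seven characters):

| Char | Span (ms) |
|------|-----------|
| 欢 | 880 – 1120 |
| 迎 | 1120 – 1360 |
| 大 | 1380 – 1540 |
| 家 | 1540 – 1780 |
| 来 | 1780 – 2020 |
| 体 | 2020 – 2180 |
| 验 | 2180 – 2420 |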

All 19 characters of the sentence get their own start/end timestamp in milliseconds; the seven rows above are the beginning of that list.
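For karaoke-style highlighting, a player needs the reverse lookup: given the current playback position, which character is being spoken? The following is a minimal sketch of one way to do that with a binary search over the start times. The helper name `char_index_at` is hypothetical (it is not part of FunASR), and the sample data is copied from the output above.

```python
from bisect import bisect_right

def char_index_at(timestamp, t_ms):
    """Return the index of the character spoken at t_ms, or None in a gap.

    Hypothetical helper, not a FunASR API. Assumes `timestamp` is sorted by
    start time, as in the output shown in this post.
    """
    starts = [start for start, _ in timestamp]
    i = bisect_right(starts, t_ms) - 1       # last span starting at or before t_ms
    if i >= 0 and t_ms < timestamp[i][1]:    # t_ms falls before that span's end
        return i
    return None

# Sample values copied from the output above (first seven characters).
chars = "欢 迎 大 家 来 体 验".split()
timestamp = [[880, 1120], [1120, 1360], [1380, 1540], [1540, 1780],
             [1780, 2020], [2020, 2180], [2180, 2420]]

idx = char_index_at(timestamp, 1600)
print(chars[idx] if idx is not None else "(no character at this time)")
```

Click-to-seek is the same table read the other way: clicking character `i` seeks the player to `timestamp[i][0] / 1000` seconds.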

What you can build
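Subtitle alignment, one of the uses named in the introduction, can be sketched directly from the character list. The example below groups characters into SRT cues and starts a new cue after a long pause or once a cue is full. The function names and the `max_chars` and `max_gap_ms` thresholds are illustrative assumptions, not FunASR settings, and the sample data is the first seven characters of the measured output in this post.

```python
def fmt_srt_time(ms):
    """Format integer milliseconds as an SRT timecode, HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(chars, timestamp, max_chars=12, max_gap_ms=300):
    """Group characters into cues and render them as SRT text.

    Hypothetical helper: max_chars and max_gap_ms are illustrative
    thresholds to tune for your own audio.
    """
    cues, current = [], []
    for ch, (start, end) in zip(chars, timestamp):
        # Close the current cue when it is full or the speaker paused.
        if current and (len(current) >= max_chars
                        or start - current[-1][2] > max_gap_ms):
            cues.append(current)
            current = []
        current.append((ch, start, end))
    if current:
        cues.append(current)

    blocks = []
    for n, cue in enumerate(cues, 1):
        line = "".join(ch for ch, _, _ in cue)  # Chinese: join without spaces
        span = f"{fmt_srt_time(cue[0][1])} --> {fmt_srt_time(cue[-1][2])}"
        blocks.append(f"{n}\n{span}\n{line}\n")
    return "\n".join(blocks)

# Sample values from the measured output (first seven characters).
chars = "欢 迎 大 家 来 体 验".split()
timestamp = [[880, 1120], [1120, 1360], [1380, 1540], [1540, 1780],
             [1780, 2020], [2020, 2180], [2180, 2420]]

print(to_srt(chars, timestamp, max_chars=4))
```

Splitting on pauses tends to keep cue boundaries near natural phrase breaks. A production version would likely also break on punctuation once a punctuation model is added to the pipeline.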

Why Paraformer for timestamps

| | FunASR Paraformer | Whisper |
|---|---|---|
| Word/char timestamps | ✅ native, single call | bolt-on (--word_timestamps, DTW) |
| Accuracy | non-autoregressive + CIF alignment, stable | implementation-dependent |
| Chinese | character-level, CER 10.18% | ~20% |
| License | open-source, commercial-friendly | open-source |

For higher Chinese accuracy, default to the flagship Fun-ASR-Nano; for the full Chinese walkthrough see Chinese speech recognition; for long-audio segmentation see VAD.

The whole FunASR stack is open-source (MIT): character timestamps, ASR, VAD, punctuation, speaker, and LLM-ASR (flagship Fun-ASR-Nano), all ready to use. If it helps, a GitHub Star supports the project.
