Punctuation Restoration in Python — Add Punctuation to Unpunctuated Text / ASR Output

2026-06-23 · FunASR Team

Many speech recognition (ASR) models output unpunctuated text: one long run of words that is hard to read. Punctuation restoration puts the commas, periods, and question marks back so the transcript is readable. FunASR's ct-punc is an open-source punctuation-restoration model that handles both Chinese and English and punctuates arbitrary text in three lines of Python. All outputs shown below come from actually running the model.

Punctuation restoration in three lines (real output)

pip install funasr

from funasr import AutoModel

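# Load the punctuation model; disable_update=True skips FunASR's update check at startup
model = AutoModel(model="ct-punc", disable_update=True)
# generate() returns a list of results; the punctuated string is under the "text" key
print(model.generate(input="the meeting is at 3 pm please bring your laptop and the report")[0]["text"])
# The meeting is at 3 pm, please bring your laptop and the report.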

The input is a continuous string with no punctuation; the output gets a comma, a period, and a capitalized first letter.

Real Chinese & English examples

Input (no punctuation)	Output (ct-punc)
the meeting is at 3 pm please bring your laptop and the report	The meeting is at 3 pm, please bring your laptop and the report.
我们都是木头人不许说话不许动	我们都是木头人，不许说话，不许动。
今天天气怎么样我想出去走走你要一起吗	今天天气怎么样，我想出去走走，你要一起吗？

In short, English text gets commas, periods, and a capitalized first letter, while Chinese text gets the full-width marks ，。？, including a question mark where the sentence is a question. One model handles both languages.
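When punctuating text in bulk, it can be useful to check that restoration only added punctuation and capitalization and left the words themselves alone. The sketch below runs that check on the three input/output pairs from the table above. The helper names (strip_punctuation, only_punctuation_changed) are hypothetical and not part of FunASR, and the check assumes the model never rewrites or reorders characters.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Lowercase the text and drop every punctuation and whitespace character."""
    kept = [
        ch
        for ch in text.lower()
        # Unicode categories starting with "P" cover ASCII and full-width punctuation
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    ]
    return "".join(kept)


def only_punctuation_changed(original: str, restored: str) -> bool:
    """True if the two strings match once punctuation, case, and spacing are ignored."""
    return strip_punctuation(original) == strip_punctuation(restored)


# Input/output pairs copied from the table above
examples = [
    (
        "the meeting is at 3 pm please bring your laptop and the report",
        "The meeting is at 3 pm, please bring your laptop and the report.",
    ),
    ("我们都是木头人不许说话不许动", "我们都是木头人，不许说话，不许动。"),
    ("今天天气怎么样我想出去走走你要一起吗", "今天天气怎么样，我想出去走走，你要一起吗？"),
]

for source, restored in examples:
    print(only_punctuation_changed(source, restored))
```

All three pairs pass the check, since each output differs from its input only in punctuation and the capitalized first letter.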

The most common use: punctuate ASR output

The most common use of punctuation restoration is cleaning up speech-to-text output. FunASR's ASR models accept ct-punc through the punc_model argument, so the transcript comes out already punctuated:

from funasr import AutoModel

# ASR + VAD + punctuation in one call (Chinese Paraformer here)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
result = model.generate(input="audio.wav")
print(result[0]["text"])   # output already has punctuation
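Once the transcript carries punctuation, it can be cut into sentences, for example to produce one subtitle line per sentence. The following is a minimal sketch using only the standard library; split_sentences is a hypothetical helper, not a FunASR function. It assumes Chinese sentences end with 。？！ and English sentences end with . ? ! followed by whitespace, so an abbreviation such as "p.m." would be split incorrectly.

```python
import re

# Split after a full-width sentence mark (whitespace optional), or after an
# ASCII sentence mark that is followed by whitespace (so "3.5" stays intact).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。？！])\s*|(?<=[.?!])\s+")


def split_sentences(text: str) -> list:
    """Return the sentences of an already punctuated transcript."""
    return [part for part in _SENTENCE_BOUNDARY.split(text) if part]


transcript = "我们都是木头人，不许说话，不许动。今天天气怎么样，我想出去走走，你要一起吗？"
for sentence in split_sentences(transcript):
    print(sentence)
```

Splitting only on sentence-ending marks keeps the commas inside each sentence, so the clause structure added by ct-punc is preserved.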

You can also use ct-punc standalone on any text — not just speech. Old subtitles, OCR output, chat logs: anything unpunctuated can be punctuated.
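For text that arrives as many separate lines, such as a chat log or a subtitle dump, the standalone call can be wrapped in a small loop. This is a sketch rather than an official FunASR utility: punctuate_lines and load_ct_punc are hypothetical names, and the code assumes, as in the three-line example, that generate(input=...) returns a list whose first item holds the punctuated string under "text".

```python
from typing import Iterable, List


def load_ct_punc():
    """Load ct-punc as in the three-line example (funasr is imported lazily)."""
    from funasr import AutoModel

    return AutoModel(model="ct-punc", disable_update=True)


def punctuate_lines(model, lines: Iterable[str]) -> List[str]:
    """Punctuate each non-empty line with one generate() call per line."""
    punctuated = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of sending them to the model
        result = model.generate(input=text)
        punctuated.append(result[0]["text"])
    return punctuated
```

A typical call would be punctuate_lines(load_ct_punc(), open("chat.log", encoding="utf-8")), where chat.log is a placeholder file name. Loading the model once and reusing it for every line avoids paying the start-up cost repeatedly.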

Why ct-punc

Feature	ct-punc (FunASR)
Languages	Chinese + English (one bilingual model)
Usage	3 lines of Python on raw text, or attach to ASR with one extra argument
Punctuation	commas, periods, and question marks (，。？ in Chinese); also capitalizes the first letter in English
License	open-source, commercial-friendly

For full speech pipelines, see Chinese speech recognition, VAD / silence removal, and speaker diarization; to pick an ASR model, see the model selection guide (default recommendation: the flagship Fun-ASR-Nano).

The whole FunASR stack is open-source (MIT): punctuation restoration, ASR, VAD, speaker diarization, emotion recognition, and LLM-ASR (flagship Fun-ASR-Nano), all ready to use. If it helps, a GitHub Star supports the project 👇

⭐ Star FunASR

Also star: SenseVoice · Fun-ASR · FunClip
