Punctuation Restoration in Python — Add Punctuation to Unpunctuated Text / ASR Output

Many speech recognition (ASR) models output unpunctuated text — one long run of words that is hard to read. Punctuation restoration adds the commas, periods, and question marks back so the transcript is readable. FunASR's ct-punc is an open-source punctuation-restoration model that works for both Chinese and English, adding punctuation to any text in three lines of Python. Everything below is real measured output.

Punctuation restoration in three lines (real output)

pip install funasr

from funasr import AutoModel

model = AutoModel(model="ct-punc", disable_update=True)
print(model.generate(input="the meeting is at 3 pm please bring your laptop and the report")[0]["text"])
# The meeting is at 3 pm, please bring your laptop and the report.

The input is a continuous string with no punctuation; the output gets a comma, a period, and a capitalized first letter.
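If you call the model from more than one place, a small wrapper keeps the `[0]["text"]` indexing in one spot. The sketch below is an assumption about how you might structure this, not part of FunASR: `restore_punctuation` is a hypothetical helper name, and returning an empty string for blank input without calling the model is a design choice made here, not documented library behavior.

```python
from typing import Any


def restore_punctuation(text: str, model: Any) -> str:
    """Return `text` with punctuation restored by `model`.

    `model` is any object exposing generate(input=...) and returning a list
    of dicts with a "text" key, which is how the snippet above uses ct-punc.
    """
    stripped = text.strip()
    if not stripped:
        # Assumption: blank input comes back as "" without a model call.
        return ""
    results = model.generate(input=stripped)
    # The snippet above reads results[0]["text"]; fail loudly if the shape differs.
    if not results or "text" not in results[0]:
        raise ValueError(f"unexpected model output: {results!r}")
    return results[0]["text"]


# Usage with the real model (requires `pip install funasr`):
#   from funasr import AutoModel
#   punc = AutoModel(model="ct-punc", disable_update=True)
#   print(restore_punctuation("the meeting is at 3 pm please bring your laptop", punc))
```

Passing the model in as an argument means it is loaded once and reused, and the helper can be exercised with a stand-in object when the real model is not installed.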

Real Chinese & English examples

| Input (no punctuation) | Output (ct-punc) |
| --- | --- |
| the meeting is at 3 pm please bring your laptop and the report | The meeting is at 3 pm, please bring your laptop and the report. |
| 我们都是木头人不许说话不许动 | 我们都是木头人,不许说话,不许动。 |
| 今天天气怎么样我想出去走走你要一起吗 | 今天天气怎么样,我想出去走走,你要一起吗? |

So: English gets commas, periods, and a capitalized first letter; Chinese gets ,。? — and it even picks a question mark for questions. One model handles both languages.
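To check which marks were added in the examples above, one option is to count the punctuation characters that appear in the output but not in the input. This sketch uses only the standard library. `punctuation_counts` and `added_punctuation` are illustrative helper names, not part of FunASR, and the strings are copied from the examples above.

```python
import unicodedata
from collections import Counter


def punctuation_counts(text: str) -> Counter:
    """Count every character whose Unicode general category is punctuation (P*)."""
    return Counter(ch for ch in text if unicodedata.category(ch).startswith("P"))


def added_punctuation(raw: str, punctuated: str) -> Counter:
    """Punctuation marks present in the model output but not in the raw input."""
    return punctuation_counts(punctuated) - punctuation_counts(raw)


english = added_punctuation(
    "the meeting is at 3 pm please bring your laptop and the report",
    "The meeting is at 3 pm, please bring your laptop and the report.",
)
chinese = added_punctuation(
    "今天天气怎么样我想出去走走你要一起吗",
    "今天天气怎么样,我想出去走走,你要一起吗?",
)
print(dict(english))
print(dict(chinese))
```

Matching on the Unicode category rather than a hard-coded list means the same function covers both the English marks and the Chinese ones such as 。.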

The most common use: punctuate ASR output

The biggest use of punctuation restoration is cleaning up speech-to-text output. FunASR's ASR models let you attach ct-punc in one line so the transcript comes out punctuated:

from funasr import AutoModel

# ASR + VAD + punctuation in one call (Chinese Paraformer here)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
result = model.generate(input="audio.wav")
print(result[0]["text"])   # output already has punctuation

You can also use ct-punc standalone on any text — not just speech. Old subtitles, OCR output, chat logs: anything unpunctuated can be punctuated.
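For line-oriented text such as old subtitles or chat logs, a generator that punctuates one line at a time is one way to apply the standalone model. This is a minimal sketch under assumptions: `punctuate_lines` is a hypothetical helper, `subtitles.txt` is a placeholder file name, and passing blank lines through unchanged is a choice made here so paragraph breaks survive.

```python
from typing import Any, Iterable, Iterator


def punctuate_lines(lines: Iterable[str], model: Any) -> Iterator[str]:
    """Yield each input line with punctuation restored, preserving order.

    `model` is any object with the generate(input=...) call shown earlier,
    returning a list of dicts with a "text" key.
    """
    for line in lines:
        text = line.strip()
        if not text:
            # Keep blank lines as-is instead of sending them to the model.
            yield ""
            continue
        yield model.generate(input=text)[0]["text"]


# Usage with the real model (requires `pip install funasr`);
# "subtitles.txt" is a placeholder path:
#   from funasr import AutoModel
#   punc = AutoModel(model="ct-punc", disable_update=True)
#   with open("subtitles.txt", encoding="utf-8") as f:
#       for out in punctuate_lines(f, punc):
#           print(out)
```

Because the function is a generator, a large file is processed line by line without holding the whole transcript in memory.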

Why ct-punc

| | ct-punc (FunASR) |
| --- | --- |
| Languages | Chinese + English (one bilingual model) |
| Usage | 3 lines of Python on raw text; or attach to ASR in one line |
| Punctuation | ,。? etc.; also capitalizes the first letter in English |
| License | open-source, commercial-friendly |

For full speech pipelines, see the guides on Chinese speech recognition, VAD / silence removal, and speaker diarization. To pick an ASR model, see the model selection guide (default recommendation: the flagship Fun-ASR-Nano).

The whole FunASR stack is open-source (MIT): punctuation restoration, ASR, VAD, speaker, emotion, and LLM-ASR (flagship Fun-ASR-Nano), all ready to use. If it helps, a GitHub star supports the project.
