200,475 Sentences - Chinese Text Normalization Dataset for TTS & NLP

Chinese text normalization dataset
Mandarin TTS corpus
Text normalization for speech synthesis
Symbol-to-character annotation dataset
Mandarin text preprocessing data

This dataset comprises 200,475 Mandarin Chinese sentences annotated for text normalization, transforming special symbols and Arabic numerals into Chinese characters. It is ideal for training and evaluating Text-to-Speech (TTS) systems and Natural Language Processing (NLP) models.

Paid Datasets
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content
200,475 sentences of text were transcribed in Chinese characters;
Data scale
200,475 original texts with 457,832 annotations;
Content source
Sentences extracted from various types of news, articles, novels, etc.
Language
Chinese;
Annotation
Special symbols and Arabic numerals in the sentences are annotated as Chinese characters;
Applications
TTS, Text normalization;
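
To illustrate the kind of transformation the annotation describes, here is a minimal rule-based sketch that rewrites Arabic numerals and the percent sign as Chinese characters. It is not the dataset's annotation tool or file format; the function names (`cardinal`, `read_number`, `normalize`) and the rules themselves are assumptions made for this example, and it only covers non-negative integers and `%`.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # place values for ones, tens, hundreds, thousands


def cardinal(n: int) -> str:
    """Read an integer in the range 0..9999 as a Mandarin cardinal number."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return "零"
    s = str(n)
    result = ""
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i
        if d == 0:
            # Remember the gap; a single 零 is emitted only if a nonzero digit follows.
            pending_zero = True
        else:
            if pending_zero:
                result += "零"
                pending_zero = False
            result += DIGITS[d] + UNITS[pos]
    if 10 <= n <= 19:
        result = result[1:]  # 一十五 is conventionally read as 十五
    return result


def read_number(token: str) -> str:
    """Short numbers are read as cardinals; long or zero-padded ones digit by digit."""
    if len(token) <= 4 and not (len(token) > 1 and token.startswith("0")):
        return cardinal(int(token))
    return "".join(DIGITS[int(c)] for c in token)


def normalize(sentence: str) -> str:
    """Replace percentages first, then any remaining digit runs."""
    sentence = re.sub(
        r"([0-9]+)%", lambda m: "百分之" + read_number(m.group(1)), sentence
    )
    return re.sub(r"[0-9]+", lambda m: read_number(m.group(0)), sentence)


print(normalize("增长了15%"))   # 增长了百分之十五
print(normalize("共有2023人"))  # 共有二千零二十三人
```

Fixed rules like these ignore context: a year such as 2023年 is normally read digit by digit (二零二三年), while the same digits before 人 are read as a cardinal. Resolving such cases is the reason TTS front ends are typically trained and evaluated on sentence-level annotations.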