Speech Synthesis Datasets

Instantly enhance AI model performance with high-quality, off-the-shelf datasets.

Voice Type

  • All (18)
  • Average Tone (9)
  • Emotion (1)
  • Female (5)
  • Front-end Text (3)
  • Male (1)
  • Others (2)

Language

  • All (18)
  • Chinese Dialects (3)
  • Chinese-English Code-mixing (1)
  • English (7)
  • Japanese (2)
  • Mandarin (9)
  • Others (3)

200,475 Sentences - Chinese Text Normalization Data

The special symbols and Arabic numerals in the sentences are annotated as Chinese characters.
TN TTS Text Normalization

2 People - Australian English Average Tone Speech Synthesis Corpus

Recorded by native Australian speakers with authentic accents. The phoneme coverage is balanced, and professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
English TTS Australian English Average Tone

2 People - Chinese Natural Conversation Speech Synthesis Corpus

Recorded by native Chinese speakers in a natural conversational style. Phonemes and tones are balanced. Professional phoneticians take part in the annotation, which also covers secondary language. Secondary language annotation: Inhalation: V; Pause: P; Hesitation: T; Mouth clicking: M; Drawl: D; Cough: C; Laughter: L; Stutter repetition: R; Inversion: I; Modal particle: S (modal particles include "ah", "oh", "wow", "right?", "what?", "well", etc.). The corpus is designed to meet the research and development needs of speech synthesis.
Natural conversation Secondary language TTS

12 Hours - Chinese Mandarin Synthesis Corpus-Female, Entertainment anchor Style, Multi-emotional

Recorded by a native Chinese speaker in an entertainment anchor style. The text covers six emotions plus modal particles, and phonemes and tones are balanced. Professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
Synthesis Corpus TTS Mandarin Multi-emotional Entertainment anchor

10.4 Hours - Japanese Synthesis Corpus-Female

Recorded by a native Japanese speaker with an authentic accent. The phoneme coverage is balanced, and professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
Japanese TTS Female

19.46 Hours - American English Speech Synthesis Corpus-Female

Female American English audio data, recorded by a native American English speaker with an authentic accent and a sweet voice. The phoneme coverage is balanced, and professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
American English TTS

2 People - Japanese Average Tone Speech Synthesis Corpus

Recorded by native Japanese speakers with authentic accents. The general-purpose corpus covers news and colloquial styles, and the phoneme coverage is balanced. Professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
Japanese TTS Average Tone

2 People - Korean Average Tone Speech Synthesis Corpus

Recorded by native Korean speakers with authentic accents. The general-purpose corpus covers news and colloquial styles, and the phoneme coverage is balanced. Professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
Korean TTS Average Tone

2 People - New Zealand English Average Tone Speech Synthesis Corpus

Recorded by native New Zealanders with authentic accents. The phoneme coverage is balanced, and professional phoneticians take part in the annotation. The corpus is designed to meet the research and development needs of speech synthesis.
English TTS New Zealand English Average Tone

Why off-the-shelf Datasets

  • Copyright

    Clear Copyright and Ready to Check
  • Security

    Properly Authorized, Secure to Use
  • Professional

    Designed and produced by AI data experts
  • Diversity

    Collected from a variety of real scenes
  • Cost Effective

    More Cost-Efficient Than Tailored Data
  • Efficiency

    Ready to Go, Delivered in Seconds