1,420 Hours - Mandarin Chinese (China) Spontaneous Monologue Smartphone Speech Dataset

This Mandarin Chinese (China) spontaneous monologue smartphone speech dataset was collected from dialogues without given topics, so the speech is close to casual conversation and covers the generic domain. Recordings are transcribed with text content, noise, and other attributes. The data comes from an extensive, geographically diverse pool of speakers (700 Chinese speakers in total), which helps improve model performance on real and complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all of our datasets are GDPR, CCPA, and PIPL compliant.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
16 kHz, 16-bit, uncompressed WAV, mono channel;
Recording condition
Low background noise;
Content category
Generic domain (without given topics);
Recording device
Android Smartphone;
Speaker
700 people, 35% male and 65% female;
Country
China (CHN);
Language(Region) Code
zh-CN;
Language
Mandarin Chinese;
Features of annotation
Transcription text; 4 noise symbols; annotation mainly covers near-end speech;
Accuracy Rate
Sentence Accuracy Rate (SAR) 95%
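As a rough illustration of the Format specification, the sketch below uses Python's standard `wave` module to check that a file is 16 kHz, 16-bit, mono, uncompressed PCM, and estimates the raw audio size implied by 1,420 hours at that format. The helper names `matches_spec` and `pcm_bytes` are hypothetical, and the size estimate excludes WAV headers and transcription files, so it is a back-of-the-envelope figure rather than the actual delivery size.

```python
import wave

# Target format taken from the specification: 16 kHz, 16-bit, mono, uncompressed WAV.
EXPECTED_RATE = 16_000       # samples per second
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1        # mono


def matches_spec(path) -> bool:
    """Return True if the WAV file (path or binary file object) is 16 kHz, 16-bit, mono PCM."""
    # wave.open raises wave.Error for formats it cannot read as PCM.
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == EXPECTED_RATE
            and wf.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wf.getnchannels() == EXPECTED_CHANNELS
            and wf.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )


def pcm_bytes(hours: float) -> int:
    """Raw PCM payload size in bytes for `hours` of audio, excluding WAV headers."""
    seconds = hours * 3600
    return int(seconds * EXPECTED_RATE * EXPECTED_SAMPLE_WIDTH * EXPECTED_CHANNELS)


if __name__ == "__main__":
    total = pcm_bytes(1420)
    print(f"{total:,} bytes ~ {total / 1e9:.1f} GB")
```

For 1,420 hours this works out to 1,420 × 3,600 s × 16,000 samples/s × 2 bytes = 163,584,000,000 bytes, or about 163.6 GB of raw audio.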
Sample
  • Audio

    你觉得我说话语速快吗 (Do you think I talk fast?)

  • Audio

    看看到时间了然后我给你发过去 (Let me see; when it's time, I'll send it over to you.)

  • Audio

    你这几天你[P]你都几点睡觉呀 (These past few days, you [P] what time have you been going to bed?)

  • Audio

    嗯可以了是吧 (Mm, that's fine now, right?)

  • Audio

    然后那个得准备好检查的东西 (And then, um, the things for the check need to be ready.)
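The page does not define how the 95% Sentence Accuracy Rate is computed, nor does it list the four noise symbols; only `[P]` appears in the samples. A minimal sketch, assuming SAR is the share of sentences whose transcription exactly matches a reference after bracketed annotation symbols are removed, might look like this. The regex, the function names `strip_symbols` and `sentence_accuracy_rate`, and the exact-match definition are all assumptions for illustration.

```python
import re
from typing import Sequence

# Hypothetical pattern: treats any bracketed token such as "[P]" as an
# annotation symbol. The dataset's actual four noise symbols are not listed here.
SYMBOL_RE = re.compile(r"\[[^\]]*\]")


def strip_symbols(text: str) -> str:
    """Remove bracketed annotation symbols and trim leading/trailing whitespace."""
    return SYMBOL_RE.sub("", text).strip()


def sentence_accuracy_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Fraction of sentences whose hypothesis exactly matches its reference
    after symbol stripping (an assumed definition of SAR)."""
    if len(references) != len(hypotheses):
        raise ValueError("reference and hypothesis lists must be the same length")
    if not references:
        raise ValueError("need at least one sentence")
    correct = sum(
        strip_symbols(ref) == strip_symbols(hyp)
        for ref, hyp in zip(references, hypotheses)
    )
    return correct / len(references)
```

For example, `strip_symbols("你这几天你[P]你都几点睡觉呀")` returns `"你这几天你你都几点睡觉呀"`. Exact sentence matching is a strict criterion; a vendor's SAR may instead allow for normalization rules not described on this page.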
