Speech Recognition Datasets

Enhance AI model performance with high-quality, off-the-shelf datasets.

Language

All
33
Arabic
2
Burmese
2
Chinese Dialects
23
English
47
French
7
German
7
Hindi
6
Indonesian
8
Italian
8
Japanese
6
Korean
11
Malay
5
Mandarin
32
Others
28
Portuguese
4
Russian
5
Spanish
12
Thai
5
Vietnamese
4

Data Type

All
33
Dialogue
91
Read
129

300 People - Mandarin Chinese and English Bilingual Spontaneous Monologue Smartphone speech dataset

Mandarin Chinese and English Bilingual Spontaneous Monologue Smartphone speech dataset, collected from spontaneous speech on given topics and covering the generic domain. The dataset was collected from a large, geographically diverse pool of speakers (300 people in total, aged 18 to 65) to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
Unscripted monologue Natural Speech Mandarin English Bilingual

548 Hours - Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset

Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset, covering self-media, conversation, live streaming, and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
Chinese spoken video voice data Chinese voice data Chinese spoken video data Chinese multimodal data

2,657 Hours - Mandarin(China) Spontaneous Dialogue Smartphone speech dataset

Mandarin(China) Spontaneous Dialogue Smartphone speech dataset, transcribed with text content, timestamps, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
mandarin dialogue speech data chinese conversational speech data chinese conversational speech dataset chinese conversational audio data

592 People - Mandarin Chinese and Dialects(China) Number Scripted Monologue Smartphone speech dataset

Mandarin Chinese and Dialects(China) Number Scripted Monologue Smartphone speech dataset, collected from monologues based on given sentences that contain numbers, such as dates, times, phone numbers, plain numbers, currency amounts, and numeric computations. Gender distribution is balanced, and the languages covered are Mandarin Chinese, Sichuan dialect, and Cantonese. Transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (592 people) to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
digital dialect Mandarin audio data captured by mobile phone digital dialect collection digital audio data

3,881 Hours - Mandarin(China) Real-world Casual Conversation and Monologue speech dataset

Mandarin(China) Real-world Casual Conversation and Monologue speech dataset, covering interviews, sports, variety shows, courses, and more, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
Colloquial video data interview video data sport video data comprehensive video data educational video data entertainment video data service video data text annotation noise annotation Mandarin conversational video data

41.8 Hours - Mandarin Chinese(China) Preschoolers Scripted Monologue Smartphone and Microphone speech dataset

Mandarin Chinese(China) Preschoolers Scripted Monologue Smartphone and Microphone speech dataset, collected from monologues based on given scripts, covering the generic domain, children's songs, storybooks, human-machine interaction, numbers, and the alphabet. Transcribed with text content, noise, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (797 preschoolers aged 3 to 5, recorded with hi-fi microphones and smartphones) to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
Children's Voice Children's Voice Data Chinese Voice Data baby voice infant voice

491 People - Mandarin(China) Commands speech dataset

Mandarin(China) Commands speech dataset, in which each speaker records the same corpus of 17 commonly used command words. Male and female speakers are balanced, and multiple age groups are covered. The data was recorded with Bluetooth headsets covering the mainstream models on the market. It can be used for voice assistants, command control, and other application scenarios. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
command words data speech assistant data Car voice data commands speech dataset

2,028 Hours - Mandarin(China) Scripted Monologue Smartphone speech dataset

Mandarin(China) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse pool of speakers to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
mandarin speech data Scripted Monologue speech data chinese speech data

303 Hours - Mandarin Chinese and English(China) Mix Scripted Monologue Smartphone speech dataset

Mandarin Chinese and English(China) Mix Scripted Monologue Smartphone speech dataset, collected from monologues based on given mixed Chinese and English prompts, covering the general and human-computer interaction domains. Transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (1,113 speakers) to support model performance in real, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.
Chinese and English mixed reading voice mixed reading voice data mobile phone collection of voice data

Tailor Your Data Now

Why off-the-shelf Datasets

  • Copyright

    Clear Copyright and Ready to Check
  • Security

    Properly Authorized, Secure to Use
  • Professional

    Designed and Produced by AI Data Experts
  • Diversity

    Collected from a Variety of Real Scenes
  • Cost Effective

    More Cost-Efficient Than Tailored Data
  • Efficiency

    Ready to Go, Delivered in Seconds