From: Nexdata  Date: 2024-08-15
In the research and application of artificial intelligence, acquiring reliable and rich data has become a crucial part of developing highly efficient algorithms. To improve the accuracy and robustness of AI models, enterprises and researchers need diverse datasets to train systems to cope with the complicated scenarios found in real applications. This makes the process of collecting and optimizing data critical, as it directly affects the final performance of AI.
At present, speech recognition research has found that systems built on adult speech databases handle children's speech poorly and produce frequent recognition errors.
Because children's voices and articulation give their speech acoustic and linguistic characteristics that differ from adults', children's speech recognition presents inherent technical difficulties. In addition, children are not good at interacting with machines in ways the machines can understand. Whether through a friendlier interface or a smarter voice assistant, the recognition results remain unsatisfactory.
To solve this problem, Nexdata has developed over 10,000 hours of children's speech data. It is recorded by children, and the recording content conforms to the characteristics of children's speech.
50 Hours — American Children Speech Data by Microphone
It is recorded by 219 native American English-speaking children. The recording texts are mainly storybooks, children's songs, spoken expressions, etc., with 350 sentences per speaker. Each sentence contains 4.5 words on average and is repeated 2.1 times on average. The recording device is a hi-fi Blue Yeti microphone. The texts are manually transcribed. (A short sketch after the dataset list shows how statistics like these can be computed from a delivery manifest.)
55 Hours — British Children Speech Data by Microphone
It is recorded by 201 British children. The recordings are mainly children's textbooks and storybooks. The average sentence length is 4.68 words and each sentence is repeated 6.6 times on average. The data is recorded with a high-fidelity microphone. The texts are manually transcribed with high accuracy.
464 Hours — Chinese Children Speaking English Speech Data by Mobile Phone
Audio data of children reading English, covering ages from preschool (3–5 years old) to school age (6–12 years old), with children's speech features; the content closely matches the real scenarios in which children speak English. It provides data support for children's smart home applications, automatic speech recognition, and oral assessment in intelligent education scenarios.
3,255 Hours — Chinese Children Speech Data by Mobile Phone
Mobile-phone-captured audio data of Chinese children, with a total duration of 3,255 hours. The 9,780 speakers are children aged 6 to 12, with accents covering seven dialect areas. The recorded text contains common children's language such as essays and stories, numbers, and interactions in cars, at home, and with voice assistants, precisely matching actual application scenarios. All sentences are manually transcribed with high accuracy.
500 Hours — Korean Children Speech Data by Mobile Phone
The dataset is recorded by local Korean children; about 1,500 speakers participated in the recording, with authentic accents. These 500 hours of speech data, collected from Korean children over mobile phones, can be used for speech recognition, language model training, or algorithm research.
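The per-speaker and per-sentence statistics quoted above (total hours, average words per sentence, average repetitions) are straightforward to verify once a corpus is delivered. Below is a minimal Python sketch that computes them from a hypothetical CSV manifest with audio_path, speaker_id, and transcript columns; the manifest layout and file name are illustrative assumptions for this example, not Nexdata's actual delivery format, and only the Python standard library is used.

```python
import csv
import wave
from collections import Counter


def corpus_stats(manifest_path: str) -> dict:
    """Summarize a speech corpus from a CSV manifest.

    Assumed (hypothetical) manifest columns: audio_path, speaker_id, transcript.
    """
    total_seconds = 0.0
    word_counts = []              # words per recorded utterance
    sentence_counter = Counter()  # how many times each transcript repeats
    speakers = set()

    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            speakers.add(row["speaker_id"])
            text = row["transcript"].strip()
            sentence_counter[text] += 1
            word_counts.append(len(text.split()))

            # Duration of an uncompressed WAV file via the standard library.
            with wave.open(row["audio_path"], "rb") as w:
                total_seconds += w.getnframes() / w.getframerate()

    return {
        "hours": total_seconds / 3600,
        "speakers": len(speakers),
        "avg_words_per_sentence": sum(word_counts) / max(len(word_counts), 1),
        "avg_repetitions_per_sentence":
            sum(sentence_counter.values()) / max(len(sentence_counter), 1),
    }


if __name__ == "__main__":
    # "manifest.csv" is a placeholder path for this sketch.
    print(corpus_stats("manifest.csv"))
```

For languages written without spaces, such as Chinese, the word-count line would need a tokenizer instead of str.split; the rest of the sketch is language-independent.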
If you want to know more details about the datasets or how to acquire them, please feel free to contact us: info@nexdata.ai.
Data-driven AI transformation is deeply affecting the way we live and work. The dynamic nature of data is key to keeping artificial intelligence models performing at a high level. By constantly collecting new data and expanding existing datasets, we can help models better cope with new problems. If you have data requirements, please contact Nexdata.ai at info@nexdata.ai.