From: Nexdata Date: 2024-08-15
At its core, automatic speech recognition (ASR) is a pattern recognition system built from three basic units: feature extraction, pattern matching, and a reference pattern library. Feature extraction converts the speech signal into attributes that can be labeled and classified. The input speech is first preprocessed, and its features are then extracted. From these features, the templates required for speech recognition are built. During recognition, the speech templates stored in the computer are compared with the features of the input speech signal to find the template that best matches the input.
Given the definition of this template, the computer obtains its best recognition result by table lookup. The quality of this result depends directly on the choice of features, the quality of the speech model, and the accuracy of the templates, and achieving it requires continuous training on a large volume of audio data.
The success of speech recognition technology therefore depends largely on large-scale, high-quality audio datasets. Nexdata has accumulated multi-channel, multi-environment, and multi-type audio datasets covering more than 60 languages.
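As a rough illustration of the flow described above (feature extraction, stored templates, best-match lookup), here is a minimal Python sketch. The frame-energy feature, the dynamic time warping (DTW) distance, and all function and label names are assumptions chosen for illustration and do not come from Nexdata. Real systems use richer features and statistical or neural models rather than a toy matcher like this.

```python
import math

def extract_features(signal, frame_len=4):
    """Toy feature extraction: log energy of each non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append(math.log(energy + 1e-9))
    return feats

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def recognize(signal, templates):
    """Return the label of the stored template closest to the input signal."""
    feats = extract_features(signal)
    return min(templates, key=lambda label: dtw_distance(feats, templates[label]))

# Hypothetical reference templates built from two synthetic "utterances".
templates = {
    "loud_then_quiet": extract_features([1.0] * 8 + [0.1] * 8),
    "quiet_then_loud": extract_features([0.1] * 8 + [1.0] * 8),
}

# A time-stretched, slightly different variant of the first pattern.
query = [0.9] * 12 + [0.12] * 4
print(recognize(query, templates))
```

DTW is used here because two recordings of the same phrase rarely have the same length. It lets the matcher stretch or compress the time axis before comparing features. The accuracy of such a matcher still depends on how well the stored templates cover the variation in real speech, which is where large and varied datasets come in.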
344 People - American English Audio Dataset by Mobile Phone_Guiding
The dataset contains recordings from 344 American English speakers, all of whom are American locals, with 50 sentences per speaker. The valid data totals 9.7 hours and was recorded in a quiet environment. The content covers in-car scenarios, smart home, and speech assistant use.
199 Hours-British English Audio Dataset by Mobile Phone_Reading
The dataset contains recordings from 346 British English speakers, all of whom are English locals, with around 392 sentences per speaker. The valid audio totals 199 hours. The recording environment is quiet. The content spans categories such as economics, news, entertainment, commonly used spoken language, letters, and figures.
351 People - German Audio Dataset by Mobile Phone_Guiding
This dataset was recorded by 351 native German speakers with authentic accents. The recorded text was designed by professional language experts and is rich in content, covering multiple categories such as general-purpose, interactive, in-vehicle, and household commands. The recording environment is quiet and free of echo.
401 People - French Audio Dataset by Mobile Phone_Guiding
401 speakers participated in this recording, with 50 sentences per speaker, for a total of 10.9 hours. The recording texts cover in-car scenarios, smart home, and smart speech assistants. The texts were manually transcribed for accuracy.
397 People - Hindi Audio Dataset by Mobile Phone_Guiding
This dataset was recorded by 397 Indian speakers with authentic accents, with 50 sentences per speaker, for a total of 8.6 hours. The recording content covers in-car scenarios, smart home, and intelligent voice assistants.
1,002 Hours - Russian Audio Dataset by Mobile Phone
1,960 native Russian speakers with authentic accents participated in the recording. The recorded script was designed by linguists and covers a wide range of topics, including generic, interactive, in-vehicle, and home scenarios.
If you would like more details about the audio datasets or how to acquire them, please feel free to contact us: [email protected].