From: Nexdata Date: 2024-08-15
Speech recognition technology enables computers to understand human speech, supporting a variety of voice interaction scenarios such as mobile phone applications, human-vehicle interaction, robot dialogue, and voice transcription.
In these scenarios, however, the input to a speech recognizer is not always in a single language; speakers sometimes mix several languages. For example, Chinese speakers often drop English terms into their sentences, which poses new challenges for speech recognition technology.
Code-switch Speech Recognition Challenges
1. Similar Pronunciations
Mixed Chinese-English speech recognition requires a single model to learn the sounds of both languages. Pronunciations that sound alike across the two languages but carry different meanings increase model complexity and computation. To tell these similar pronunciations apart, the model's modeling units are usually defined separately for each language.
2. Scarce Training Data
Far less mixed Chinese-English data is available than monolingual data. Open-source monolingual corpora such as WenetSpeech (Chinese) and GigaSpeech (English) have reached the 10,000-hour scale, but only two open-source mixed Chinese-English speech recognition corpora exist at present: SEAME and TAL_CSASR.
Nexdata Code-switch Speech Recognition Data Solutions
1,535 Hours - Mixed Speech with Chinese and English Data by Mobile Phone
The 1,535 Hours - Mixed Speech with Chinese and English Data by Mobile Phone dataset was recorded by 3,972 native Chinese speakers whose accents cover seven major dialect areas. The recorded texts mix Chinese and English within sentences and cover both general and human-computer interaction scenarios. The content is diverse and the transcriptions are accurate. The dataset can be used to improve speech recognition performance on read speech that mixes Chinese and English.
303 Hours - Mixed Speech with Chinese and English Data by Mobile Phone
The data was recorded by 1,113 native Chinese speakers whose accents cover seven major dialect areas. The recorded texts mix Chinese and English within sentences and cover both general and human-computer interaction scenarios. The content is diverse and the transcriptions are accurate. The dataset can be used to improve speech recognition performance on read speech that mixes Chinese and English.
300 Hours - Mixed Speech with Korean and English Data by Mobile Phone
The data was recorded by native Korean speakers. The recorded texts mix Korean and English within sentences and cover both general and human-computer interaction scenarios. The content is diverse and the transcriptions are accurate. The dataset can be used to improve speech recognition performance on read speech that mixes Korean and English.