From: Nexdata Date: 2024-08-14
In machine learning and deep learning, datasets play an irreplaceable role. Whether it is image data for convolutional neural networks or massive text data for natural language processing, the integrity and diversity of the data largely determine what a model learns. As the technology advances, collecting datasets from specific scenarios has become a core strategy for improving model performance.
A voice assistant is an intelligent application that helps users solve problems through smart conversations and instant question-and-answer interactions. Common voice assistants in daily life include "Siri" and "Xiao Du." These voice assistants come equipped with corresponding pronunciation dictionaries, which contain all the words they can recognize.
A pronunciation dictionary stores the words a system knows together with the pronunciation of each word. It establishes a mapping between the acoustic modeling units and the language modeling units, connecting the acoustic model to the language model. Together they form a search state space that the decoder uses for decoding.
A sentence is formed by combining several words, and the pronunciation dictionary gives the phoneme sequence for each word. The transition probabilities between adjacent words come from the language model, while the probabilities of the phonemes come mainly from the acoustic model. Combining the two yields a probability model for the whole sentence.
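The word-to-phoneme lookup and the word-transition probabilities described above can be sketched as follows. This is a minimal illustration, not how any particular recognizer is implemented: the lexicon entries, the ARPAbet-style phoneme symbols, the bigram probabilities, and the function names are all hypothetical, and the acoustic model is left out entirely.

```python
import math

# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-style symbols).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical bigram language model: P(word | previous word).
# "<s>" marks the start of the sentence.
BIGRAM = {
    ("<s>", "hello"): 0.5,
    ("hello", "world"): 0.25,
}


def words_to_phonemes(words, lexicon):
    """Expand a word sequence into one phoneme sequence via the lexicon."""
    phonemes = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"word not in pronunciation dictionary: {word}")
        phonemes.extend(lexicon[word])
    return phonemes


def sentence_log_prob(words, bigram):
    """Sum the log transition probabilities between adjacent words."""
    log_prob = 0.0
    previous = "<s>"
    for word in words:
        log_prob += math.log(bigram[(previous, word)])
        previous = word
    return log_prob


sentence = ["hello", "world"]
phoneme_sequence = words_to_phonemes(sentence, LEXICON)
language_score = sentence_log_prob(sentence, BIGRAM)
```

In a full system, the decoder would add acoustic-model scores for the phoneme sequence to the language-model score and search for the word sequence with the best combined score.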
In a speech recognition system, the more vocabulary the pronunciation dictionary covers, the higher the recognition accuracy tends to be, because words missing from the dictionary cannot be recognized correctly. When new vocabulary appears, the words and their phonetic transcriptions can be added to the dictionary, continuously expanding its vocabulary. The three main factors for measuring the quality of a pronunciation dictionary are vocabulary size, phonetic labeling, and the accuracy of proofreading.
Because collecting, labeling, and cleaning a pronunciation dictionary requires professional control, the performance of a speech recognition system can suffer when it lacks large, accurate pronunciation dictionaries that cover a wide range of vocabulary.
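One way to make the coverage idea concrete is to measure the out-of-vocabulary (OOV) rate of a text against the dictionary before and after adding new entries. The sketch below is illustrative only: the function names, the sample words, and the phoneme transcription for "nexdata" are hypothetical and do not come from any Nexdata dataset.

```python
def oov_rate(words, lexicon):
    """Fraction of word tokens that have no entry in the lexicon."""
    if not words:
        return 0.0
    missing = sum(1 for word in words if word not in lexicon)
    return missing / len(words)


def add_pronunciation(lexicon, word, phonemes):
    """Add a pronunciation for a word, skipping exact duplicates.

    A word may have several pronunciations, so each entry is a list.
    """
    pronunciation = tuple(phonemes)
    variants = lexicon.setdefault(word, [])
    if pronunciation not in variants:
        variants.append(pronunciation)


# Hypothetical starting dictionary and input text.
lexicon = {"hello": [("HH", "AH", "L", "OW")]}
text = ["hello", "nexdata", "hello", "world"]

rate_before = oov_rate(text, lexicon)

# Hypothetical transcription, shown only to illustrate adding a new word.
add_pronunciation(lexicon, "nexdata", ["N", "EH", "K", "S", "D", "EY", "T", "AH"])

rate_after = oov_rate(text, lexicon)
```

Storing a list of pronunciations per word is a design choice that lets the same dictionary hold variant pronunciations, which proofreading can then confirm or remove.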
Nexdata Pronunciation Dictionary Corpus
80,279 Cantonese Pronunciation Dictionary
This pronunciation dictionary collects words with dialect characteristics from the Cantonese-speaking regions of Guangdong. Each entry consists of three parts: the word, its pinyin, and its tones. The dictionary can be used as a pronunciation reference for recording personnel, in the research and development of speech recognition technology, and so on.
101,702 Japanese Pronunciation Dictionary
The data contains 101,702 entries. All words and pronunciations are produced by Japanese linguists. It can be used in the research and development of Japanese ASR technology.
500,113 English Pronunciation Dictionary
The data contains 500,113 entries. All words and pronunciations are produced by English linguists. It can be used in the research and development of English ASR technology.
444,202 Korean Pronunciation Dictionary
The data contains 444,202 entries. All words and pronunciations are produced by Korean linguists. It can be used in the research and development of Korean ASR technology.
High-quality datasets are the foundation for the success of artificial intelligence. All industries therefore need to keep investing in data infrastructure to ensure the accuracy and diversity of data collection. From smart cities to precision medicine, from educational equality to environmental protection, the future potential of AI will be closely tied to data systems that provide momentum for society and the economy.