Thai Speech Data

From: Nexdata    Date: 08/14/2024

➤ Thai speech recognition research

The era of data-driven artificial intelligence has arrived, and the quality of training data directly determines how effective and capable a model can be. In this wave of technological change, datasets for specialized vertical domains keep emerging to meet the needs of machine learning in different scenarios. Whether for computer vision, natural language processing, or behavioral analysis, these datasets carry substantial commercial value and technical potential.

Among the many languages that AI systems are being built to handle, Thai holds a significant place. Thai speech recognition has become a focal point of research and development, driven by growing demand for localized and personalized user experiences.

➤ Thai speech recognition: progress & challenges

Over the past few years, Thai speech recognition technology has advanced remarkably, largely because extensive linguistic data has become available. The foundation of any speech recognition system is its dataset, and Thai is no exception. Abundant voice data from sources such as social media, podcasts, and recorded conversations has played a pivotal role in training machine learning models. As a result, Thai speech recognition systems have become considerably more accurate and fluent.

However, this progress is not without challenges. The linguistic complexity of Thai makes accurate recognition models hard to build: the language is tonal and uses its own script, so models must capture its phonetics and syntax in detail. Acquiring and precisely annotating Thai speech data remains an ongoing challenge, and covering regional accents and dialects further complicates data collection.

➤ Thai speech data collection

Nexdata Thai Speech Datasets

203 Hours – Thai Speech Data by Mobile Phone_Reading

This Thai speech dataset (reading) was collected from 498 native Thai speakers and recorded in a quiet environment. The content is rich, covering multiple categories such as economics, entertainment, news, figure, and oral. Each speaker read around 400 sentences. The valid data volume is 203 hours. All texts were manually transcribed with high accuracy.
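
As a rough illustration of what these figures imply, the sketch below derives per-speaker and per-sentence averages for the reading set. It assumes every speaker read exactly 400 sentences, whereas the description says "around 400", so the results are approximations only; the function name reading_set_averages is hypothetical.

```python
# Approximate averages for the 203-hour Thai reading set.
# Assumption: every speaker read exactly 400 sentences (the source says "around 400").
SPEAKERS = 498
SENTENCES_PER_SPEAKER = 400  # approximate figure from the dataset description
VALID_HOURS = 203


def reading_set_averages(speakers, sentences_per_speaker, hours):
    """Hypothetical helper: spread the total valid duration evenly over speakers and sentences."""
    total_seconds = hours * 3600
    total_sentences = speakers * sentences_per_speaker
    return {
        "minutes_per_speaker": total_seconds / speakers / 60,
        "total_sentences": total_sentences,
        "seconds_per_sentence": total_seconds / total_sentences,
    }


stats = reading_set_averages(SPEAKERS, SENTENCES_PER_SPEAKER, VALID_HOURS)
print(f"minutes per speaker: {stats['minutes_per_speaker']:.2f}")
print(f"total sentences: {stats['total_sentences']}")
print(f"seconds per sentence: {stats['seconds_per_sentence']:.2f}")
```

Under that assumption, each speaker contributes about 24.46 minutes of valid speech and the average sentence lasts about 3.67 seconds; the true values depend on the actual sentence counts per speaker.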

1,077 Hours - Thai Conversational Speech Data by Telephone

The 1,077 Hours - Thai Conversational Speech Data involved 1,986 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and held conversations on them, which keeps the dialogues fluent and natural. The recording devices were various mobile phones, all speech was recorded in quiet indoor environments, and the audio format is 8kHz, 8bit. All speech audio was manually transcribed with the text content, the start and end time of each valid sentence, and speaker identification.
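
The 8kHz, 8bit format fixes the raw data rate: 8,000 samples per second at 8 bits each is 64 kbit/s, or 8,000 bytes per second. The sketch below turns that into a back-of-the-envelope storage estimate for the full 1,077 hours. It assumes mono audio stored without compression or file headers (the channel count is not stated in the description), and raw_audio_bytes is a hypothetical helper, not part of any Nexdata tooling.

```python
# Back-of-the-envelope storage estimate for the 1,077-hour telephone set.
# Assumptions: mono audio, 8-bit samples at 8 kHz, no compression, no file headers.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 8
CHANNELS = 1  # assumption: the dataset description does not state the channel count
HOURS = 1_077


def raw_audio_bytes(hours, sample_rate, bits_per_sample, channels=1):
    """Hypothetical helper: raw PCM size in bytes for the given duration and format."""
    bytes_per_second = sample_rate * bits_per_sample * channels // 8
    return hours * 3600 * bytes_per_second


bitrate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000
total_bytes = raw_audio_bytes(HOURS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"bitrate: {bitrate_kbps:.0f} kbit/s")
print(f"total: {total_bytes / 1e9:.1f} GB")
```

Under these assumptions the script prints a bitrate of 64 kbit/s and a total of about 31.0 GB; a compressed or multi-channel delivery format would change the figure.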

Progress in AI owes much to data. By improving the quality and diversity of datasets, we can better unleash the potential of artificial intelligence and promote its application across all walks of life. Only by continuously improving data systems can AI technology keep pace with the market's fast-changing data requirements. If you have data requirements, please contact Nexdata.ai at [email protected].