Unleashing the Potential of Lip Language through Machine Learning

From: Nexdata Date: 2024-08-14

➤ Indonesian speech recognition

In building intelligent systems, the quality of the training datasets is more important than the algorithm itself. To cope with the challenges of complex scenarios, researchers need to collect and annotate different types of data to improve the capabilities of AI systems. Today, every industry is exploring how to use data-driven technology to build smarter business processes and decision-making systems.

Indonesian is one of the most widely spoken languages globally, with over 270 million speakers spread across the archipelago. As technology becomes increasingly integrated into everyday life, it is crucial to enable Indonesian speakers to communicate with and command devices using their native language. However, developing a robust speech recognition system for Indonesian presents unique challenges due to its phonological complexity and rich morphological structure.

Training data is the backbone of any machine learning model, and speech recognition systems are no exception. High-quality training data plays a pivotal role in the accuracy and performance of these systems. In the case of Indonesian speech recognition, having a diverse and extensive dataset of spoken language is essential. This dataset should encompass a wide range of accents, dialects, and speaking styles to ensure the model's ability to adapt to variations in natural speech.

➤ Challenges in Indonesian speech data

Obtaining sufficient and accurate training data for Indonesian speech recognition is not without challenges. Firstly, the vast linguistic diversity across Indonesia means that the dataset must capture the nuances of various regional accents and linguistic variations. Secondly, privacy concerns and ethical considerations require developers to anonymize and secure the data while complying with data protection regulations.

➤ Indonesian Speech Datasets

359 Hours - Indonesian Speech Data by Mobile Phone

The Indonesian speech data (reading) was collected from 496 native Indonesian speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as economics, entertainment, news, figures, letters, and oral language. Each speaker recorded around 400 sentences. The valid data volume is 360 hours. All texts were manually transcribed with high accuracy.

496 People – Indonesian Speech Data by Mobile Phone_Guiding

The Indonesian speech data (guiding) was collected from 496 native Indonesian speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as in-car scenes, smart home, and speech assistants. Each speaker recorded 50 sentences. The valid data volume is 10.5 hours. All texts were manually transcribed with high accuracy.
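As a rough sanity check on the figures quoted for the two read-speech sets above, the average length of one utterance can be estimated from the total hours, the number of speakers, and the sentences per speaker. The sketch below is only an illustration: the function name is hypothetical, and the result for the reading set is approximate because the source gives "around 400" sentences per speaker.

```python
def mean_utterance_seconds(hours, speakers, sentences_per_speaker):
    """Average seconds of audio per sentence, assuming every speaker
    recorded the stated number of sentences (an approximation)."""
    total_seconds = hours * 3600
    total_sentences = speakers * sentences_per_speaker
    return total_seconds / total_sentences


# Figures taken from the dataset descriptions above.
reading = mean_utterance_seconds(hours=360, speakers=496, sentences_per_speaker=400)
guiding = mean_utterance_seconds(hours=10.5, speakers=496, sentences_per_speaker=50)

print(f"reading: {reading:.1f} s per sentence")  # about 6.5 s
print(f"guiding: {guiding:.1f} s per sentence")  # about 1.5 s
```

The difference is plausible: read sentences on news or economics topics tend to be longer than short device commands of the smart-home or in-car kind.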

639 Hours - Indonesian Speech Data by Mobile Phone

1,285 native Indonesian speakers with authentic accents participated in the recording. The recording script was designed by linguists and covers a wide range of topics, including generic, interactive, in-vehicle, and home scenarios. The text was manually proofread with high accuracy. The recording devices cover mainstream Android and Apple phones. The dataset can be applied to automatic speech recognition and machine translation scenarios.

➤ Indonesian Conversational Speech Data

108 Hours - Indonesian Conversational Speech Data by Mobile Phone

The 108-hour Indonesian conversational speech dataset, collected by mobile phone, involves 140 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices were various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification.
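When working with a corpus delivered in the format described above (16 kHz, 16-bit, uncompressed WAV), it can help to confirm that each file really matches that specification before training. The following is a minimal sketch using Python's standard `wave` module. The function names `describe_wav` and `matches_spec` are hypothetical helpers for illustration, not part of any Nexdata tooling.

```python
import wave


def describe_wav(path):
    """Read basic format information from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate_hz": sample_rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / sample_rate,
        }


def matches_spec(path, sample_rate_hz=16000, bit_depth=16):
    """Return True if the file has the expected sample rate and bit depth."""
    info = describe_wav(path)
    return (info["sample_rate_hz"] == sample_rate_hz
            and info["bit_depth"] == bit_depth)
```

Calling `matches_spec("some_recording.wav")` on each file (the file name here is only a placeholder) would flag recordings that were resampled or re-encoded somewhere along the way. Note that the `wave` module only reads PCM data, so it is not suitable for u-law encoded telephone audio.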

89 Hours - Indonesian Conversational Speech Data by Telephone

The 89-hour Indonesian conversational speech dataset, collected by telephone, involves 124 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices were various mobile phones. The audio format is 8 kHz, 8-bit, u-law PCM, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification.
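Telephone audio in 8-bit u-law form usually has to be expanded to 16-bit linear samples before feature extraction. The sketch below shows the standard G.711 u-law expansion in plain Python, written by hand because the `audioop` module that used to provide this was removed from the standard library in Python 3.13. The function names are hypothetical, and the sketch assumes the input is a raw stream of u-law bytes with any file header already stripped.

```python
def ulaw_byte_to_linear(byte):
    """Expand one 8-bit G.711 u-law code to a signed 16-bit linear sample."""
    u = ~byte & 0xFF              # u-law codes are stored bit-inverted
    sign = u & 0x80               # top bit: 1 means negative
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_to_linear(data):
    """Decode a bytes object of u-law samples into a list of integers."""
    return [ulaw_byte_to_linear(b) for b in data]


# 0xFF encodes silence; 0x00 and 0x80 are the two extreme codes.
print(ulaw_to_linear(bytes([0xFF, 0x00, 0x80])))  # [0, -32124, 32124]
```

For large corpora a vectorised lookup table would be faster, but the per-byte version makes the sign, segment, and step fields of the encoding easier to see.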

Progress in the AI field cannot be separated from data. By improving the quality and diversity of datasets, we can better unleash the potential of artificial intelligence and promote its application in all walks of life. Only by continuously improving the data system can AI technology better respond to the market's fast-changing data requirements. If you have data requirements, please contact Nexdata.ai at [email protected].
