How to Build a Voiceprint System? In View of Datasets

From: Nexdata Date: 2024-08-15

In machine learning and deep learning, datasets play an irreplaceable role. Whether it is image data for convolutional neural networks or massive text data for natural language processing, the integrity and diversity of the data directly determine what a model learns. As the technology advances, collecting datasets from specific scenarios has become a core strategy for improving model performance.

➤ China intelligent customer service

According to the “2021 China Intelligent Customer Service Market Report”, China’s intelligent customer service market reached 3.01 billion Yuan in 2020, a year-on-year increase of 88.1%. The market is expected to exceed 10 billion Yuan by 2025, showing rapid growth.

Using NLP, ASR and other technologies, intelligent customer service can greatly improve text and language processing capabilities. It has clear advantages in access channels, response efficiency, and data management and analysis, and it improves work efficiency.

However, customers’ problems come in every variety, and intelligent customer service robots are often helpless when faced with complex questions. AI and ML applications are machines with their own limitations: they can only work from the data in the system. When a query or conversation goes beyond that limited data, these tools can get stuck or give false or irrelevant answers. In addition, most intelligent robots on the market cannot read between the lines or fully understand context. Intelligent customer service still has a long way to go before it can replace human customer service.

As a world-leading AI data services provider, Nexdata is committed to providing high-quality customer service data solutions that empower the industry and help bring the technology into more application scenarios.

➤ Speech synthesis and speech data

20 Hours — American English Speech Synthesis Corpus-Male

Male audio data of American English, recorded by native American English speakers with authentic accents. The phoneme coverage is balanced, and professional phoneticians took part in the annotation. The corpus closely matches the research and development needs of speech synthesis.

19.46 Hours — American English Speech Synthesis Corpus-Female

Female audio data of American English, recorded by a native American English speaker with an authentic accent and a sweet voice. The phoneme coverage is balanced, and professional phoneticians took part in the annotation. The corpus closely matches the research and development needs of speech synthesis.

10.4 Hours — Japanese Synthesis Corpus-Female

Female audio data of Japanese, recorded by a native Japanese speaker with an authentic accent. The phoneme coverage is balanced, and professional phoneticians took part in the annotation. The corpus closely matches the research and development needs of speech synthesis.

50 People — Chinese-English Mixed Average Tone Speech Synthesis Corpus-Customer Service

Chinese-English mixed speech synthesis data recorded by native Chinese speakers. The text is customer service content, and the syllables, phonemes and tones are balanced. Professional phoneticians took part in the annotation. The corpus closely matches the research and development needs of speech synthesis.

2,520 Hours — Real-time Speech Assistant Mandarin Speech Data

This is real-world customer consultation data from a well-known voice assistant: actual consultation recordings between customer service staff and customers. The valid duration is 2,520 hours. The recordings were collected in quiet indoor environments and include some noise that does not affect speech recognition. All text was manually transcribed and proofread by professional annotators to ensure high accuracy.

800 Hours — English Real-time Speech Data of Typical-fields Customer Service

English customer service speech data with a duration of 800 hours, collected from real scenes and recording real interactions between customer service staff and customers. It comes from customer service centers and covers multiple fields. Text content, speaker identity and gender, sensitive information, and other attributes are annotated.

317 Hours — Cantonese Real-time Speech Data of Real estate Customer Service

Cantonese customer service speech data with a duration of 317 hours, collected from real scenes and recording real interactions between customer service staff and customers. It comes from customer service centers. Text content, speaker identity and gender, sensitive information, and other attributes are annotated.


If you need data services, please feel free to contact us: info@nexdata.ai.

High-quality datasets are the cornerstone of artificial intelligence development. For both current applications and future development, the importance of datasets cannot be overlooked. As AI is applied in depth across all walks of life, we have reason to believe that by constantly improving datasets, future intelligent systems will become more efficient, smarter and more secure.
