NLU Datasets – Intent & Parallel Corpus

5.3M Pairs German-Chinese Parallel Corpus for NLP and MT Applications

5.3 million Chinese-German parallel sentence pairs stored in text format. They cover multiple domains, including tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used for machine translation, NLP research, and bilingual text analysis.

Keywords: german parallel corpus; chinese german sentence pairs dataset; chinese german bilingual corpus; chinese german NLP corpus; chinese german text alignment dataset

English Intent Recognition Dataset – 84,516 Sentences with Slot Filling Annotations

This dataset contains 84,516 English sentences annotated with intent classes, slot labels, and slot values. The intent domains include music, weather, date, schedule, smart home equipment, and others. It can be used for intent recognition, intent classification, and slot filling tasks.

Keywords: intent recognition dataset; intent classification dataset; natural language understanding dataset; NLU dataset; English annotated intent dataset; intent detection dataset; dialogue intent dataset; chatbot training dataset; slot filling dataset

1,080,000 Groups – English-Russian Parallel Corpus Data

An English-Russian parallel corpus of 1,080,000 groups in total. Political and pornographic terms, personal information, and other sensitive vocabulary have been excluded. It can serve as a base corpus for text-based data analysis and can be used in machine translation and other fields.

Keywords: English-Russian parallel corpus

1,340,000 Groups – English-Korean Parallel Corpus Data

An English-Korean parallel corpus of 1,340,000 groups in total. Political and pornographic terms, personal information, and other sensitive vocabulary have been excluded. It can serve as a base corpus for text-based data analysis and can be used in machine translation and other fields.

Keywords: English-Korean parallel corpus

Japanese-English Parallel Corpus Dataset – 380,000 Sentence Pairs for Machine Translation

This dataset contains 380,000 aligned Japanese-English sentence pairs. Sensitive content such as political, pornographic, and personal information has been excluded. It can be a base corpus for translation systems, cross-lingual information retrieval, and text-based language processing applications.

Keywords: japanese english parallel corpus; japanese english translation dataset; bilingual corpus japanese english; japanese english text dataset; japanese english aligned corpus; japanese english NLP dataset

Intent Classification Dataset – 47,811 Annotated English Sentences for Dialogue Systems

This dataset contains 47,811 English sentences annotated with intent classes, slot labels, and slot values. The intent domains include music, weather, date, schedule, smart home equipment, and others. It can be used for intent recognition, intent classification, and slot filling tasks.

Keywords: slot filling dataset; intent detection dataset; intent recognition dataset; intent classification dataset

Chinese-English Parallel Corpus Dataset (80,120,000 Sentence Pairs) – Translation & NLP

This dataset contains 80,120,000 (about 80 million) Chinese-English parallel sentence pairs. They cover domains such as travel, medicine, daily conversation, and TV scripts. The data is stored in TXT format and has been cleaned, desensitized, and quality-checked. It can be used as a fundamental dataset for machine translation, bilingual NLP tasks, and other text processing applications.

Keywords: Chinese English parallel corpus; Chinese English translation dataset; Chinese English machine translation data; Chinese English bilingual corpus; Chinese English parallel dataset; Chinese English text dataset

1,990,000 Groups – Chinese-Czech Parallel Corpus Data

A Chinese-Czech parallel translation corpus of 1,990,000 groups, stored as TXT documents. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a base corpus for text data analysis and can be used in fields such as machine translation.

Keywords: Chinese Czech Parallel

English-Japanese Parallel Corpus – 850,000 Sentence Pairs for Machine Translation

This dataset contains 850,000 English-Japanese parallel sentence pairs stored in TXT format. They cover multiple fields, including tourism, medical treatment, daily life, and news. English sentences average 23 words in length. The data has been desensitized and quality-checked. It can be used as a fundamental dataset for machine translation, bilingual NLP tasks, and other text processing applications.

Keywords: English Japanese parallel corpus; English Japanese translation dataset; English Japanese bilingual corpus; English Japanese parallel dataset; English Japanese text dataset; English Japanese MT dataset

Why Off-the-Shelf Datasets

Copyright: Clear copyright, ready to check
Security: Properly authorized and secure to use
Professional: Designed and produced by AI data experts
Diversity: Collected from a variety of real scenes
Cost Effective: More cost-efficient than tailored data
Efficiency: Ready to go, delivered in seconds

Sharpen Your AI with Better Data

+1(626)594-5598
