English-Japanese Parallel Corpus – 850,000 Sentence Pairs for Machine Translation

English Japanese parallel corpus

English Japanese translation dataset

English Japanese bilingual corpus

English Japanese parallel dataset

English Japanese text dataset

English Japanese MT dataset

This dataset contains 850,000 English-Japanese parallel sentence pairs stored in TXT format. It covers multiple domains, including tourism, medical treatment, daily life, and news. The English sentences average 23 words. The data has been desensitized and quality-checked. It can be used as a foundational dataset for machine translation, bilingual NLP tasks, and other text-processing applications.

This is a paid dataset for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Storage format: TXT
Data content: English-Japanese parallel corpus data
Data size: 850,000 English-Japanese sentence pairs; the English sentences contain 23 words on average
Language: English, Japanese
Accuracy rate: 90%
Application scenario: Machine translation
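
As an illustration of how a corpus like this might be loaded for machine translation work, here is a minimal Python sketch. The page states only that the data is stored in TXT format, so the layout assumed here (one sentence pair per line, English first, tab-separated) is a guess and not the vendor's documented format. The function names `iter_pairs` and `average_english_words` are hypothetical, and the sample lines are made up for the example rather than taken from the dataset.

```python
from typing import Iterable, Iterator, Tuple


def iter_pairs(lines: Iterable[str], sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (english, japanese) pairs, skipping blank or malformed lines.

    Assumes one pair per line with the two sides separated by `sep`;
    the real file layout may differ.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        parts = line.split(sep)
        if len(parts) != 2:
            continue  # skip lines that do not split into exactly two fields
        en, ja = parts[0].strip(), parts[1].strip()
        if en and ja:
            yield en, ja


def average_english_words(pairs: Iterable[Tuple[str, str]]) -> float:
    """Return the mean whitespace-delimited word count of the English side."""
    total_words = 0
    count = 0
    for en, _ in pairs:
        total_words += len(en.split())
        count += 1
    return total_words / count if count else 0.0


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, not actual dataset content.
    sample = [
        "Where is the station?\t駅はどこですか。",
        "",
        "malformed line without a separator",
        "I have a headache.\t頭が痛いです。",
    ]
    pairs = list(iter_pairs(sample))
    print(len(pairs))
    print(average_english_words(pairs))
```

For a real file, an open file object can be passed in place of the list, for example `iter_pairs(open(path, encoding="utf-8"))`, where UTF-8 is also an assumption. Computing the average English word count this way gives a quick sanity check against the stated figure of 23 words per sentence.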

Recommended Dataset

1,140,000 Groups - Chinese-Hebrew Parallel Corpus Data

This corpus contains 1.14 million Chinese-Hebrew parallel sentence pairs stored in text format. It covers multiple domains, including tourism, daily life, and news. The data has been desensitized and quality-checked. It can be used as a basic corpus for text data analysis in fields such as machine translation.

Chinese Hebrew Chinese-Hebrew Parallel Corpus

12,820,000 Groups - Chinese-Korean Parallel Corpus Data

This corpus contains 12,820,000 sets of Chinese-Korean parallel translation data stored in TXT files. It covers many domains, including spoken language, travel, news, and finance. Data cleaning, desensitization, and quality inspection have been carried out. It can be used as a basic corpus for text data analysis and for machine translation.

Chinese Korean Chinese-Korean Parallel Corpus

3,140,000 Groups - Chinese-Spanish Parallel Corpus Data

The 3,140,000 Groups - Chinese-Spanish Parallel Corpus Data consists of bilingual texts stored in text format. All of the data relates to science and technology. The average sentence length is 37.1 characters. The data has been desensitized and quality-checked. It can be used as a basic corpus for text data analysis in fields such as machine translation.

Chinese Spanish Sino-Spain Parallel corpus

4,720,000 Groups - Chinese-Uighur Parallel Corpus Data

This corpus contains 4,720,000 sets of Chinese-Uighur parallel translation data stored as TXT documents. Data cleaning, desensitization, and quality inspection have been carried out. It can be used as a basic corpus for text data analysis and in fields such as machine translation.

Chinese Uighur Han-Uyghur Parallel corpus

750,000 Groups - Chinese-Burmese Parallel Corpus Data

This corpus contains 0.75 million Chinese-Burmese parallel sentence pairs stored in text format. It covers multiple domains, including tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used as a basic corpus for text data analysis in fields such as machine translation.

Chinese Burmese Sino-Myanmar Parallel corpus

7,290,000 Groups - Chinese-Vietnamese Parallel Corpus Data

This corpus contains 7.29 million Chinese-Vietnamese parallel sentence pairs stored in text format. It covers multiple domains, including tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used as a basic corpus for text data analysis in fields such as machine translation.

Chinese Vietnamese Chinese-Vietnamese Parallel Corpus

10,030,000 Groups – Chinese-Portuguese Parallel Corpus Data

This corpus contains 10.03 million Chinese-Portuguese parallel sentence pairs stored in text format. It covers multiple domains, including tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used as a basic corpus for text data analysis in fields such as machine translation.

Chinese Portuguese Chinese-Portuguese Parallel Corpus

5.3M Pairs German-Chinese Parallel Corpus for NLP and MT Applications

This corpus contains 5.3 million Chinese-German parallel sentence pairs stored in text format, covering multiple domains such as tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used for machine translation, NLP research, and bilingual text analysis.

german parallel corpus chinese german sentence pairs dataset chinese german bilingual corpus chinese german NLP corpus chinese german text alignment dataset
