80,120,000 Groups – Chinese-English Parallel Corpus Data
Chinese-English Parallel Corpus Data
Chinese-English Alignment
Corpus
A Chinese-English parallel translation corpus stored in TXT files. It covers fields such as travel, medicine, daily life, and TV drama. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a basic corpus for text data analysis and for machine translation.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Storage format: TXT
Data content: Chinese-English Parallel Corpus Data
Data size: 80.12 million pairs of Chinese-English parallel corpus data
Language: Chinese, English
Application scenario: Machine translation
Recommended Dataset
1,140,000 Groups - Chinese - Hebrew Parallel Corpus Data
1.14 million sentence pairs of Chinese-Hebrew parallel corpus data, stored in text format. The data covers multiple fields such as tourism, daily life, and news. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
Chinese-Hebrew Parallel Corpus Data Chinese-Hebrew Parallel Corpus Parallel Corpus Data Alignment Corpus Data
12,820,000 Groups - Chinese-Korean Parallel Corpus Data
12,820,000 sets of Chinese-Korean parallel translation corpus data, stored in TXT files. The data covers many fields, including spoken language, travel, news, and finance. Data cleaning, desensitization, and quality inspection have been carried out. It can be used as a basic corpus for text data analysis as well as in machine translation.
China and South Korea Parallel Corpus Corpus Data Alignment Corpus Parallel Corpus Data Alignment Corpus Data
3,140,000 Groups - Chinese-Spanish Parallel Corpus Data
The 3,140,000 Groups - Chinese-Spanish Parallel Corpus Data is a set of bilingual texts stored in text format. All of the data relates to science and technology. The average sentence length is 37.1 characters. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
Chinese-Spanish Parallel Corpus Data Chinese-Spanish Parallel Corpus Parallel Corpus Data Alignment Corpus Parallel Corpus Data Alignment Corpus Data
850,000 Groups - English-Japanese Parallel Corpus Data
The 850,000 Groups - English-Japanese Parallel Corpus Data is a set of bilingual texts stored in text format. It covers multiple fields such as tourism, medical treatment, daily life, and news. The average English sentence length is 23 words. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
English-Japanese Parallel Corpus Data English-Japanese Parallel Corpus Parallel Corpus Data Alignment Corpus Data
4,720,000 Groups - Chinese-Uighur Parallel Corpus Data
4,720,000 sets of Chinese-Uighur parallel translation corpus data, stored in TXT documents. Data cleaning, desensitization, and quality inspection have been carried out. It can be used as a basic corpus for text data analysis and in fields such as machine translation.
Chinese and Uygur Parallel Corpus Data Alignment Corpus Parallel Corpus Data Alignment Corpus Data
750,000 Groups - Chinese-Burmese Parallel Corpus Data
0.75 million sentence pairs of Chinese-Burmese parallel corpus data, stored in text format. The data covers multiple fields such as tourism, medical treatment, daily life, and news. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
Chinese-Burmese Parallel Corpus Data Chinese-Burmese Parallel Corpus Parallel Corpus Data Alignment Corpus Data
7,290,000 Groups - Chinese-Vietnamese Parallel Corpus Data
7.29 million sentence pairs of Chinese-Vietnamese parallel corpus data, stored in text format. The data covers multiple fields such as tourism, medical treatment, daily life, and news. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
Chinese-Vietnamese Parallel Corpus Data Chinese-Vietnamese Parallel Corpus Parallel Corpus Data Alignment Corpus Data
10,030,000 Groups – Chinese-Portuguese Parallel Corpus Data
10.03 million sentence pairs of Chinese-Portuguese parallel corpus data, stored in text format. The data covers multiple fields such as tourism, medical treatment, daily life, and news. Data desensitization and quality checking have been done. It can be used as a basic corpus for text data analysis in fields such as machine translation.
Chinese-Portuguese Parallel Corpus Data Chinese-Portuguese Parallel Corpus Parallel Corpus Data Alignment Corpus Data