500,000 Images - Natural Scenes and Documents OCR Data
Natural scenes
Documents
OCR
The dataset consists of 500,000 multi-country natural scene and document images for OCR, covering 20 languages, including Traditional Chinese, Japanese, Korean, Indonesian, Malay, Thai, Vietnamese, and Polish. The data spans a variety of natural scenes and multiple shooting angles. It can be used for multi-language OCR tasks.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
500,000 images in total (20 languages × 25,000 images). Each language has 12,500 natural scene images and 12,500 document images.