39,993 Images – OCR Data of Internet Image

Multiple types of internet images
OCR
Chinese
English
line-level rectangular bounding box annotation
column-level rectangular bounding box annotation
text transcription

39,993 Images – OCR Data of Internet Image. The images in this dataset come from scenes including subtitles, advertisements, cellphone screenshots, comics, emoticons, posters, and magazine covers. The text is mainly Chinese, with a small amount of English. Most texts are annotated with line-level rectangular bounding boxes and transcriptions. A small portion of the data uses column-level quadrilateral bounding boxes and transcriptions instead. The dataset can be used for OCR tasks on internet images.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
39,993 images, 227,910 bounding boxes
Collecting environment
including subtitles, advertisements, cellphone screenshots, comics, emoticons, posters, magazine covers, etc.
Data diversity
including multiple types of internet images
Language distribution
mainly Chinese, with a small amount of English
Data format
images are in .jpg format; annotation files are in .json format
Annotation content
line-level rectangular bounding boxes and text transcriptions (a small portion of the data uses column-level quadrilateral bounding boxes and text transcriptions instead)
Accuracy
a rectangular bounding box counts as a qualified annotation when each of its vertices is within 5 pixels of the correct position; bounding box accuracy is at least 97%, and text transcription accuracy is at least 97%
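The accuracy criterion above can be expressed as a small check. This is a minimal sketch only: the specification does not say how vertex error is measured, so the code assumes the Euclidean distance between each annotated vertex and a reference vertex given in the same order. The function names (`box_is_qualified`, `box_accuracy`, `meets_spec`) are hypothetical and are not part of the dataset's tooling or its .json annotation format.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Per-vertex error bound and minimum box accuracy, as stated in the specification.
TOLERANCE_PX = 5.0
MIN_ACCURACY = 0.97


def box_is_qualified(annotated: Sequence[Point], reference: Sequence[Point],
                     tol: float = TOLERANCE_PX) -> bool:
    """Return True if every annotated vertex lies within `tol` pixels of its reference vertex.

    Assumption: error is measured as Euclidean distance, and both boxes list
    their vertices in the same order.
    """
    if len(annotated) != len(reference):
        raise ValueError("annotated and reference boxes must have the same number of vertices")
    return all(math.dist(a, r) <= tol for a, r in zip(annotated, reference))


def box_accuracy(pairs: Sequence[Tuple[Sequence[Point], Sequence[Point]]]) -> float:
    """Fraction of (annotated, reference) box pairs that are qualified."""
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(box_is_qualified(a, r) for a, r in pairs)
    return qualified / len(pairs)


def meets_spec(accuracy: float, threshold: float = MIN_ACCURACY) -> bool:
    """True if the measured box accuracy is not less than the stated threshold."""
    return accuracy >= threshold
```

For example, a box whose first vertex is offset by (3, 4) pixels is 5 pixels away and still qualifies, while an offset of (4, 4) pixels (about 5.66 pixels) does not.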