Open Datasets for Academic Research

Nexdata has launched the AI Data-assisted Research Program for non-commercial organizations worldwide, including universities and academic institutions. The program provides training datasets in computer vision, speech recognition, and other fields to support academic AI research.

Computer Vision

1,000 Images Caption Data of Diverse Scenes
Data Type: Image; Data Size: 1,000 images
Capture Content: Image caption dataset of diverse scenes, such as natural scenery, urban streets, exhibitions, and home environments. Each image has a 3-5 sentence English description.

1,000 Images Caption Data of OCR in Natural Scenes
Data Type: Image; Data Size: 1,000 images
Capture Content: OCR caption dataset covering 14 languages. Image subjects include bus stops, posters, and road signs, among others. Each image has a 3-5 sentence English description.

1,000 Images Caption Data of Human Face
Data Type: Image; Data Size: 1,000 images
Capture Content: Face image caption dataset covering various head poses and facial expressions. Each image has a 3-5 sentence English description.

1,000 Images Caption Data of Gestures
Data Type: Image; Data Size: 1,000 images
Capture Content: Gesture image caption dataset covering different angles and gesture categories. Each image has a 3-5 sentence English description.

1,000 Images Human Facial Skin Defects Data
Data Type: Image; Data Size: 1,000 images
Capture Content: Facial skin defect dataset covering acne, acne scars, dark spots, wrinkles, and dark circles.

1,000 Videos Caption Data of Human Motion
Data Type: Video; Data Size: 1,000 videos
Capture Content: Human motion video caption dataset in CCTV and non-CCTV scenes. Motions include walking, drinking, yawning, and fitness exercises, among others. Each video has an English caption.

1,000 People Multi-race 7 Expressions Recognition Data
Data Type: Image; Data Size: 1,000 people
Capture Content: Facial expression dataset covering 7 expressions: normal, happy, amazed, sad, angry, disgusted, and scared.

1,000 Videos Multi-race Micro-expression (FACS) Data
Data Type: Video; Data Size: 1,000 videos
Capture Content: Facial micro-expression dataset covering 57 micro-expressions, such as inner brow raiser (AU1), outer brow raiser (AU2), and upper lid raiser (AU5).

50 People - DMS Data
Data Type: Video; Data Size: 50 people
Capture Content: DMS dataset of dangerous behavior, fatigue behavior, and visual movement behavior. The data varies across subject age groups, time periods, vehicle types, and camera positions.

50 People - 2D Face Anti-Spoofing Data
Data Type: Image & Video; Data Size: 50 people
Capture Content: 2D face anti-spoofing dataset. Real face data includes facial action videos, facial images, and lip-language videos. Spoof data includes fake facial action videos, fake lip-language videos, and fake facial images.

1,000 Images Gesture Recognition Data
Data Type: Image; Data Size: 1,000 images
Capture Content: Gesture recognition dataset of 18 gesture categories, such as number 1, OK, and LOVE. Annotations include 21 hand landmarks and multiple gesture labels.

3,000 Images Natural Scene OCR Data
Data Type: Image; Data Size: 3,000 images
Capture Content: Natural scene OCR dataset of Asian languages (Japanese, Korean, etc.) and European languages (French, German, etc.). Text is annotated with line-level quadrilateral bounding boxes and transcriptions.

500 Images Handwriting OCR Data
Data Type: Image; Data Size: 500 images
Capture Content: Handwriting OCR dataset of English and Japanese. Text is annotated with line-level quadrilateral bounding boxes and transcriptions.

50 People - 3D Face Anti-Spoofing Data
Data Type: Image; Data Size: 50 people
Capture Content: 3D face anti-spoofing dataset. Real face data includes facial images; spoof data includes fake facial images. Each image has a corresponding depth image, depth value file, and camera parameter file.

1,000 People Multi-race and Multi-pose Face Images Data
Data Type: Image; Data Size: 1,000 people
Capture Content: Face recognition dataset of multiple races. Each subject has 29 facial images: 14 indoor multi-pose images, 14 outdoor multi-pose images, and 1 ID image. Annotations include race, gender, age, and facial pose labels.
Speech Recognition

2 Hours - 4 Countries English Speech Synthesis Corpus
Recording Device: Microphone; Data Size: 2 hours, 4 people
Speakers: 4 people from the United States, the United Kingdom, Australia, and New Zealand
Format: 48 kHz, 24-bit, uncompressed WAV, mono
Recording Environment: Professional recording studio

20 Hours - France French Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: France
Language: French
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - German Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Germany
Language: German
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - Italian Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Italy
Language: Italian
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - Spain Spanish Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Spain
Language: Spanish
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - European Portuguese Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Portugal
Language: Portuguese
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - Japanese Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Japan
Language: Japanese
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

20 Hours - Korean Reading & Conversational Speech Data by Mobile Phone
Recording Device: Mobile phone (Android smartphone, iPhone); Data Size: 20 hours
Format: 16 kHz, 16-bit, uncompressed WAV, mono
Recording Condition: Low background noise (indoor), no echo
Content Category: Reading, conversation
Country: Korea
Language: Korean
Annotation: Transcription text
Accuracy Rate: Word Accuracy Rate (WAR) of at least 97%

10 Hours - Pashto Conversational Speech Data by Telephone
Recording Device: Telephone; Data Size: 10 hours
Format: 8 kHz, 8-bit, A-law/μ-law PCM, mono
Content Category: Dialogues on given topics
Recording Condition: Low background noise (indoor)
Country: Afghanistan (AFG)
Language (Region) Code: ps-AF
Language: Pashto
Speakers: 224 people in total (92% male, 8% female)
Annotation: Transcription text, timestamps, speaker ID, gender
Accuracy Rate: Word Accuracy Rate (WAR) of at least 95%

Interspeech Accented English Speech Recognition Competition Data
Recording Device: Mobile phone; Data Size: 200 hours, 528 people
Note: Please apply only for datasets relevant to your research field. You may apply for at most 6 computer vision datasets and at most 4 speech recognition datasets.
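Most of the speech datasets above are specified as uncompressed WAV with a given sample rate, bit depth, and channel count. As a minimal sketch, assuming the audio arrives as ordinary PCM WAV files, the following Python uses only the standard-library `wave` module to check a file's header against such a spec and to estimate the raw audio size implied by a stated duration. The helper names `check_wav_format` and `raw_pcm_bytes` are hypothetical, not part of any Nexdata tooling. The `wave` module reads only PCM WAV, so this check does not cover the 8 kHz A-law/μ-law telephone data.

```python
import io
import wave

# Hypothetical helpers for illustration; not part of any Nexdata tooling.


def check_wav_format(source, sample_rate, bits, channels):
    """Return True if a PCM WAV header matches the expected format.

    `source` may be a file path or a binary file-like object. The `wave`
    module only handles PCM WAV, so A-law/mu-law audio is out of scope.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == sample_rate
            and wav.getsampwidth() * 8 == bits
            and wav.getnchannels() == channels
        )


def raw_pcm_bytes(hours, sample_rate, bits, channels):
    """Bytes of raw PCM sample data (no WAV header) for a given duration."""
    seconds = hours * 3600
    return int(seconds * sample_rate * (bits // 8) * channels)


if __name__ == "__main__":
    # Build a one-second silent 16 kHz, 16-bit, mono WAV in memory as a demo input.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    buf.seek(0)
    print(check_wav_format(buf, 16000, 16, 1))  # True

    # Raw size of 20 hours at 16 kHz, 16-bit, mono.
    print(raw_pcm_bytes(20, 16000, 16, 1))  # 2304000000
```

For example, 20 hours of 16 kHz, 16-bit mono audio works out to 20 × 3,600 × 16,000 × 2 = 2,304,000,000 bytes (about 2.3 GB) of sample data before WAV headers; the files actually delivered may differ in size.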

Application Process and Instructions

Select sponsored dataset

Submit the form

Wait for feedback

Receive dataset

Apply for Sponsored Dataset

By submitting, I agree to the Data License Agreement

Nexdata reserves the right of final interpretation of this open-source data program.
