LLM Datasets

Instantly enhance AI model performance with high-quality, off-the-shelf datasets.

Type

All (23)
Image Caption (11)
SFT Datasets (5)
Pre-training Text (7)

100,000 Sets of ICONS Image Caption Data

This dataset contains 100,000 icon image-caption pairs. The icons fall into two major categories, 3D Style Icons and Vector Illustration Icons, with 17 subcategories in total. Each icon is annotated with a Chinese description of about 30 characters. The data can be used for tasks such as graphic recognition and interface interaction.
ICONS Image caption

6.9 million - Chinese Multi-disciplinary Questions Text Parsing And Processing Data

6.9 million Chinese multi-disciplinary questions, covering multiple disciplines at the primary school, middle school, high school, and university levels. Each question contains the fields title, answer, parse, type, subject, and grade. The dataset can be used to enhance the subject knowledge of large models.
Chinese multi-disciplinary Questions LLM Text

1 million - Chinese Code Questions Text Parsing And Processing Data

1 million Chinese code questions, covering C, C++, Python, Java, and JavaScript. Each question contains the fields title, answer, parse, and language. The data can help a model build and consolidate its coding skills for better performance on programming tasks.
Code Questions LLM Text

32 million - Science Subjects Questions Text Parsing And Processing Data

32 million science questions, covering mathematics, physics, chemistry, and biology at the primary school, middle school, high school, and university levels. Each question contains the fields title, answer, parse, type, subject, and grade. The dataset can be used to enhance the subject knowledge of large models.
Science Subjects Questions LLM Text

140,000 - Contest Questions Text Parsing And Processing Data

140,000 contest questions, covering mathematics, physics, chemistry, and biology at the primary school, middle school, high school, and university levels. Each question contains the fields title, answer, parse, subject, grade, and question type. The dataset can be used to enhance the subject knowledge of large models and can contribute to a model's overall capability.
Contest Questions LLM Text

130 Million - Chinese Test Question Texts from Elementary School to University Parsing And Processing Data

130 million Chinese question texts from primary school to college: 20.87 million K12 questions (16 million of which include a parse) and 117 million college and vocational questions (7 million of which include a parse).
Professional questions Text LLM

10 million - English Test Questions Text Parsing And Processing Data

10 million English test questions. Each question contains the fields title, answer, parse, subject, grade, and question type. The educational stages cover primary school, middle school, high school, and university, and the subjects include mathematics, biology, accounting, and others. The questions follow the Anglo-American education system and can be used to enhance the subject knowledge of large models.
English test questions text data LLM Large Language Model Large Model chatgpt data

31 million Southeast Asian language news text dataset

This dataset contains multilingual news data from Southeast Asia in four languages: Indonesian, Malay, Thai, and Vietnamese. It holds more than 31 million records stored in JSONL format, one record per line, for efficient reading and processing. The data comes from a wide range of sources and news topics, reflecting social dynamics, cultural trends, and economic developments in Southeast Asia. The dataset can help large models improve their multilingual capabilities, broaden their cultural knowledge, support industry applications in Southeast Asia, and aid cross-lingual research.
Minor languages Southeast Asia NEWS Journalism
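
Because the news dataset above is described as JSONL with one record per line, it can be streamed without loading the whole file into memory. The following is a minimal sketch of that reading pattern, not the vendor's tooling; the helper name `iter_jsonl` and the field names `lang` and `text` are assumptions made for illustration, since the listing does not document the record schema.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical sample records; "lang" and "text" are assumed field names.
sample = io.StringIO(
    '{"lang": "id", "text": "Contoh berita."}\n'
    '{"lang": "ms", "text": "Contoh berita lain."}\n'
)

records = list(iter_jsonl(sample))
languages = sorted({record["lang"] for record in records})
```

With a real file, the same generator can be fed an object returned by `open(path, encoding="utf-8")`, so records are parsed one line at a time.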

21,998 Image Caption Data of Vehicles

21,998 image captions of vehicles, covering cars, SUVs, MPVs, trucks, and buses. The images were collected by surveillance cameras on outdoor roads over multiple time periods. The captions are written in English and mainly describe the vehicle type, along with information such as color, vehicle orientation, and scene.
multi-modality vehicle attribute data security data intelligent monitoring data intelligent traffic data smart city data

Tailor Your Data Now

Why off-the-shelf Datasets

  • Copyright

    Clear Copyright, Ready to Check
  • Security

    Properly Authorized, Secure to Use
  • Professional

    Designed and Produced by AI Data Experts
  • Diversity

    Collected from a Variety of Real Scenes
  • Cost Effective

    More Cost-Efficient Than Tailored Data
  • Efficiency

    Ready to Go, Delivered in Seconds