Hugging Face reaches one million public datasets on Hub
Hugging Face has reached one million public datasets on its Hub, totaling petabytes of data that millions of users download, analyze, and train AI models on every day. The count grew from 10,000 datasets in October 2022 to 100,000 in February 2024 and 500,000 in September 2025, then doubled within eight months. The company attributes the acceleration to improvements in AI agents, which it says have made datasets easier and faster to build, share, and use. Company charts tracked cumulative growth across categories including computer vision, natural language processing, and multimodal tasks.
We just crossed 1,000,000 public datasets on Hugging Face! That's petabytes of data available that millions of AI builders are downloading, analyzing, and training AI models on every day!
What's interesting is that we see a clear acceleration since agents started to be good: the number of datasets doubled over the past 8 months (it took 4 years to reach the first 500k). It's becoming easier and faster to build, share, and use your own datasets!
Many are saying the next bottleneck for more people to build AI themselves (instead of relying on APIs) is better data, so we're just getting started! Thanks, everyone, for your amazing contributions. We couldn't do it without you!
