12d ago

Hugging Face launches Hugging Science hub for AI models and datasets

Hugging Face launched Hugging Science, a centralized hub aggregating open AI models and large scientific datasets for genomics, physics, chemistry, biology, materials science, and drug discovery. Contributions from partners including NASA, Google, OpenAI, and Meta FAIR total over 100 TB, including 78 GB of genomics data, 11 TB of PDE simulations, and 100 million cell profiles. The platform simplifies access to previously dispersed resources for AI-driven research.

Original post

🤗🤗🤗 introducing Hugging Science -- the home of AI for science 🤗🤗🤗

open models and datasets are the powerhouse of science (see the PDB), but finding the models and data you actually need for your breakthrough is hard af

you shouldn't need to scrape arxiv, own your own wetlab, fight a custom HDF5 parser, build a fusion stellarator, and beg for compute before you've trained a single epoch

so we're changing that

we've put all the best science on @huggingface in one place:
- 78GB of genomics data
- 11TB of PDE simulations
- 100M cell profiles
- 9T DNA base pairs
- 13M molecular trajectories
- 400k medical QA pairs

and much more, all open, and all ready for training (+ you can also now filter and search by domain, task, and keyword)

we've put together all the biggest releases from our partners at NASA, Google, OpenAI, Meta FAIR, Arc Institute, Ginkgo, SandboxAQ, Proxima Fusion, NVIDIA, Ai2, OpenADMET, InstaDeep, Future House, Polymathic AI, LeMaterial, Earth Species Project, Merck, and Eve Bio

if you're not sure where you fit in -- work on open challenges for problems that matter: including fusion stellarator design, ADMET, antibody developability, multilingual medicine, catalysis and materials, and scientific reasoning.

we're already changing how science gets done:

a fusion startup needed a benchmark for stellarator plasma confinement that didn't exist. @proximafusion shipped ConStellaration on Hugging Science: a leaderboard, dataset, and eval metrics, all in one place.

a drug discovery team wanted to predict hPXR induction. OpenADMET put up a blind challenge: 11,000+ compounds assayed at Octant, 513 held out, two tracks (pEC50 + structure). Anyone in the world can train and submit.

an antibody team at @Ginkgo released GDPa1, a developability dataset for stability, manufacturability, and immunogenicity prediction, with a live leaderboard scoring every submission.

if you know a problem the ML community should be working on, let us know. make a challenge!

this is about putting all the tools for solving science in one place. so we can hillclimb! → http://huggingscience.co

8:08 AM · Apr 29, 2026
