For the last 72 hours since ml-intern launched, over 500 autonomous AI research projects have been running on the Space at all times. Some insane ones I saw:

1. A new AI paradigm from scratch: an attempt to replace transformers with a reasoning architecture based on energy minimization, binary sparse address tables, and circular convolution binding. No GPU, no gradients, no training data, pure bitwise operations. Years of research done in 2 days. https://huggingface.co/Harry00/MLE-Morpho-Logic-Engine

2. Someone took LoopLM (ByteDance's recurrent-depth transformer with shared layers and infinite depth via looping) and crossed it with BitNet b1.58 (ternary 1.58-bit weights). The result: a model that is both infinitely deep AND uses almost no memory per parameter.

3. A new attention mechanism modeled on the thalamo-cortical circuit in the human brain, pulling from 2025/2026 research out of MIT, Harvard, and UF. The thalamus gates what information reaches the cortex; they're building a learnable gate that mimics this for transformer attention heads, combined with EEG datasets and a reinforcement learning loop. https://huggingface.co/spaces/daniel8919/limbic-reasoning-agent

The use cases people bring are cooler and more impressive than anything we imagined when we built this.
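For readers unfamiliar with "binding" in project 1: it comes from vector-symbolic architectures, where circular convolution (or, on dense binary vectors, plain XOR) associates two high-dimensional patterns so that either can be recovered from the other with bitwise ops alone. A minimal illustrative sketch of XOR binding, not the linked project's actual code:

```python
import random

DIM = 1024  # hypervector dimensionality

def rand_hv():
    # random dense binary hypervector, stored as a Python int bit-string
    return random.getrandbits(DIM)

def bind(a, b):
    # XOR binding: associates two hypervectors; self-inverse,
    # so bind(bind(a, b), b) recovers a exactly
    return a ^ b

def similarity(a, b):
    # normalized Hamming similarity in [0, 1]
    return 1 - bin(a ^ b).count("1") / DIM

random.seed(0)
role, filler = rand_hv(), rand_hv()
pair = bind(role, filler)

# unbinding with the role recovers the filler exactly
assert bind(pair, role) == filler
# the bound pair looks random relative to either component (~0.5 similarity)
print(round(similarity(pair, filler), 2))
```

This is why no gradients or GPUs are needed: binding, unbinding, and comparison are all XOR and popcount.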
Introducing ml-intern, the agent that just automated the post-training team @huggingface

It's an open-source implementation of the real research loop that our ML researchers run every day. You give it a prompt; it researches papers, follows citations, implements ideas in GPU sandboxes, iterates, and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.

It can pull off crazy things:

- We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper, found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the GPQA score from 10% to 32% in under 10h. Claude Code's best: 22.99%.
- In healthcare settings, it inspected the available datasets, concluded they were too low quality, and wrote a script to generate 1,100 synthetic data points from scratch covering emergencies, hedging, multilingual cases, etc., then upsampled 50x for training. Beat Codex on HealthBench by 60%.
- For competitive mathematics, it wrote a full GRPO script, launched training on A100 GPUs on http://hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded.

All fully backed by papers, autonomously.

How does it work? ml-intern makes full use of the HF ecosystem:

- finds papers on arXiv and http://hf.co/papers, reads them fully, walks citation graphs, and pulls datasets referenced in methodology sections and on http://hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets, and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, and retrains

ml-intern deeply embodies how researchers work and think. It knows what good data should look like and what good models feel like.

Releasing it today as a CLI and a web app you can use from your phone or desktop.
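The "difficulty-filtered dataset variants" step can be pictured as bucketing benchmark items by an estimated solve rate and keeping only the items in a target band. A hypothetical sketch of that idea; the `solve_rate` field, band names, and thresholds are assumptions for illustration, not ml-intern's actual code:

```python
# Hypothetical sketch: build difficulty-filtered variants of a QA dataset.
# `solve_rate` is an assumed per-item field (e.g. the fraction of baseline
# model samples that answered correctly); a real pipeline would compute it.

def make_variants(items, bands):
    """Split items into named difficulty bands by solve rate."""
    variants = {name: [] for name in bands}
    for item in items:
        for name, (lo, hi) in bands.items():
            if lo <= item["solve_rate"] < hi:
                variants[name].append(item)
    return variants

data = [
    {"question": "q1", "solve_rate": 0.95},
    {"question": "q2", "solve_rate": 0.40},
    {"question": "q3", "solve_rate": 0.05},
]
bands = {"easy": (0.8, 1.01), "medium": (0.2, 0.8), "hard": (0.0, 0.2)}
variants = make_variants(data, bands)
print({k: len(v) for k, v in variants.items()})  # {'easy': 1, 'medium': 1, 'hard': 1}
```

Training separate runs on each band (or mixtures of bands) is one common way to probe which difficulty range actually moves a benchmark score.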
CLI: https://github.com/huggingface/ml-intern/tree/main
Web + mobile: https://huggingface.co/spaces/smolagents/ml-intern

And the best part? We've also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.