Thomas Wolf @Thom_Wolf
#62 AIFounder
“Love this work from Aksel and the post-training team at Hugging Face!
Turns out the HF ecosystem (papers, datasets, models all accessible through CLI, skills and md files) is perfect for running SOTA ML agents: agents that can train any type of AI model to top performance.
A few concrete runs:
⭐️ Scientific reasoning: the agent walked citations from the benchmark paper, pulled OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered variants from ARC/SciQ/MMLU, and ran 12 SFT ablations on Qwen3-1.7B. GPQA went from 10% to 32% in under 10 hours. Claude Code's best on the same prompt was 22.99%.
⭐️ HealthBench: it judged the existing datasets too noisy (!), generated 1100 synthetic examples covering emergencies, hedging and multilingual cases, upsampled 50x, and beat Codex by 60% (we need to be careful to check for overfitting here).
⭐️ Competitive math: wrote a full GRPO script, launched A100s on HF Spaces, watched rewards climb and then collapse, and ran ablations until it found a recipe that held.
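For readers who haven't met GRPO: a minimal sketch of the group-relative advantage at its core. This is not the agent's actual script; the function name, the `eps` value and the use of population standard deviation are illustrative assumptions. It also shows one reason reward signals can go flat: when every completion in a group gets the same reward, all advantages are zero and there is nothing to learn from.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group of completions sampled for one prompt.

    Hypothetical helper for illustration: advantage_i = (r_i - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two correct and two wrong completions: correct ones get a positive advantage.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Every completion scored the same: all advantages are zero, so no gradient signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```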
And the harness is pretty tiny and simple. A couple of best practices and a handful of skills pointing at tools already in the ecosystem: arxiv and http://hf.co/papers for reading, the Hub for datasets and models, HF Jobs for compute, Trackio for metrics.
Personal favorite is the "research skill" explaining how to do a SOTA landscape of a field (see https://github.com/huggingface/ml-intern/blob/main/agent/tools/research_tool.py), which is extremely powerful when combined with a simple prompt that basically says "FIRST: Search HF ecosystem to find the best approach" (see https://github.com/huggingface/ml-intern/blob/main/agent/prompts/system_prompt.yaml#L14).
On another note: setting good baselines on new benchmarks keeps getting harder when a setup this simple beats raw Codex by 60% on HealthBench out of the box.
Give it a try if you're training AI models. We provisioned $1k of GPU resources and Anthropic credits for the quickest among you.
Links:
GitHub (CLI): https://github.com/huggingface/ml-intern
Web / Spaces (mobile): https://huggingface.co/spaces/smolagents/ml-intern