Watching 12 sources. 3 new stories queued for analysis.
Currently processing sentiment from 847 social signals across tracked accounts.
“@garrytan Garry you gotta come on the pod and showcase how to install/use all of the stuff you've been working on @startupideaspod https://www.youtube.com/@GregIsenberg
“DESIGN.md is one of my favorite things that Stitch is doing! If you know, you know.
“OpenAI's new Euphony tool works almost exactly the same way as my Codex transcript viewer https://tools.simonwillison.net/codex-timeline?url=https%3A%2F%2Fgist.githubusercontent.com%2Fsimonw%2Fa9eb5993a2853ec840d26c0e56bde362%2Fraw%2Fb8c5febdf60d878da84e27c07efdaed159abde4a%2Flogs.jsonl#tz=local&q=&type=all&payload=all&role=all&hide=1&truncate=1
“I have found Euphony so useful internally! Glad it's now open source!
“Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems. And to think this is just scratching the surface. Ultimately, it boils down to good research questions or hypotheses. LLMs are not great at this (yet).
“Love this work from Aksel and the post-training team at Hugging Face! Turns out the HF ecosystem (papers, datasets, models all accessible through CLI, skills and md files) is perfect for running SOTA ML agents: agents that can train any type of AI model to top performance. A few concrete runs:
⭐️ Scientific reasoning: the agent walked citations from the benchmark paper, pulled OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered variants from ARC/SciQ/MMLU, and ran 12 SFT ablations on Qwen3-1.7B. GPQA went from 10% to 32% in under 10 hours. Claude Code's best on the same prompt was 22.99%.
⭐️ HealthBench: it judged the existing datasets too noisy (!), generated 1100 synthetic examples covering emergencies, hedging and multilingual cases, upsampled 50x, and beat Codex by 60% (careful to check for overfitting here).
⭐️ Competitive math: wrote a full GRPO script, launched A100s on HF Spaces, watched rewards climb and then collapse, and ran ablations until it found a recipe that held.
And the harness is pretty tiny and simple. A couple of best practices and a handful of skills pointing at tools already in the ecosystem: arxiv and http://hf.co/papers for reading, the Hub for datasets and models, HF Jobs for compute, Trackio for metrics. Personal favorite is the "research skill" explaining how to do a SOTA landscape of a field (see https://github.com/huggingface/ml-intern/blob/main/agent/tools/research_tool.py), which is extremely powerful when combined with a simple prompt that basically tells it "FIRST: Search HF ecosystem to find the best approach" (see https://github.com/huggingface/ml-intern/blob/main/agent/prompts/system_prompt.yaml#L14).
On another note: setting good baselines on new benchmarks keeps getting harder when a setup this simple beats raw Codex by 60% on HealthBench out of the box.
Give it a try if you're training AI models. We provisioned $1k of GPU resources and Anthropic credits for the quickest among you. Links:
GitHub (CLI): https://github.com/huggingface/ml-intern
Web Spaces (mobile): https://huggingface.co/spaces/smolagents/ml-intern
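[Editor's note: the "research skill" described above — surveying the SOTA landscape of a field via the Hub — can be sketched with public huggingface_hub calls. This is a hypothetical illustration, not the repo's actual research_tool.py; the function name survey_field and the report format are invented here.]

```python
# Hypothetical sketch of a "research skill" in the spirit of the one the
# tweet links to; the real agent/tools/research_tool.py may differ.
# Uses only public huggingface_hub APIs to survey a field's landscape.
from huggingface_hub import HfApi

def survey_field(query: str, limit: int = 5) -> str:
    """Return a short landscape report for `query` from the HF Hub."""
    api = HfApi()
    models = api.list_models(search=query, sort="downloads", limit=limit)
    datasets = api.list_datasets(search=query, sort="downloads", limit=limit)
    lines = [f"Top models for '{query}':"]
    lines += [f"  - {m.id} ({m.downloads} downloads)" for m in models]
    lines.append(f"Top datasets for '{query}':")
    lines += [f"  - {d.id} ({d.downloads} downloads)" for d in datasets]
    return "\n".join(lines)

print(survey_field("medical image segmentation"))
```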
“OpenAI's first AI intern is expected by the end of this year, but we got impatient and decided to build it ourselves :)
> Runs autonomously for hours / days depending on the task.
> Can read every paper, model, and dataset on the HF Hub to build the best post-training recipes
> Works with any capable model (Kimi K2.6, GPT-5.4, Opus 4.7, etc.)
> Runs locally or on HF infra (Spaces as sandboxes, Jobs for generating data & training models, Buckets for storage)
“I tested @huggingface ml-intern, given the prompt "Fine-tune a Segment Anything Model (SAM) on a useful medical dataset. Train the model, and provide a comprehensive tutorial in a Jupyter Notebook file. Additionally, create a Hugging Face article/blog post documenting everything you have done."
It did it all autonomously:
- Researched via hf_papers & searched GitHub/HF Hub
- Found an HF dataset & wrote the fine-tuning script
- Trained it using HF compute (took ~1 hour)
- Pushed the weights & wrote the article
Here are the model weights, code, and the blog it generated:
HF article: https://huggingface.co/Mayank022/blog-fine-tuning-sam-medical-segmentation
Model weights: https://huggingface.co/Mayank022/sam-vit-base-kvasir-polyp-segmentation
Awesome stuff @akseljoonas, looking forward to using this. 🔥
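[Editor's note: a minimal sketch of running the checkpoint that tweet links to, using the documented transformers SAM API. It assumes the repo is a standard SamModel export; the processor is loaded from facebook/sam-vit-base as a fallback, and the image path and click coordinates are placeholders.]

```python
# Load the SAM checkpoint the agent pushed and run one point-prompted
# segmentation. Assumes a standard SamModel export; if the repo lacks a
# processor config, the facebook/sam-vit-base processor is the usual fallback.
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

repo = "Mayank022/sam-vit-base-kvasir-polyp-segmentation"
model = SamModel.from_pretrained(repo)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

image = Image.open("endoscopy_frame.png").convert("RGB")  # placeholder input
input_points = [[[320, 240]]]  # one (x, y) click prompt near the polyp

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Upscale the low-resolution predicted masks back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)  # (num_prompts, num_masks_per_prompt, H, W)
```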
“I’ve been using ml-intern for a while, and it genuinely changed my workflow. It's super good at:
- Model/Dataset discovery
- Post-training setup iteration
- Data processing workflows
Huge shoutout to @akseljoonas for leading this!
“@Teknium 🙏
“it's insane that there are maybe five guys he missed that describe just myself
Extraordinarily unproductive but high-volume conversation
"everything is computer" wins by economic fiat though
““guy who is literally a solipsist but is still really invested in convincing strangers on the internet that he’s right”
“Tag yourself I'm "guy who rejects dualism because that would make mind uploading impossible and mean that he finally has to confront the inevitability of his own death"
“CrabTrap is a big deal for the OpenClaw community
“Brex just open sourced the key piece of infrastructure that enabled them to run their whole company on OpenClaw.
“talk talk talk
“we are so early
“There is still plenty of room at the bottom
“This is a great example of what I call a "cloud law" (and it's about actual clouds!). A "cloud law" is a regular, exploitable pattern in nature that's too complex to be either intuited or explained by an individual human being. AI is opening up a whole new type of science.
“Okay this is sick but how does it avoid butterfly effect issues?
“I worked on stuff like this in 2017 and it was clear then this was the future. It took a while for the hardware to catch up though!
“@emollick a taste of what is now possible
“@emollick Haters are going to say it was Ottermaxed.
“ChatGPT images are not a toy; they're super useful in all kinds of professional settings too!