
Researchers open-source HALO optimizer for AI agent self-improvement


Researchers open-sourced HALO, a Hierarchical Agent Loop Optimizer: an RLM-based technique that analyzes AI agent execution traces and recursively suggests improvements. On the AppWorld benchmark with Claude Sonnet 4.6, HALO boosted scenario goal completion from 73.7% to 89.5%, a 15.8-point gain. The technique lets agents iteratively self-improve by reviewing their traces and applying fixes.

Original post

We’re introducing HALO 😇 Hierarchical Agent Loop Optimizer.

HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.

tl;dr: we improved performance on AppWorld (Sonnet 4.6) from 73.7 --> 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues.

The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant tool arguments, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update. We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.

Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
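The trace -> analysis -> update loop described in the post can be sketched as a plain optimization loop. This is a hypothetical illustration, not the released HALO API: `evaluate`, `analyze_traces`, and `apply_fixes` are stand-ins for the benchmark run, the RLM critique step, and the coding-agent update step.

```python
def optimize(harness, evaluate, analyze_traces, apply_fixes,
             max_rounds=10, min_gain=0.5):
    """Iterate trace analysis and harness updates until the score plateaus.

    All callables here are illustrative stand-ins:
      evaluate(harness)        -> (score, traces)   # e.g. AppWorld goal completion %
      analyze_traces(traces)   -> list of issues    # e.g. hallucinated tool calls, refusal loops
      apply_fixes(harness, i)  -> updated harness   # each issue maps to a prompt/code update
    """
    score, traces = evaluate(harness)
    for _ in range(max_rounds):
        issues = analyze_traces(traces)
        candidate = apply_fixes(harness, issues)
        new_score, traces = evaluate(candidate)
        if new_score - score < min_gain:  # plateau: stop iterating
            break
        harness, score = candidate, new_score
    return harness, score
```

The stopping rule (improvement below `min_gain`) mirrors the post's "repeat until the score plateaued"; the real system delegates `apply_fixes` to a coding agent rather than a local function.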

3:39 PM · Apr 29, 2026
