Researchers present KL-regularized policy gradient paper at ICLR 2026
Researchers presented the paper "On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning" at ICLR 2026 (Pavilion 4, booth #4517). The work introduces the Regularized Policy Gradient (RPG) framework and examines how KL-regularized policy gradient methods can improve large language model reasoning. The research influenced the V4 and V3.2 model releases. Quanquan Gu, Associate Professor of Computer Science at UCLA, leader of the UCLA AGI Lab, and Pre-training & Scaling Co-Lead at ByteDance Seed, co-authored the paper.
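In broad terms, a KL-regularized policy gradient objective augments the reward-weighted log-likelihood with a penalty that keeps the trained policy close to a frozen reference model. Below is a minimal PyTorch sketch assuming per-token log-probabilities are already gathered; the function name, the beta value, and the simple log-ratio KL estimator are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def kl_regularized_pg_loss(logp, logp_ref, advantages, beta=0.05):
    """Illustrative loss: policy gradient term plus a KL penalty to a reference."""
    # REINFORCE-style term: raise log-probs of tokens with positive advantage.
    pg_term = -(advantages.detach() * logp).mean()
    # Monte Carlo estimate of KL(pi_theta || pi_ref) on the sampled tokens.
    kl_term = (logp - logp_ref.detach()).mean()
    return pg_term + beta * kl_term

# Toy usage with stand-in tensors.
logp = torch.randn(8, requires_grad=True)
loss = kl_regularized_pg_loss(logp, logp.detach() - 0.1, torch.ones(8))
loss.backward()
```

Larger beta pulls the policy harder toward the reference; smaller beta lets the reward signal dominate.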
Why it matters
Yifeng Liu (UCLA AGI Lab PhD student) applied lessons from his prior Kimi-1.5 reinforcement learning project to the Regularized Policy Gradient framework.
Yifan Zhang (Princeton AI Lab Fellow) unified normalized and unnormalized KL-regularization variants within the new policy gradient algorithms; see the estimator sketch after this list.
Quanquan Gu (UCLA AGI Lab leader) co-authored the work introducing RPG-Style Clip for stable off-policy policy gradient training at scale; a clipping sketch follows below.
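One way to read the normalized vs. unnormalized distinction above is through the Monte Carlo surrogate used for the KL term. The sketch below contrasts two common estimators from the LLM RL literature; treating them as stand-ins for the paper's two variants is an assumption, not the paper's definition:

```python
import torch

def kl_estimates(logp, logp_ref):
    """Two Monte Carlo surrogates for KL(pi_theta || pi_ref) on tokens sampled from pi_theta."""
    log_ratio = logp_ref - logp                      # log(pi_ref / pi_theta) per token
    k1 = (-log_ratio).mean()                         # plain log-ratio: unbiased but can go negative
    k3 = (log_ratio.exp() - 1.0 - log_ratio).mean()  # nonnegative, typically lower variance
    return k1, k3
```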
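For the off-policy setting, the source names RPG-Style Clip but does not spell out the rule. As a stand-in, the familiar PPO-style clipped surrogate below illustrates why clipping the importance ratio stabilizes off-policy updates; the paper's actual clipping may differ:

```python
import torch

def clipped_pg_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped surrogate; a stand-in, not the paper's RPG-Style Clip."""
    ratio = (logp_new - logp_old).exp()                 # importance weight pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic minimum removes the incentive to push the ratio far from 1.
    return -torch.min(unclipped, clipped).mean()
```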

