PrimeIntellect introduces Renderers boosting RL throughput over 3x

PrimeIntellect introduced Renderers to resolve mismatches in reinforcement learning pipelines where trainers operate on tokens while environments generate messages. The system enforces explicit token-in and token-out handling with user-controlled templating. LMSYS Org collaborated on the effort. The change eliminates hidden chat-template rewrites and delivers more than 3x throughput gains on popular open models.
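For context, the "token-in, token-out" contract can be pictured as a small interface that owns both directions of the message/token boundary. The sketch below is illustrative only — the class and method names are assumptions, not PrimeIntellect's actual API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Message:
    role: str       # e.g. "system", "user", "assistant"
    content: str


class Renderer(ABC):
    """Owns the message<->token boundary so trainer and environment never
    round-trip sampled tokens through a chat-template string."""

    @abstractmethod
    def render(self, messages: list[Message]) -> list[int]:
        """Turn a message history into exact prompt token ids."""

    @abstractmethod
    def parse(self, completion_tokens: list[int]) -> Message:
        """Turn sampled completion tokens into a structured message."""


class ToyRenderer(Renderer):
    """Character-level stand-in for a real tokenizer, for illustration only."""

    def render(self, messages: list[Message]) -> list[int]:
        text = "".join(f"<{m.role}>{m.content}</{m.role}>" for m in messages)
        return [ord(c) for c in text]

    def parse(self, completion_tokens: list[int]) -> Message:
        return Message("assistant", "".join(chr(t) for t in completion_tokens))
```

A trainer built against an interface like this only ever sees token ids, while the environment only ever sees structured messages — the renderer is the single place where the two representations meet.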

Original post

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

4:44 PM · May 12, 2026
Reposted by

The jinja chat template has always felt like a temporary equilibrium, so we've needed someone to take the reins and try to build that out within the community.

Excited about this!

11:59 PM · May 12, 2026 · 14K Views

Harmony was the first attempt at this imo, but it never broke out of the OpenAI model ecosystem. I'm honestly not sure why, but I'd guess a lack of community effort https://github.com/openai/harmony

12:01 AM · May 13, 2026 · 2.5K Views

@willccbb @vllm_project @sgl_project @huggingface @tinkerapi confirmed

12:00 AM · May 13, 2026 · 1.3K Views

@willccbb @vllm_project @sgl_project @huggingface @tinkerapi src https://rlhfbook.com/teach/course/lec2-chap4-5-9/#14

12:00 AM · May 13, 2026 · 514 Views

A gift from the Gods. Dealing with multiple models and many envs in the same RL codebase while respecting correctness constraints (no train / inference tokenization mismatch) is becoming a huge pain.

I have a vibe-coded draft PR that does exactly this, but happy I won’t have to land or maintain it now. Let’s hope the field can really standardize on one abstraction.
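The mismatch Cohen mentions is easy to reproduce with a toy tokenizer: because encoding is many-to-one, decoding sampled tokens to text for the next turn and re-encoding can silently change them. The greedy encoder below is a deliberately minimal stand-in for a real BPE vocabulary:

```python
# Toy merged vocabulary with a greedy longest-match encoder,
# standing in for a real BPE tokenizer.
VOCAB = {"a": 1, "b": 2, "ab": 3}
INV = {tid: piece for piece, tid in VOCAB.items()}


def encode(text: str) -> list[int]:
    out, i = [], 0
    pieces = sorted(VOCAB, key=len, reverse=True)  # longest match first
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                out.append(VOCAB[piece])
                i += len(piece)
                break
    return out


def decode(tokens: list[int]) -> str:
    return "".join(INV[t] for t in tokens)


# Suppose the model sampled "a" and "b" as two separate tokens ...
sampled = [1, 2]
# ... round-tripping through text for the next turn rewrites them:
assert decode(sampled) == "ab"
assert encode(decode(sampled)) == [3]  # not [1, 2]: the sampled tokens are gone
```

Training on the re-encoded sequence means the logprobs no longer correspond to the tokens the model actually sampled — exactly the train/inference tokenization mismatch the correctness constraint forbids.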

9:47 AM · May 13, 2026 · 14.6K Views

@TacoCohen Very cool! I think tinker from @thinkymachines had that API as well

10:55 AM · May 13, 2026 · 363 Views

@TacoCohen @hallerite @thinkymachines https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/renderers

`base.py` has the ABCs

11:33 AM · May 13, 2026 · 57 Views

some of our fav bugs on the road to `renderers`

read all about it: https://www.primeintellect.ai/blog/renderers

6:54 AM · May 13, 2026 · 4.8K Views

all chat templates are wrong, some chat templates are useful

we found some CRAZY performance wins by patching official templates, and we're shipping them in a standalone library you can use with any RL stack

w/ examples for @vllm_project @sgl_project @huggingface @tinkerapi

11:50 PM · May 12, 2026 · 40.1K Views

the core of the issue is that both encoding and parsing are many-to-one

vanilla TITO (token-in, token-out) does prefix lookup in token-space, which misses many rendering collisions

the solution is to do lookup in message-space, then input prep in token-space, which we call bridge_to_next_turn
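A hypothetical sketch of what that bridging step might look like — the signature and message shapes are invented for illustration; only the name `bridge_to_next_turn` comes from the post:

```python
def bridge_to_next_turn(prev_prompt_tokens: list[int],
                        sampled_tokens: list[int],
                        prev_messages: list[dict],
                        next_messages: list[dict],
                        render_delta) -> list[int]:
    """Lookup in message-space, input prep in token-space."""
    n = len(prev_messages)
    # 1. Prefix lookup in message-space: an exact structural comparison,
    #    unaffected by many-to-one encoding collisions in token-space.
    assert next_messages[:n] == prev_messages, "conversation prefix diverged"
    assert next_messages[n]["role"] == "assistant", "sampled turn missing"
    # 2. Input prep in token-space: the sampled tokens are reused
    #    byte-for-byte; only the genuinely new messages (e.g. tool
    #    output) are rendered into fresh tokens.
    return prev_prompt_tokens + sampled_tokens + render_delta(next_messages[n + 1:])
```

Here `render_delta` stands in for whatever renders just the new messages; the key property is that the sampled assistant tokens never pass through a decode/re-encode cycle.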

11:57 PM · May 12, 2026 · 2K Views

@vllm_project @sgl_project @huggingface @tinkerapi we're intending for this to become a programmable source of truth for template implementations so that we can finally get rid of jinja

lots here already, but PRs welcome for all models!

12:10 AM · May 13, 2026 · 1.7K Views

@vllm_project @sgl_project @huggingface @tinkerapi from a live run:

12:21 AM · May 13, 2026 · 1.4K Views

We are open sourcing renderers

For RL, the inference server should be simple: tokens in, tokens out.

renderers is the token-level chat templating layer to
>render messages to tokens
>parse completions to structure
>bridge rollouts byte-for-byte

>3x throughput on open models
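Put together, the three verbs compose into a multi-turn rollout roughly like the loop below — a toy sketch with invented plumbing, not the library's actual API:

```python
def agentic_episode(render, sample, parse, bridge, env_step,
                    messages, max_turns=8):
    """Toy rollout loop wiring the three verbs together.

    render   : messages -> prompt tokens (first turn only)
    sample   : prompt tokens -> completion tokens (the inference server)
    parse    : completion tokens -> assistant message
    bridge   : (prompt, completion, old msgs, new msgs) -> next prompt tokens
    env_step : assistant message -> new tool/user messages ([] means done)
    """
    prompt = render(messages)
    for _ in range(max_turns):
        completion = sample(prompt)   # tokens in, tokens out
        reply = parse(completion)     # structure, no string round-trip
        messages = messages + [reply]
        tool_msgs = env_step(reply)
        if not tool_msgs:             # episode finished
            break
        next_messages = messages + tool_msgs
        prompt = bridge(prompt, completion, messages, next_messages)
        messages = next_messages
    return messages
```

The throughput claim in the post follows from the shape of this loop: every turn after the first reuses prior tokens via `bridge` instead of re-rendering and re-tokenizing the whole conversation.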

12:03 AM · May 13, 2026 · 9.1K Views

working at prime is just "ugh i had this gnarly problem, let’s fix it and then make it available to everyone"

a ton of other things are coming, can’t wait to show it to yall :)

6:40 AM · May 13, 2026 · 4.5K Views

never again

11:50 PM · May 12, 2026 · 6K Views