
Jan Leike launches AGI research project at Anthropic


Jan Leike has launched a new AGI research project at Anthropic, stepping away from his role leading the Alignment Science team to focus on it. Previously, Leike co-led OpenAI’s Superalignment team and worked at DeepMind. In the announcement, he states that safe AGI development requires addressing factors beyond alignment. Posts highlight his earlier move from OpenAI and his emphasis on a multi-factor approach to AGI success, with further details forthcoming.

Original post

Jan Leike@janleike

Some personal news: I am starting a new research project at Anthropic. Very excited about this!

Many things are needed to make AGI go well, and alignment is only one of them. More on this soon…

5:48 PM · May 8, 2026 · 137.7K Views
Reposted by

Probably nothing to yawn at

8:21 PM · May 8, 2026 · 12.9K Views

Jan Leike@janleike

To focus on this, I’ve stepped away from running alignment at Anthropic. @EthanJPerez and @sprice354_ are leading the team going forward, and I’m confident they’ll do an amazing job.

5:48 PM · May 8, 2026 · 33.5K Views

Jan Leike@janleike

While a lot of progress has been made, I don’t think alignment is solved:

We still haven’t figured out how to supervise superhuman models and the stakes keep getting higher.

substack.com
Alignment is not solved
But it increasingly looks solvable

5:48 PM · May 8, 2026 · 12.2K Views

Ethan Perez@EthanJPerez

Grateful for @janleike and his leadership over the years. With models like Mythos, the stakes for alignment have never felt higher at Anthropic, and I'm looking forward to helping to continue scaling up our work here.

Some of what the team's been up to recently 🧵

5:55 PM · May 8, 2026 · 19.4K Views

Ethan Perez@EthanJPerez

1) We developed, released, and actively maintain auto-mode, which prevents safety failures in highly agentic tasks in Claude Code.

5:55 PM · May 8, 2026 · 1.5K Views

Ethan Perez@EthanJPerez

2) We own Anthropic’s risk reports, and we’ve helped to drive them to be more extensive. We red team Claude before internal and external deployment, and we evaluate Claude for dangerous capabilities including AI R&D and ability to work around controls, sandboxes, and monitors.

5:55 PM · May 8, 2026 · 1.4K Views

Ethan Perez@EthanJPerez

3) We developed natural language autoencoders, a new technique for translating model internals into text interpretations.

5:55 PM · May 8, 2026 · 750 Views

Ethan Perez@EthanJPerez

4) We introduced Claude’s Constitution, and we’ve developed various techniques for instilling the constitution into Claude.

5:55 PM · May 8, 2026 · 732 Views

Ethan Perez@EthanJPerez

5) We own alignment, behavior, and honesty in Claude models – we improve the alignment of our models based on issues that come up in safety testing and real-world usage.

5:55 PM · May 8, 2026 · 705 Views

Ethan Perez@EthanJPerez

6) We’re exploring frontier alignment risks by developing model organisms for them, e.g., for long-horizon agentic tasks or models which are effective at hiding misaligned goals.

5:55 PM · May 8, 2026 · 731 Views

Ethan Perez@EthanJPerez

7) We run the Anthropic fellows program, which helps people break into AI safety research and puts out a lot of the alignment team’s research, on http://alignment.anthropic.com

5:55 PM · May 8, 2026 · 1K Views

Ethan Perez@EthanJPerez

There’s a lot more work to be done, so if you’re interested in helping out, please apply to one of our job postings or to the fellows program here! https://job-boards.greenhouse.io/anthropic/jobs/5023394008

5:55 PM · May 8, 2026 · 1.8K Views

Jan Leike is now leading a new research project at Anthropic, and will no longer be running alignment.

8:04 PM · May 8, 2026 · 12.2K Views

@janleike This is awesome @janleike - congrats @EthanJPerez & team

7:44 PM · May 8, 2026 · 601 Views

@janleike godspeed

5:31 PM · May 9, 2026 · 112 Views