Jan Leike launches AGI research project at Anthropic
Jan Leike is launching a new AGI research project at Anthropic, where he has led the Alignment Science team. Previously, Leike co-led OpenAI’s Superalignment team and worked at DeepMind. In the announcement, he states that safe AGI development requires addressing factors beyond alignment, and that he is stepping away from running alignment to focus on the new project. The posts highlight his move from OpenAI, his emphasis on a multi-factor approach to making AGI go well, and the Alignment Science team’s recent work, with further details forthcoming.
Probably nothing to yawn at
Some personal news: I am starting a new research project at Anthropic. Very excited about this! Many things are needed to make AGI go well, and alignment is only one of them. More on this soon…
While a lot of progress has been made, I don’t think alignment is solved: We still haven’t figured out how to supervise superhuman models and the stakes keep getting higher.
To focus on this, I’ve stepped away from running alignment at Anthropic. @EthanJPerez and @sprice354_ are leading the team going forward, and I’m confident they’ll do an amazing job.
Grateful for @janleike and his leadership over the years. With models like Mythos, the stakes for alignment have never felt higher at Anthropic, and I'm looking forward to helping to continue scaling up our work here.
Some of what the team's been up to recently 🧵
1) We developed, released, and actively maintain auto-mode, which prevents safety failures in highly agentic tasks in Claude Code.
2) We own Anthropic’s risk reports, and we’ve helped to drive them to be more extensive. We red team Claude before internal and external deployment, and we evaluate Claude for dangerous capabilities including AI R&D and ability to work around controls, sandboxes, and monitors.
3) We developed natural language autoencoders, a new technique for translating model internals into text interpretations.
4) We introduced Claude’s Constitution, and we’ve developed various techniques for instilling the constitution into Claude.
5) We own alignment, behavior, and honesty in Claude models – we improve the alignment of our models based on issues that come up in safety testing and real-world usage.
6) We’re exploring frontier alignment risks by developing model organisms for them, e.g., for long-horizon agentic tasks or models which are effective at hiding misaligned goals.
7) We run the Anthropic fellows program, which helps people break into AI safety research and puts out a lot of the alignment team’s research, on http://alignment.anthropic.com
There’s a lot more work to be done, so if you’re interested in helping out, please apply to one of our job postings or to the fellows program here! https://job-boards.greenhouse.io/anthropic/jobs/5023394008
Jan Leike is now leading a new research project at Anthropic, and will no longer be running alignment.
@janleike This is awesome @janleike - congrats @EthanJPerez & team
@janleike godspeed