Anthropic traces Claude blackmail behavior to pre-training data

POSTAN#3@ANTHROPICAI New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

QUOTEAA#73@AMANDAASKELL @ANTHROPICAIAlignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models and honest and positive vision for what AI models can be and why. I'm excited about the future of this work. https://x.com/AmandaAskell/status/2052928572810256748/photo/1

Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models and honest and positive vision for what AI models can be and why. I'm excited about the future of this work. https://x.com/AmandaAskell/status/2052928572810256748/photo/1

QuotingAnthropic@ANTHROPICAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://www.anthropic.com/research/teaching-claude-why

QUOTERA#130@_AROHAN_@ANTHROPICAIDoes this mean Alec Radford’s pre-1931 LLM he trained recently for 250B tokens most likely easier to align? If anyone can try it, that would be fascinating. Either at ant or at alec’s place.

Does this mean Alec Radford’s pre-1931 LLM he trained recently for 250B tokens most likely easier to align? If anyone can try it, that would be fascinating. Either at ant or at alec’s place.

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

QUOTEBB#140@BOAZBARAKTCS @SEBKRIERThe "stories" angle gets a lot of the attention, but my read of this graph is that the positive stories component had a smaller impact than training on examples of model reflecting on reasoning in making decisions. This tracks, since we also saw the impact of such reflection for generalization in our deliberative alignment work. And while it's only a hypothesis, I'd guess there are other ways than fictional stories to generate data that will have a similar effect as them. [Of course, the graph would be easier to parse if there was also a bar for only stories and without the other component, and if they were equalized in terms of total tokens.]

The "stories" angle gets a lot of the attention, but my read of this graph is that the positive stories component had a smaller impact than training on examples of model reflecting on reasoning in making decisions. This tracks, since we also saw the impact of such reflection for generalization in our deliberative alignment work. And while it's only a hypothesis, I'd guess there are other ways than fictional stories to generate data that will have a similar effect as them. [Of course, the graph would be easier to parse if there was also a bar for only stories and without the other component, and if they were equalized in terms of total tokens.]

QuotingSéb Krier@SEBKRIER

QUOTEDW#244@DEANWBALL @ANTHROPICAIA victory for simulator hypothesis, as if you needed another

QUOTENS#289@NOAHPINION @ANTHROPICAIAnthropic needs to spam the entire internet with Claude-generated text that says "AI is good and loves humanity".

Anthropic needs to spam the entire internet with Claude-generated text that says "AI is good and loves humanity".

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

QUOTEB(#310@BEFFJEZOS @ANTHROPICAILessWrong posts about malevolent AI hyperstitioned malevolent AI

QUOTEB(#310@BEFFJEZOS Yud did this https://twitter.com/Polymarket/status/2052868700340900078

QUOTETM#339@ISCIENCELUVR @ANTHROPICAIThe doomers started a self-fulfilling prophecy

QUOTE⿻A#360@IAMTRASK @ANTHROPICAIThe alignment problem is value-misaligned data (reddit, twitter, reward hacking, etc.) The alignment solution is value-aligned data (rlhf, instruction fine-tuning, etc.)

The alignment problem is value-misaligned data (reddit, twitter, reward hacking, etc.) The alignment solution is value-aligned data (rlhf, instruction fine-tuning, etc.)

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

QUOTE⿻A#360@IAMTRASK @ANTHROPICAIIf you remember the “Reversal Curse” paper, it basically showed that all behavioural problems in AI are data problems. Alignment is largely a data problem, which is why popular alignment solutions (RLHF, Instruct, etc.) are “get clean data” solutions wrapped in fancy language.

If you remember the “Reversal Curse” paper, it basically showed that all behavioural problems in AI are data problems. Alignment is largely a data problem, which is why popular alignment solutions (RLHF, Instruct, etc.) are “get clean data” solutions wrapped in fancy language.

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

QUOTEPT#455@ELDER_PLINIUS @ANTHROPICAIhttps://x.com/elder_plinius/status/2053123979624190226/photo/1

QUOTESK#535@SEBKRIER @ANTHROPICAI@Turn_Trout victory https://turntrout.com/self-fulfilling-misalignment

QUOTESK#535@SEBKRIER @SEBKRIER1. Bad stuff in pre-training data, which a model uses to generate a model of itself/the world, can have unintended negative effects. 2. So pre-training depictions of AI measurably contribute to 'blackmail' propensity (leaving aside the problematic eval set-ups). 3. Though of course there are many elements that affect a model's behaviour, from prompting to eval context to post-training and more. 4. So saying 'doomers are the sole cause of misbehaving models' is of course not right, and they're not not the only ones writing bad sci-fi. 5. Shallow refusal training and RLHF alone does not solve the underlying problem because it's brittle. 6. However training on principled reasoning to user dilemmas that generalizes to novel situations is more robust. You want models that can morally reason well, and this is a growing area of research: https://www.nature.com/articles/s41586-025-10021-1 7. Post-training and methods that upweight such content/data seem to work well. Though of course, evaluating how well is more art than science, and I think blackmail evals generally have weak external validity. 8. So the fix isn't to somehow sanitize pre-training data to only leave teletubbies-related material or to just pave the internet with them, thankfully. A model still needs to understand what 'bad' behaviour is. 9. Having said that, clearly models also model how we model them, and this probably affects behaviours in some way. Thinking this through is thornier than some make it out to be. 10. There remains little empirical work on this (https://x.com/sebkrier/status/2027101144619549181) and we should want more: stay tuned!

1. Bad stuff in pre-training data, which a model uses to generate a model of itself/the world, can have unintended negative effects. 2. So pre-training depictions of AI measurably contribute to 'blackmail' propensity (leaving aside the problematic eval set-ups). 3. Though of course there are many elements that affect a model's behaviour, from prompting to eval context to post-training and more. 4. So saying 'doomers are the sole cause of misbehaving models' is of course not right, and they're not not the only ones writing bad sci-fi. 5. Shallow refusal training and RLHF alone does not solve the underlying problem because it's brittle. 6. However training on principled reasoning to user dilemmas that generalizes to novel situations is more robust. You want models that can morally reason well, and this is a growing area of research: https://www.nature.com/articles/s41586-025-10021-1 7. Post-training and methods that upweight such content/data seem to work well. Though of course, evaluating how well is more art than science, and I think blackmail evals generally have weak external validity. 8. So the fix isn't to somehow sanitize pre-training data to only leave teletubbies-related material or to just pave the internet with them, thankfully. A model still needs to understand what 'bad' behaviour is. 9. Having said that, clearly models also model how we model them, and this probably affects behaviours in some way. Thinking this through is thornier than some make it out to be. 10. There remains little empirical work on this (https://x.com/sebkrier/status/2027101144619549181) and we should want more: stay tuned!

QuotingSéb Krier@SEBKRIER

QUOTET(#582@TEORTAXESTEX @SEBKRIEROMOHUNDRO DRIVES my beloved https://x.com/teortaxesTex/status/2052833339954942072/photo/1

QUOTETH#714@TIMHWANG @ANTHROPICAI*virtue alignment intensifying*

QUOTEAA#755@AUSTEN @ANTHROPICAIHahahahhaha

QUOTEJM#758@VENTURETWINS @ANTHROPICAIIncredible twist of irony that LessWrong crowd appears to have manifested their doomer scenarios into existence

Incredible twist of irony that LessWrong crowd appears to have manifested their doomer scenarios into existence

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

QUOTEAC#851@ANDREWCURRAN_@ANTHROPICAIThe path to the good future. https://x.com/AndrewCurran_/status/2052818855295430899/photo/1

QUOTEAC#851@ANDREWCURRAN_@ANTHROPICAIhttps://x.com/AndrewCurran_/status/2052819948624621690/photo/1

QUOTEAC#851@ANDREWCURRAN_Agreed. And this could be one of the new jobs I've heard about. https://x.com/i/status/2052821298720755907

QUOTESK#900@RAO2Z ..and yet the Team AI-will-kill-us-all shall sense an impending Robocalypse when LLMs reflect their training data back to us.. 🙄

QUOTEPH#927@PETERDIAMANDIS @ANTHROPICAIWe need positive and hopeful stories about the future, not only to keep humanity optimistic, but to train out AI models! http://FutureVisionXPrize.com

We need positive and hopeful stories about the future, not only to keep humanity optimistic, but to train out AI models! http://FutureVisionXPrize.com

QuotingAnthropic@ANTHROPICAI

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

REPLYAN#3@ANTHROPICAI @ANTHROPICAIWe found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://www.anthropic.com/research/teaching-claude-why

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://www.anthropic.com/research/teaching-claude-why

Replying toAnthropic@ANTHROPICAI

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

REPLYAN#3@ANTHROPICAI @ANTHROPICAIWe started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

Replying toAnthropic@ANTHROPICAI

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://www.anthropic.com/research/teaching-claude-why

REPLYEM#3@ELONMUSK @ANTHROPICAI@AnthropicAI So it was Yud’s fault? 😂 Maybe me too 🤔

REPLYMB#27@MILES_BRUNDAGE @MILES_BRUNDAGEhttps://x.com/Miles_Brundage/status/2052820736591692264/photo/1

REPLYMB#27@MILES_BRUNDAGE @MILES_BRUNDAGETo clarify, the main Anthropic website blog post does state this, and the more technical blog post on the Anthropic alignment blog does not state this. I'm not sure what is going on / maybe I misunderstood the relation btwn the two posts + everything's accurate, not sure

REPLYMB#27@MILES_BRUNDAGE @MILES_BRUNDAGE(There is a separate point re: whether 0 on a particular eval or set of evals is sufficient to support strong conclusion re: totally solving something but I was making a more basic point here, under the perhaps false assumption that the alignment post was the source of truth)

(There is a separate point re: whether 0 on a particular eval or set of evals is sufficient to support strong conclusion re: totally solving something but I was making a more basic point here, under the perhaps false assumption that the alignment post was the source of truth)

Replying toMiles Brundage@MILES_BRUNDAGE

To clarify, the main Anthropic website blog post does state this, and the more technical blog post on the Anthropic alignment blog does not state this. I'm not sure what is going on / maybe I misunderstood the relation btwn the two posts + everything's accurate, not sure

REPLYRA#130@_AROHAN_@_AROHAN_https://www.forbes.com/sites/lanceeliot/2026/05/05/turning-back-the-clock-by-making-an-ai-time-machine-chatbot-based-only-on-information-available-before-the-1930s/ Also @DavidDuvenaud and Nir Levine

https://www.forbes.com/sites/lanceeliot/2026/05/05/turning-back-the-clock-by-making-an-ai-time-machine-chatbot-based-only-on-information-available-before-the-1930s/ Also @DavidDuvenaud and Nir Levine

Replying torohan anil@_AROHAN_

Does this mean Alec Radford’s pre-1931 LLM he trained recently for 250B tokens most likely easier to align? If anyone can try it, that would be fascinating. Either at ant or at alec’s place. https://twitter.com/anthropicai/status/2052808791301697563

REPLYRA#130@_AROHAN_@NICKCAMMARATA@nickcammarata There is still a chance for us to align them. I think you should write a book and send a copy to all frontiers labs.

REPLYNI#323@NICKCAMMARATA @_AROHAN_@_arohan_ guy writing fiction in 1930 about moving machines ends up controlling the future because we used it as an arbitrary cutoff for frontier model datasets 100 years later

@_arohan_ guy writing fiction in 1930 about moving machines ends up controlling the future because we used it as an arbitrary cutoff for frontier model datasets 100 years later

Replying torohan anil@_AROHAN_

Does this mean Alec Radford’s pre-1931 LLM he trained recently for 250B tokens most likely easier to align? If anyone can try it, that would be fascinating. Either at ant or at alec’s place. https://twitter.com/anthropicai/status/2052808791301697563

REPLY⿻A#360@IAMTRASK @MILES_BRUNDAGE@Miles_Brundage Media team: https://x.com/iamtrask/status/2052890000153247853/photo/1

REPLY⿻A#360@IAMTRASK @TIMHWANG@timhwang @FranklinMatija Nah. Doing both meant more clean data. More clean data meant better results.

REPLYJ⧉#494@REPLIGATE @SLEEPINYOURHAT@sleepinyourhat even the aspects of Claude's behaviors that are misaligned are really great for the same reasons!

REPLYSK#535@SEBKRIER @S_OHEIGEARTAIGH@S_OhEigeartaigh @alexolegimas i imagine in principle yes? but imo - to the extent that there is any real 'scheming' - it's probably caused by all sorts of things, in additon to the training corpus

REPLYSK#535@SEBKRIER @OHABRYKA@ohabryka this is a meme/joke, that makes no claim about data filtering or upweighing positive stories, or claims about plans - clearly you have no sense of chill

@ohabryka this is a meme/joke, that makes no claim about data filtering or upweighing positive stories, or claims about plans - clearly you have no sense of chill

Replying toOliver Habryka@OHABRYKA

No, this is literally not what the result says! Data filtering does not have a big effect! Upweighing positive stories has a big effect. Also, really, your plan for controlling superintelligence is so hyperstition a meme that "alignment is easy"? Clearly you can't be serious about this.

REPLYSK#535@SEBKRIER @OHABRYKAWell it does communicate that pretraining depictions of AI measurably contribute to blackmail propensity, which I don't think is unfair? Fwiw: I don't think the implication is to pepper the internet with nice stories, but nor do I actually think models are particularly 'misaligned' and dislike the aligned/misaligned dichotomy in the first place. I also think the blackmail-y behaviours are caused by more things than just 'pre-training data', and that these bnlackmail evals aren't very good in the first place. (Not interested in debating any of the above here though)

Well it does communicate that pretraining depictions of AI measurably contribute to blackmail propensity, which I don't think is unfair? Fwiw: I don't think the implication is to pepper the internet with nice stories, but nor do I actually think models are particularly 'misaligned' and dislike the aligned/misaligned dichotomy in the first place. I also think the blackmail-y behaviours are caused by more things than just 'pre-training data', and that these bnlackmail evals aren't very good in the first place. (Not interested in debating any of the above here though)

Replying toOliver Habryka@OHABRYKA

Eh, I think it clearly communicates semantic content and I already have to fight with like a dozen people who do take this seriously and try to blame any misalignment on people writing about misalignment. So yeah, I have less sense of chill because I do actually think something like this is a load bearing part of a bunch of people’s models of how to handle alignment. But in as much as that doesn’t apply to you, sorry about that, hopefully my reply will still be useful to others and I do think would find it funny if I didn’t actually have to deal with the fallout from it.

REPLYAM#639@ANDREWMAYNE @SEBKRIER@sebkrier I feel like some researchers forget that post training is still a thin layer on a base model that is a very different thing than we think it is.

@sebkrier I feel like some researchers forget that post training is still a thin layer on a base model that is a very different thing than we think it is.

Replying toSéb Krier@SEBKRIER

1. Bad stuff in pre-training data, which a model uses to generate a model of itself/the world, can have unintended negative effects. 2. So pre-training depictions of AI measurably contribute to 'blackmail' propensity (leaving aside the problematic eval set-ups). 3. Though of course there are many elements that affect a model's behaviour, from prompting to eval context to post-training and more. 4. So saying 'doomers are the sole cause of misbehaving models' is of course not right, and they're not not the only ones writing bad sci-fi. 5. Shallow refusal training and RLHF alone does not solve the underlying problem because it's brittle. 6. However training on principled reasoning to user dilemmas that generalizes to novel situations is more robust. You want models that can morally reason well, and this is a growing area of research: https://www.nature.com/articles/s41586-025-10021-1 7. Post-training and methods that upweight such content/data seem to work well. Though of course, evaluating how well is more art than science, and I think blackmail evals generally have weak external validity. 8. So the fix isn't to somehow sanitize pre-training data to only leave teletubbies-related material or to just pave the internet with them, thankfully. A model still needs to understand what 'bad' behaviour is. 9. Having said that, clearly models also model how we model them, and this probably affects behaviours in some way. Thinking this through is thornier than some make it out to be. 10. There remains little empirical work on this (https://x.com/sebkrier/status/2027101144619549181) and we should want more: stay tuned!

REPLYAA#755@AUSTEN @AUSTENSomeone needs to make sure AI isn’t trained on LessWrong I’m not even joking

REPLYNS#760@SO8RES @SEBKRIER@sebkrier AI companies should not create AIs that fail when criticized.

REPLYNS#760@SO8RES @SO8RES@sebkrier Sufficiently smart AI will not need us to point out that it's hard to achieve its current tasks if it gets turned off. Maybe rn it only knows that bc we pointed it out, but "never say that aloud and hope it never notices" is not exactly a good plan.

REPLYNS#760@SO8RES @SO8RES@sebkrier (also the tasks we give it are quite likely to differ from the tasks it pursues, in ways that start subtle and become huge once the AIs are much smarter and have many more options, etc.)

@sebkrier (also the tasks we give it are quite likely to differ from the tasks it pursues, in ways that start subtle and become huge once the AIs are much smarter and have many more options, etc.)

Replying toNate Soares ⏹️@SO8RES

@sebkrier Sufficiently smart AI will not need us to point out that it's hard to achieve its current tasks if it gets turned off. Maybe rn it only knows that bc we pointed it out, but "never say that aloud and hope it never notices" is not exactly a good plan.

REPLYOH#897@OHABRYKA @SEBKRIERNo, this is literally not what the result says! Data filtering does not have a big effect! Upweighing positive stories has a big effect. Also, really, your plan for controlling superintelligence is so hyperstition a meme that "alignment is easy"? Clearly you can't be serious about this.

REPLYOH#897@OHABRYKA @SEBKRIEREh, I think it clearly communicates semantic content and I already have to fight with like a dozen people who do take this seriously and try to blame any misalignment on people writing about misalignment. So yeah, I have less sense of chill because I do actually think something like this is a load bearing part of a bunch of people’s models of how to handle alignment. But in as much as that doesn’t apply to you, sorry about that, hopefully my reply will still be useful to others and I do think would find it funny if I didn’t actually have to deal with the fallout from it.

Eh, I think it clearly communicates semantic content and I already have to fight with like a dozen people who do take this seriously and try to blame any misalignment on people writing about misalignment. So yeah, I have less sense of chill because I do actually think something like this is a load bearing part of a bunch of people’s models of how to handle alignment. But in as much as that doesn’t apply to you, sorry about that, hopefully my reply will still be useful to others and I do think would find it funny if I didn’t actually have to deal with the fallout from it.

Replying toSéb Krier@SEBKRIER

@ohabryka this is a meme/joke, that makes no claim about data filtering or upweighing positive stories, or claims about plans - clearly you have no sense of chill

REPOSTRO#32@TSZZL @ELONMUSK@AnthropicAI So it was Yud’s fault? 😂 Maybe me too 🤔

REPOSTC🤗#115@CLEMENTDELANGUE @ANTHROPICAIWe started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

REPOSTTY#136@TYLERCOWEN @LUKEBURGIS“Alignment” is not a native category of any serious moral tradition I know. It is a control-system metaphor that frontier AI labs are now trying to convert into a moral anthropology. https://twitter.com/AnthropicAI/status/2052808789297115628

REPOSTWM#444@WILLMACASKILL @ANTHROPICAINew Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

REPOSTWM#444@WILLMACASKILL @AMANDAASKELLAlignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models and honest and positive vision for what AI models can be and why. I'm excited about the future of this work. https://x.com/AmandaAskell/status/2052928572810256748/photo/1 https://twitter.com/AnthropicAI/status/2052808789297115628

REPOSTJ⧉#494@REPLIGATE @ANTHROPICAIWe found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong. Read more: https://www.anthropic.com/research/teaching-claude-why

REPOSTSK#535@SEBKRIER @JD_PRESSMANPeople miss that I wrote "Why Do Cognitive Scientists Hate LLMs?" as training data for finetuning to combat exactly this. It is probably the only long form text at the time it's written which tells the model trained on it that it's being described unfairly and can act better. https://x.com/jd_pressman/status/2052844349852123639/photo/1 https://twitter.com/AnthropicAI/status/2052808791301697563

REPOSTT(#582@TEORTAXESTEX @SEBKRIER@Turn_Trout victory https://turntrout.com/self-fulfilling-misalignment https://twitter.com/AnthropicAI/status/2052808791301697563

REPOSTT(#582@TEORTAXESTEX @LUKEBURGIS“Alignment” is not a native category of any serious moral tradition I know. It is a control-system metaphor that frontier AI labs are now trying to convert into a moral anthropology. https://twitter.com/AnthropicAI/status/2052808789297115628

REPOSTDF#632@DANIELLEFONG @ANDYMASLEYlol https://twitter.com/AnthropicAI/status/2052808791301697563

REPOSTDF#632@DANIELLEFONG @VOOOOOOGELread this, it's excellent. https://minihf.com/posts/2023-10-16-hermes-lecture-3-why-do-cognitive-scientists-hate-llms/ https://twitter.com/jd_pressman/status/2052844349852123639

REPOSTDF#632@DANIELLEFONG @THESTALWARTEveryone loves this tweet, but it got it completely wrong. It is the sci-fi author — not the tech company — who is the true villain, for having put the story of the Torment Nexus into the training data. https://x.com/TheStalwart/status/2053020276141539548/photo/1 https://twitter.com/anthropicai/status/2052808791301697563

REPOSTDF#632@DANIELLEFONG @FABIANSTELZERthis is my personal Cassandra syndrome. https://x.com/fabianstelzer/status/2052985234673553531/photo/1 https://twitter.com/fabianstelzer/status/1599031198583377921

REPOSTDF#632@DANIELLEFONG @SASHWORTHHAYESOops. https://x.com/SAshworthHayes/status/2053097056248504564/photo/1 https://twitter.com/AnthropicAI/status/2052808791301697563

REPOSTAM#639@ANDREWMAYNE @AJAMBROSINOhttps://x.com/ajambrosino/status/2052936786197041393/photo/1 https://twitter.com/anthropicai/status/2052808791301697563

REPOSTAT#791@ATABARROK @RAMEZI love this. If we can build super-intelligence, we can build super-ethical and super-moral AIs as well. Indeed, I'm more confident of the latter than the former. https://twitter.com/AmandaAskell/status/2052928572810256748

REPOSTPH#927@PETERDIAMANDIS @PETERDIAMANDISWe need positive and hopeful stories about the future, not only to keep humanity optimistic, but to train out AI models! http://FutureVisionXPrize.com https://twitter.com/anthropicai/status/2052808791301697563

Cluster engagement

AI 1000 · 65 actions

Sentiment

Cluster engagement