UK AI Security Institute reports OpenAI GPT-5.5 completes 32-step cyber-attack simulation

OpenAI’s GPT-5.5 completed a multi-step cyber-attack simulation end-to-end in testing by the UK AI Security Institute (AISI), making it the second model to do so after Anthropic’s Claude Mythos Preview. On AISI’s cyber evaluations, GPT-5.5 averaged a 71.4% pass rate (±8.0%) against 68.6% (±8.7%) for Mythos Preview; the gap is smaller than either reported margin, so the two models are roughly on par. The cyber range tests long-horizon agentic capabilities for autonomous cyber operations, including vulnerability discovery and simulated attacks. GPT-5.5 solved one task estimated to take a human expert about 12 hours in under 11 minutes, at a cost of $1.73.

Original post

Sidenote but again, 🇬🇧 AISI being used to underpin commentary and analysis around the globe. Love to see it.

11:31 AM · Apr 30, 2026

This is just one eval, but it's an important one - UK AISI’s cyber range tests long-horizon, agentic capability. 5.5 performs similarly to Mythos.

The risks for frontier models are real. But we do our best to deploy AI people can actually use - through hard work on mitigations.

AI Security Institute @AISecurityInst

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

3:07 PM · Apr 30, 2026 · 1.4M Views
12:38 AM · May 1, 2026 · 16.9K Views

Co-sign.

David Sacks @DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

5:45 PM · Apr 30, 2026 · 991K Views
6:34 PM · Apr 30, 2026 · 258.4K Views

@TheRealAdamG I love you but let's not fall into this trap - we should brag about how amazing GPT 5.5 is in codex for normal users, and not about its hacking capabilities.

Adam.GPT @TheRealAdamG

https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities "In April, our evaluation of Anthropic's Claude Mythos Preview found that it represented a step up in cyber performance over previous frontier models and was the first to complete our corporate network attack simulation end-to-end, a multi-step exercise we estimate would take a human around 20 hours. A key question was whether this reflected a breakthrough specific to one model, or part of a broader trend. Results from an early checkpoint of GPT-5.5 suggest the latter: a second model, from a different developer, now reaches a similar level of performance on our cyber evaluations."

5:29 PM · Apr 30, 2026 · 6.4K Views
10:06 PM · Apr 30, 2026 · 1.2K Views

TBH I find it very weird to "compete" on dual use or risky capabilities. We need to measure these to choose appropriate safeguards, but shouldn't optimize or market them.

GPT 5.5 is a great model not because it can find vulnerabilities but because it can deliver value to users.

Lisan al Gaib @scaling01

GPT-5.5 is on par with Claude Mythos
- GPT-5.5 average pass rate of 71.4% (±8.0%)
- Mythos Preview 68.6% (±8.7%)
- GPT-5.5 solved a task that takes a human expert ~12 hours in under 11 minutes at a cost of $1.73

3:17 PM · Apr 30, 2026 · 413.6K Views
10:00 PM · Apr 30, 2026 · 5.7K Views
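
For readers who want to check the "on par" reading, here is a minimal sketch of the arithmetic behind the figures quoted above. It assumes the ± values are symmetric margins around each average (the posts do not say what kind of interval they are), and the names `PassRate` and `intervals_overlap` are illustrative, not anything from AISI's report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassRate:
    """An average pass rate with a symmetric +/- margin, both in percent."""
    model: str
    mean: float
    margin: float  # assumed to be a symmetric half-width

    @property
    def low(self) -> float:
        return self.mean - self.margin

    @property
    def high(self) -> float:
        return self.mean + self.margin


def intervals_overlap(a: PassRate, b: PassRate) -> bool:
    """True if the two [low, high] ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Figures as quoted in the post above.
gpt = PassRate("GPT-5.5", 71.4, 8.0)
mythos = PassRate("Claude Mythos Preview", 68.6, 8.7)

gap = gpt.mean - mythos.mean            # difference in percentage points
overlap = intervals_overlap(gpt, mythos)

# "~12 hours" of expert time versus "under 11 minutes" of model time:
# dividing by the 11-minute upper bound gives a lower bound on the speedup.
human_minutes = 12 * 60
model_minutes_upper_bound = 11
min_speedup = human_minutes / model_minutes_upper_bound

print(f"gap: {gap:.1f} points, ranges overlap: {overlap}")
print(f"speedup: at least {min_speedup:.0f}x")
```

Under these assumptions the gap between the two averages is smaller than either margin and the ranges overlap, which is consistent with describing the models as performing similarly rather than one clearly beating the other. Overlapping ranges alone do not prove the models are equal.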

@boazbaraktcs @TheRealAdamG agreed

Boaz Barak @boazbaraktcs

@TheRealAdamG I love you but let's not fall into this trap - we should brag about how amazing GPT 5.5 is in codex for normal users, and not about its hacking capabilities.

10:06 PM · Apr 30, 2026 · 1.2K Views
3:28 AM · May 1, 2026 · 241 Views

Great explanation of where we are with cyber capabilities right now and what that precisely means

David Sacks @DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

5:45 PM · Apr 30, 2026 · 991K Views
10:53 PM · Apr 30, 2026 · 5.6K Views

Did Not Buy Anthropic Psyops Again Award granted to: me. Sama did mog Mythos on all dimensions (because 5.5 can be used at all and apparently isn't weaker)

Lisan al Gaib @scaling01

GPT-5.5 is on par with Claude Mythos
- GPT-5.5 average pass rate of 71.4% (±8.0%)
- Mythos Preview 68.6% (±8.7%)
- GPT-5.5 solved a task that takes a human expert ~12 hours in under 11 minutes at a cost of $1.73

3:17 PM · Apr 30, 2026 · 413.6K Views
7:47 PM · Apr 30, 2026 · 5K Views

@DavidSacks Ya

David Sacks @DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

5:45 PM · Apr 30, 2026 · 991K Views
9:28 PM · Apr 30, 2026 · 2.1K Views

@markchen90 ship cyber plz

Mark Chen @markchen90

This is just one eval, but it's an important one - UK AISI’s cyber range tests long-horizon, agentic capability. 5.5 performs similarly to Mythos. The risks for frontier models are real. But we do our best to deploy AI people can actually use - through hard work on mitigations.

12:38 AM · May 1, 2026 · 16.9K Views
1:02 AM · May 1, 2026 · 738 Views

what noam is saying here, by the way, is that we've entered RSI. You can scale inference compute to discover new knowledge, which you can then use to create new data to train on. It only doesn't feel like a foom to you because you're a human, whose lifetime is a blink

Noam Brown @polynoamial

After 100 million tokens, performance was still going up. What we're seeing here is not the capability ceiling. From the report: "Performance on TLO continues to scale with the amount of inference compute spent, and we have not yet observed a plateau with the best models."

4:07 PM · Apr 30, 2026 · 167.7K Views
9:34 PM · Apr 30, 2026 · 93K Views

Progressive Adversarial overload is how we harden all complex systems, from cybersecurity to biosecurity and engineer anti-fragility.

This progressive release is the way, lets the system adiabatically adapt as the overall level of intelligence of adversaries goes up.

David Sacks @DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

5:45 PM · Apr 30, 2026 · 991K Views
8:00 PM · Apr 30, 2026 · 6.2K Views

Okay, since people seem to be not understanding the distinction here, I'll spell it out. They are not the same.

Mythos can, on its own, discover lots of new vulnerabilities, because it is capable of navigating and exploring on its own and stringing these things together. It doesn't need to be told exactly what to do, it can figure out what to do.

GPT-5.5 is at least as good as Mythos on 'narrow cyber tasks' as per UK AISI, but they have to be narrow. You need to know what it is you want done. That's valuable, but it's not at all the same thing, and far less dangerous.

If OpenAI could have compiled and fixed a similar stream of bugs in the world's most important software, at similar compute cost, I presume that they would have.

Indeed, GPT-5.5-Cyber exists, and yet the White House is objecting to Anthropic expanding deployment of Mythos. You think they're doing this for no reason?

Meanwhile, the whole 'everyone will have it in six months' is the usual pretending that the situation is much closer than it is, although of course on a long enough time horizon the point stands.

David Sacks @DavidSacks

It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.

5:45 PM · Apr 30, 2026 · 991K Views
8:43 PM · Apr 30, 2026 · 49K Views

@boazbaraktcs Fair feedback @boazbaraktcs!

Boaz Barak @boazbaraktcs

@TheRealAdamG I love you but let's not fall into this trap - we should brag about how amazing GPT 5.5 is in codex for normal users, and not about its hacking capabilities.

10:06 PM · Apr 30, 2026 · 1.2K Views
10:11 PM · Apr 30, 2026 · 262 Views

GPT-5.5 on par with Claude Mythos on multi-step cyber-attack simulations?

OpenAI: comeback of the year.

AI Security Institute @AISecurityInst

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

3:07 PM · Apr 30, 2026 · 1.4M Views
6:42 PM · Apr 30, 2026 · 17.2K Views

Our evaluation of OpenAI's GPT-5.5 cyber capabilities https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities GPT-5.5 is one of the strongest models we have tested on our cyber tasks and is the second model to solve one of our multi-step cyber-attack simulations end-to-end. // of course it is, and those that fell for Mythos being some sort of unfathomable cyber-weapon fell for PR hook, line, and sinker.

9:00 PM · Apr 30, 2026 · 19.2K Views