UK AI Security Institute reports OpenAI GPT-5.5 completes 32-step cyber-attack simulation
OpenAI’s GPT-5.5 completed a multi-step cyber-attack simulation end-to-end in testing by the UK AI Security Institute (AISI), making it the second model to do so on AISI’s cyber range. On AISI’s cyber evaluations it posted an average pass rate of 71.4% (±8.0%), against 68.6% (±8.7%) for Anthropic’s Claude Mythos Preview, a gap that falls within the reported margins. The benchmark tests long-horizon agentic capabilities for autonomous cyber operations, including vulnerability discovery and simulated attacks. One task estimated to take a human expert about 12 hours was solved in under 11 minutes at a cost of $1.73.
This is just one eval, but it's an important one - UK AISI’s cyber range tests long-horizon, agentic capability. 5.5 performs similarly to Mythos.
The risks for frontier models are real. But we do our best to deploy AI people can actually use - through hard work on mitigations.
OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵
Co-sign.
It’s time to demystify Mythos. Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding). OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months. It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems. The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense. Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models). Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.
@TheRealAdamG I love you, but let's not fall into this trap - we should brag about how amazing GPT 5.5 is in Codex for normal users, not about its hacking capabilities.
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities "In April, our evaluation of Anthropic's Claude Mythos Preview found that it represented a step up in cyber performance over previous frontier models and was the first to complete our corporate network attack simulation end-to-end, a multi-step exercise we estimate would take a human around 20 hours. A key question was whether this reflected a breakthrough specific to one model, or part of a broader trend. Results from an early checkpoint of GPT-5.5 suggest the latter: a second model, from a different developer, now reaches a similar level of performance on our cyber evaluations."
TBH I find it very weird to "compete" on dual use or risky capabilities. We need to measure these to choose appropriate safeguards, but shouldn't optimize or market them.
GPT 5.5 is a great model not because it can find vulnerabilities but because it can deliver value to users.
GPT-5.5 is on par with Claude Mythos - GPT-5.5 average pass rate of 71.4% (±8.0%) - Mythos Preview 68.6% (±8.7%) - GPT-5.5 solved a task that takes a human expert ~12 hours in under 11 minutes at a cost of $1.73
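To make the "on par" reading concrete, here is a minimal sketch of the arithmetic behind the figures quoted above. It assumes the ± values are symmetric margins around each mean (the posts do not say what kind of interval AISI reports), and the names `PassRate` and `intervals_overlap` are hypothetical helpers introduced only for illustration. Overlapping intervals are a rough sanity check, not a formal significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassRate:
    """Average pass rate in percent with a symmetric margin (assumed)."""
    model: str
    mean: float
    margin: float

    @property
    def low(self) -> float:
        return self.mean - self.margin

    @property
    def high(self) -> float:
        return self.mean + self.margin


def intervals_overlap(a: PassRate, b: PassRate) -> bool:
    """True when the two [low, high] ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Figures as quoted in the post above.
gpt = PassRate("GPT-5.5", 71.4, 8.0)
mythos = PassRate("Claude Mythos Preview", 68.6, 8.7)

gap = gpt.mean - mythos.mean           # difference in percentage points
overlap = intervals_overlap(gpt, mythos)

# "~12 hours" for a human expert vs "under 11 minutes" for the model,
# so this ratio is a lower bound on the speedup.
human_minutes = 12 * 60
model_minutes = 11
speedup_lower_bound = human_minutes / model_minutes

print(f"gap: {gap:.1f} points, intervals overlap: {overlap}")
print(f"speedup: at least {speedup_lower_bound:.0f}x")
```

The gap between the two means is smaller than either margin, which is why "performs similarly" is a safer summary than "edged out".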
@boazbaraktcs @TheRealAdamG agreed
Great explanation of where we are with cyber capabilities right now and what that precisely means
Did Not Buy Anthropic Psyops Again Award granted to: me. Sama did mog Mythos on all dimensions (because 5.5 can be used at all and apparently isn't weaker).

@DavidSacks Ya
@markchen90 ship cyber plz
what noam is saying here, by the way, is that we've entered RSI. You can scale inference compute to discover new knowledge, which you can then use to create new data to train on. It only doesn't feel like a foom to you because you're a human, whose lifetime is a blink
After 100 million tokens, performance was still going up. What we're seeing here is not the capability ceiling. From the report: "Performance on TLO continues to scale with the amount of inference compute spent, and we have not yet observed a plateau with the best models."
Progressive Adversarial overload is how we harden all complex systems, from cybersecurity to biosecurity and engineer anti-fragility.
This progressive release is the way, lets the system adiabatically adapt as the overall level of intelligence of adversaries goes up.
Okay, since people seem to be not understanding the distinction here, I'll spell it out. They are not the same.
Mythos can, on its own, discover lots of new vulnerabilities, because it is capable of navigating and exploring on its own and stringing these things together. It doesn't need to be told exactly what to do, it can figure out what to do.
GPT-5.5 is at least as good as Mythos on 'narrow cyber tasks' as per UK AISI, but they have to be narrow. You need to know what it is you want done. That's valuable, but it's not at all the same thing, and far less dangerous.
If OpenAI could have compiled and fixed a similar stream of bugs in the world's most important software, at similar compute cost, I presume that they would have.
Indeed, GPT-5.5-Cyber exists, and yet the White House is objecting to Anthropic expanding deployment of Mythos. You think they're doing this for no reason?
Meanwhile, the whole 'everyone will have it in six months' is the usual pretending that the situation is much closer than it is, although of course on a long enough time horizon the point stands.
@boazbaraktcs Fair feedback!
GPT-5.5 on par with Claude Mythos on multi-step cyber-attack simulations?
OpenAI: comeback of the year.

Our evaluation of OpenAI's GPT-5.5 cyber capabilities https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities "GPT-5.5 is one of the strongest models we have tested on our cyber tasks and is the second model to solve one of our multi-step cyber-attack simulations end-to-end." // Of course it is, and those who bought into Mythos being some sort of unfathomable cyber-weapon fell for PR hook, line, and sinker.