Too Powerful for the Public: What Claude Mythos Says About the Future of AI Control

Published 1 hour ago · 4 minute read
Zainab Bakare

In March 2026, someone at Anthropic made a mistake. An internal document describing the company's most advanced AI model was left sitting in a publicly accessible data cache: not leaked by a whistleblower, not published intentionally, just forgotten in a place where anyone could find it.

What it said was brief and alarming: the model was "by far the most powerful AI we've ever developed" and it "poses unprecedented cybersecurity risks." Cybersecurity stocks slumped within hours.

The model was called Claude Mythos, and most people had never heard of it until that moment. As of today, they still cannot use it.

The Leak That Started Everything

Word of its existence first spread in March 2026, and on April 7 Anthropic confirmed what the leak had hinted at: Claude Mythos Preview would not see a general release.

Instead, it would be deployed through something called Project Glasswing, a defensive cybersecurity coalition involving Amazon, Apple, Google, Microsoft, Nvidia and several others.

The idea, in theory, is to use Mythos to fix the world's most critical software vulnerabilities before the wrong people develop the same capability and use it to break things instead.

What Mythos Actually Does

What makes Mythos so dangerous?

The model has demonstrated an ability to find security vulnerabilities in software at a scale and depth that human experts simply cannot match.

Anthropic says it has already identified thousands of high-severity flaws across every major operating system and web browser.

Among those was a 27-year-old vulnerability in OpenBSD, an operating system specifically designed to be difficult to crack, that would have allowed an attacker to remotely crash any machine just by connecting to it.

Another flaw, buried in a single line of code in the audio-video framework FFmpeg, had survived more than five million automated test runs over 16 years without anyone catching it. Mythos caught it.

These have since been patched. But the implication sits uncomfortably: if Anthropic's own model can do this, someone else's model eventually will.

The Window Is Already Closing

Logan Graham, head of Anthropic's Frontier Team, has said other labs could develop similar capabilities within six to eighteen months. The race has not paused because one company decided to be responsible. The decision has simply created a window.

OpenAI is already pressing against that boundary. Its GPT-5.3-Codex model, released earlier this year, was described by the company as its first to hit "high" on its internal cybersecurity risk classification, meaning it is capable enough at code reasoning that, in the wrong hands or at scale, it could enable real-world cyberattacks.

OpenAI released the model anyway, to paying users, but with what it called its most comprehensive cybersecurity safety stack to date.

That is a meaningfully different approach from Anthropic's: not withholding, but releasing with guardrails and hoping they hold.

Everyone Is Approaching the Same Threshold

Google's Gemini 3.1 Pro is, by most independent benchmarks, the strongest general-purpose AI model publicly available right now. It leads on reasoning tests and costs significantly less than its rivals.

Google has not flagged Gemini as too dangerous to release, but the company is also investing in safety tooling and participating in Project Glasswing as a partner, which suggests it is not entirely confident about what comes next either.


Then there is xAI's Grok 4.20, which introduced a new multi-agent architecture, and OpenAI's GPT-5.5, internally codenamed "Spud," which had completed pretraining by late March but has not been given a public release date.

DeepSeek V4, from China, is expected next quarter, likely at a fraction of the Western models' API cost.

Why This Decision Actually Matters

The picture that emerges is not one company heroically restraining dangerous technology while everyone else races ahead. It is an entire industry approaching a threshold at roughly the same time, with different companies making different bets about how much caution is enough.

Some are withholding. Some are releasing with guardrails. Some are releasing and hoping for the best.

Anthropic's Mythos decision matters not because it settles the question of how to handle dangerously capable AI, but because it forces the question into the open.

For a long time, AI safety debates lived mostly in conference papers and policy documents.

What Mythos has done is make the stakes legible to ordinary people: here is a model so good at finding weaknesses in the software that runs the world that its own creators decided not to hand it out.

The question of who gets to control that kind of power and on what terms is no longer theoretical. It is already being answered.

