Anthropic's Safety Plea Backfires: Government Pulls the Plug on Its Most Potent AI

The US government has ordered Anthropic to shut down Fable 5 and Mythos 5 over a jailbreak concern and Anthropic is pushing back, warning the move could freeze AI development industry-wide.

Uche Emeka • AI • 1 month ago • 3 minute read •

Key Points

• The U.S. government mandated Anthropic immediately cease access to its advanced AI models, Claude Fable 5 and Claude Mythos 5, globally.

• The directive cited national security concerns over an alleged 'jailbreak' of Fable 5 that exposed its ability to identify software vulnerabilities.

• Anthropic complied with the order but strongly disagreed, arguing that similar capabilities exist in other models and this standard would impede industry progress.

Anthropic's Safety Plea Backfires: Government Pulls the Plug on Its Most Potent AI

The U.S. government has mandated that Anthropic immediately cease access to two of its most advanced AI models, Claude Fable 5 and Claude Mythos 5, citing national security concerns.

Anthropic announced that it has complied with the directive, which was received on Friday at 5:21 pm ET, but expressed strong disagreement with the government's decision.

The order compels the company to disable both models for all users globally, extending beyond the foreign nationals that the government's export control order was ostensibly designed to target, although other Anthropic models remain unaffected.

Claude Mythos 5 is Anthropic's most capable AI model, which was initially previewed in early April and kept under tight restriction due to its exceptional ability to identify security vulnerabilities in software.

Anthropic claimed Mythos could find flaws in every major operating system and web browser it tested. Instead of a broad release, the company launched 'Project Glasswing,' a controlled program that shared Mythos with approximately 50 vetted organizations, including major tech firms like Amazon, Apple, Google, Microsoft and cybersecurity specialist CrowdStrike, for defensive cybersecurity applications.

Claude Fable 5, released just three days prior to the government's order, was developed as Anthropic's response to commercial demands. It was designed as a version of Mythos incorporating guardrails to prevent responses in high-risk areas such as cybersecurity and biology, making it suitable for general public release, according to the company.

Benchmark tests conducted by Vals AI, an AI tech performance tracking company, quickly identified Fable 5 as the most capable AI model available to the public upon its release.

The government's directive is formally an export control action aimed at restricting foreign national access to the models. However, Anthropic's lengthy blog post indicated its understanding that the underlying concern stems from an alleged 'jailbreak' of Fable 5.

Anthropic stated that the government provided only verbal evidence of a 'potential narrow, non-universal jailbreak,' which the company described as prompting the model to analyze a specific codebase and identify software flaws.

Anthropic further argued that this 'level of capability' is already widely available in other publicly accessible models, including OpenAI’s GPT-5.5, and is routinely utilized by cybersecurity professionals for legitimate defensive purposes.

Anthropic's broader defense highlighted that its most robust safeguards are implemented through independent classifier systems that operate separately from the model itself.

This design, they contended, ensures that even if a user manages to prompt Fable to bypass an initial refusal, the fundamental protections against the most dangerous outputs would remain active.

Despite these arguments, the government proceeded with its action, leading to Anthropic's open frustration.

The company publicly stated, 'We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people,' warning that applying such a standard across the industry 'would essentially halt all new model deployments for all frontier model providers.'

The situation carries significant implications for Anthropic, which is widely anticipated to pursue an IPO this year and has cultivated a public image as a safety-conscious alternative to its competitors.

Observers have noted the irony that Anthropic's very caution in restricting Mythos, which it promoted as a uniquely dangerous model unsuitable for public release, has seemingly drawn the precise kind of government scrutiny that could disrupt its business prospects.

This development also recalls Sam Altman's comments in April to podcaster Ashlee Vance, where he characterized Anthropic's handling of Mythos as 'fear-based marketing,' likening it to selling a 'bomb shelter' after claiming to have built a 'bomb.'

Altman's observation, though not predicting a shutdown, underscored that when an AI company emphasizes the unique dangers of its technology, the world, including the U.S. government, tends to heed the warning.