White House Demands Anthropic Block All AI Jailbreaks
Original: The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible
Why This Matters
The dispute reveals a fundamental tension between government security mandates and the technical limits of AI safety, one that could affect the regulatory approach to frontier models.
Trump administration officials told WIRED that Anthropic must ensure Claude Fable 5's guardrails cannot be circumvented if the company wants to rerelease the model. Security experts say complete jailbreak prevention is technically impossible.
The Trump administration has escalated its dispute with Anthropic over the Claude Fable 5 AI model, which was taken offline last week over jailbreaking concerns. Officials told the company that before it can rerelease Fable 5, it must address what the government views as security vulnerabilities in the model's safeguards. The National Security Agency has concluded that methods exist to disable the guardrails designed to block access to capabilities related to cybersecurity, chemistry, and biology.
Anthropic has maintained that the administration's concerns are overblown and that the effects of jailbreaks are minimal. It restated this position to the Commerce Department and the Office of the National Cyber Director during a technical meeting on Monday.
Government officials, however, indicate they have moved past debating how significant the jailbreaks are. The administration now views the problem as Anthropic's responsibility to solve, since neither the Commerce Department's Center for AI Standards and Innovation nor the NSA has sufficient resources to monitor every potential jailbreak across all models. Officials believe Anthropic should proactively test all frontier AI models to identify potential jailbreaks and report its findings to the government.
Independent cybersecurity experts increasingly argue that AI guardrails are temporary solutions, because skilled users and advanced AI systems will eventually find ways around them. If so, the White House's demand may be technically infeasible.