Cybersecurity Researchers Criticize Anthropic's Fable Guardrails

Original: Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Why This Matters

The dispute highlights the tension between AI safety measures and practical cybersecurity work.

Anthropic's new AI model Fable, a limited version of its cybersecurity model Mythos, faces criticism from security researchers, who say its overly restrictive guardrails block basic cybersecurity-related tasks and even code reviews.

Anthropic released Fable on Tuesday as a public version of its cybersecurity model Mythos, but security professionals are criticizing its restrictive guardrails. IBM X-Force researcher Valentina Palmiotti noted that Fable 'rejects any request that could be tangentially cyber related,' including requests to read blog posts. The model flags messages for 'cybersecurity or biology topics' and falls back to Claude Opus 4.8 when a flag is triggered. Cybersecurity veteran Matt Suiche said the guardrails appear to be keyword-based and block even requests to write secure code. Anthropic originally restricted Mythos to a limited number of companies through Project Glasswing in April, then expanded access to hundreds of organizations in 15 countries last week. The company requires cybersecurity professionals to apply to its Cyber Verification Program to get a version with fewer limitations.

Source

techcrunch.com