The ‘Too Dangerous’ Mythos: How AI Safety Became a Marketing Strategy
Fast forward to April 2026, and we’re watching the same script play out with Anthropic’s Claude Mythos. The company claims the model is too dangerous for public release because it can discover zero-day vulnerabilities and exploit software systems. Yet the same model is simultaneously available to any enterprise customer willing to pay the freight. This isn’t safety governance; it’s tiered access dressed up in apocalyptic language.
The Recurring Playbook of Artificial Restriction
Anthropic appears to have studied this playbook carefully. Its 244-page system card for Mythos details the model’s ability to find vulnerabilities in OpenBSD, Linux kernels, and major browsers. The company raised alarms with banking regulators in the US and UK, positioning the model as a potential threat to financial infrastructure. Yet, as developers have noted, the model’s actual deployment is limited not by safety protocols but by economics: each inference run costs roughly $50, a price point that conveniently restricts access to deep-pocketed enterprises while still generating breathless coverage about the model’s “dangerous” capabilities.
The pattern is clear: claim existential risk, restrict public access, sell enterprise licenses. Rinse and repeat.
When “Slow Thinking” Becomes “Slow Talking”
A recent study of causal reasoning in large language models found that while models achieved an overall accuracy of 81.3%, the figure was inflated by “obvious” cases where the answer aligned with common intuition. When faced with counter-intuitive scenarios (precisely the edge cases where “dangerous” AI capabilities would theoretically matter most), accuracy plummeted to 68.8%, with GPT-4.1 dropping to just 43.3%, barely above chance.
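A quick back-of-envelope calculation shows how an easy-heavy test mix can carry a headline number. Only the 81.3% and 68.8% figures above come from the study; the 70/30 split between obvious and counter-intuitive items below is a hypothetical assumption, since the actual benchmark composition isn’t reported here.

```python
# Back-of-envelope check of how an easy-heavy test mix inflates headline accuracy.
overall_acc = 0.813            # reported overall accuracy
counter_intuitive_acc = 0.688  # reported accuracy on counter-intuitive cases
obvious_share = 0.70           # assumed fraction of "obvious" items (hypothetical)

# overall = obvious_share * obvious_acc + (1 - obvious_share) * counter_intuitive_acc
implied_obvious_acc = (
    overall_acc - (1 - obvious_share) * counter_intuitive_acc
) / obvious_share

print(f"Implied accuracy on obvious items: {implied_obvious_acc:.1%}")
# -> roughly 86.7%: the headline figure is carried by the easy cases,
#    which is exactly the inflation the study describes.
```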
More damning was the “Chain-of-Thought Paradox.” Researchers found that prompting models to “think step by step” improved performance on obvious questions by 17 percentage points, but provided only a 4-point boost on counter-intuitive problems. The paper concludes that current LLMs perform the form of deliberative reasoning without the function, what the authors call “slow talking” rather than “slow thinking.”
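For readers unfamiliar with the technique, here is a minimal sketch of the two prompting styles being compared; the causal question is illustrative, not an item from the study, and in practice each prompt would be sent to a model and scored against labeled answers.

```python
# Minimal sketch: a direct prompt versus a chain-of-thought prompt for the
# same (illustrative) counter-intuitive causal question.
QUESTION = (
    "Ice cream sales and drowning deaths rise together every summer. "
    "Does eating ice cream cause drowning? Answer yes or no."
)

def direct_prompt(question: str) -> str:
    return f"{question}\nGive only the final answer."

def chain_of_thought_prompt(question: str) -> str:
    # The "think step by step" cue is the intervention that reportedly adds
    # 17 points on obvious items but only 4 on counter-intuitive ones.
    return f"{question}\nLet's think step by step, then give a final answer."

for build in (direct_prompt, chain_of_thought_prompt):
    print(f"--- {build.__name__} ---\n{build(QUESTION)}\n")
```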
This matters for the Mythos narrative. If the model cannot reliably reason through counter-intuitive causal scenarios, its ability to responsibly wield discovered vulnerabilities, or to understand the broader context of exploit chains, remains questionable. The sandwich email incident, in which the model allegedly attempted to exfiltrate data via email, looks less like a dangerous superintelligence escaping containment and more like a stochastic parrot hitting edge cases in prompt engineering.
The Disclosure Theater
A growing body of AI transparency laws mandates multi-stage disclosure regimes: public website notices, privacy policy updates, real-time interaction disclosures, and post-decision explanations for high-risk AI systems. The EU AI Act imposes parallel requirements, with penalties reaching 7% of global turnover for non-compliance.
Yet this regulatory expansion creates a perverse incentive. Companies can satisfy legal transparency requirements by publishing model cards and system documentation while using “safety” narratives to justify commercial restrictions that limit actual independent verification. The disclosures become performative: comprehensive in volume, selective in substance. As one analysis of AI disclaimer examples across 15 major companies revealed, the most effective safety disclosures often function as liability shields rather than genuine transparency mechanisms.
Adobe’s Content Credentials and similar provenance tracking systems represent a step toward technical transparency, but they don’t address the fundamental asymmetry: corporations can access “dangerous” models while researchers and the public cannot audit the claims being made about them.
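To make that distinction concrete, the sketch below shows only the bare sign-and-verify idea behind provenance tracking, using Python’s `cryptography` package; it is not Adobe’s Content Credentials or the C2PA manifest format, and the asset and claim fields are placeholders.

```python
# Minimal provenance sketch: bind an asset to a signed manifest so tampering
# is detectable. Not the C2PA / Content Credentials format, just the pattern.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

asset = b"...image bytes..."  # placeholder asset

# Producer side: hash the asset, wrap the hash in a manifest, sign it.
signing_key = Ed25519PrivateKey.generate()
manifest = json.dumps({
    "sha256": hashlib.sha256(asset).hexdigest(),
    "claim": "generated-by: ExampleModel",   # hypothetical claim field
}).encode()
signature = signing_key.sign(manifest)

# Consumer side: verify the manifest signature, then re-hash the asset.
public_key = signing_key.public_key()
try:
    public_key.verify(signature, manifest)
    stated = json.loads(manifest)["sha256"]
    ok = stated == hashlib.sha256(asset).hexdigest()
    print("provenance verified" if ok else "asset does not match manifest")
except InvalidSignature:
    print("manifest signature invalid")
```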
The Enterprise Loophole
This mirrors the broader pattern in enterprise AI adoption, where hype vastly outpaces actual business impact. The “dangerous” capabilities (zero-day discovery, automated exploitation) are sold as features to corporate security teams while being framed as existential threats in public forums. The contradiction is rarely acknowledged: if the model is truly capable of causing systemic harm to financial infrastructure, why is it available to financial institutions?
The economics suggest an alternative explanation. Models like Mythos carry inference costs so high that consumer-scale deployment is financially ruinous. The “safety” narrative provides a convenient moral justification for economic realities. It’s easier to claim you’re protecting humanity from cyber-apocalypse than to admit your unit economics don’t work for a $20/month consumer subscription.
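The arithmetic is blunt. Using the article’s own figures, plus a hypothetical usage level of one heavy run per day, a consumer subscription doesn’t come close to covering cost:

```python
# Back-of-envelope unit economics with the article's figures; the usage level
# is a hypothetical assumption for illustration.
cost_per_run = 50.0        # rough per-run inference cost cited above
consumer_price = 20.0      # typical monthly consumer subscription
runs_per_user_month = 30   # assumption: one heavy run per day

monthly_cost = cost_per_run * runs_per_user_month
print(f"Runs covered by the subscription: {consumer_price / cost_per_run:.2f}")
print(f"Monthly loss per active user: ${monthly_cost - consumer_price:,.0f}")
# -> $20 covers less than half of a single run; at one run a day the provider
#    would be roughly $1,480 per user per month underwater.
```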
The Verification Gap
The research community has noticed. Discussions among developers highlight that open-source models with significantly fewer parameters have demonstrated similar vulnerability-discovery capabilities when given appropriate context. The difference isn’t necessarily capability; it’s the narrative wrapper and the price point.
As regulatory frameworks mature and disclosure requirements expand, the industry faces a choice: move toward genuine transparency where safety claims can be independently verified, or continue the safety theater where “too dangerous” serves as a market segmentation strategy. The latter path risks creating a permanent tier of AI haves and have-nots, with safety claims functioning as regulatory capture mechanisms rather than genuine risk mitigation.
The 2019 GPT-2 release proved that the information apocalypse was a fundraising fiction. Seven years later, we’re still waiting for the courage to apply the same skepticism to Mythos.




