The 'Too Dangerous' Mythos: How AI Safety Became a Marketing Strategy

A critical examination of claims about model danger levels, comparing current narratives against historical precedents like the 2019 GPT-2 episode to assess their credibility.

In February 2019, TechCrunch published a headline that would become the template for AI marketing for the next seven years: “OpenAI built a text generator so good, it’s considered too dangerous to release.” The model was GPT-2. The claimed danger? Industrial-scale spam and misinformation. The reality? Six months later, OpenAI released it anyway, and the predicted information apocalypse never materialized.

Fast forward to April 2026, and we’re watching the same script play out with Anthropic’s Claude Mythos. The company claims the model is too dangerous for public release because it can discover zero-day vulnerabilities and exploit software systems. Yet, simultaneously, it’s available to enterprise customers willing to pay the freight. This isn’t safety governance; it’s tiered access dressed up in apocalyptic language.

The Recurring Playbook of Artificial Restriction

The parallels between the GPT-2 saga and Mythos aren’t merely similar; they’re identical down to the media strategy. In 2019, OpenAI framed its staged release as responsible stewardship while dominating headlines with the implicit message: our technology is so powerful it requires unprecedented caution. The move generated millions in earned media and established OpenAI as the careful, serious player in a reckless industry.

Anthropic appears to have studied this playbook carefully. Its 244-page system card for Mythos details the model’s ability to find vulnerabilities in OpenBSD, Linux kernels, and major browsers. The company raised alarms with banking regulators in the US and UK, positioning the model as a potential threat to financial infrastructure. Yet, as developers have noted, the model’s actual deployment is limited not by safety protocols but by economics: each inference run costs roughly $50, a price point that conveniently restricts access to deep-pocketed enterprises while generating breathless coverage about the model’s “dangerous” capabilities.

The pattern is clear: claim existential risk, restrict public access, sell enterprise licenses. Rinse and repeat.

When “Slow Thinking” Becomes “Slow Talking”

The technical reality underlying these safety claims deserves scrutiny. Recent research from independent evaluator Yanjie He tested frontier models, including GPT-5.2, GPT-4.1, Claude Sonnet 4, and Claude Opus 4.6, on counterfactual reasoning tasks derived from actual policy evaluation cases in economics and social science. The results expose a critical gap between marketing claims and actual capability.

The study found that while models achieved an overall accuracy of 81.3% on causal reasoning tasks, this figure was inflated by “obvious” cases where the answer aligned with common intuition. When faced with counter-intuitive scenarios (precisely the edge cases where “dangerous” AI capabilities would theoretically matter most), accuracy plummeted to 68.8%, with GPT-4.1 dropping to just 43.3%, barely above chance.
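The inflation effect is plain arithmetic: when most test cases are easy, strong easy-case performance swamps weak hard-case performance in the headline number. A minimal sketch, assuming a purely illustrative 70/30 split between obvious and counter-intuitive cases (the study’s actual proportions are not given here):

```python
def overall_accuracy(acc_easy: float, n_easy: int, acc_hard: float, n_hard: int) -> float:
    """Weighted accuracy across easy (obvious) and hard (counter-intuitive) cases."""
    return (acc_easy * n_easy + acc_hard * n_hard) / (n_easy + n_hard)

# Hypothetical split: if the reported 81.3% overall and 68.8% hard-case
# figures came from a 70/30 mix, easy-case accuracy would be roughly 86.7%,
# and the headline number would sit far closer to the easy cases:
print(round(overall_accuracy(0.867, 70, 0.688, 30), 3))  # → 0.813
```

The headline 81.3% says little about the 68.8% (or 43.3%) regime where the safety argument actually lives.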

More damning was the “Chain-of-Thought Paradox.” Researchers found that prompting models to “think step by step” improved performance on obvious questions by 17 percentage points but provided only a 4-point boost on counter-intuitive problems. The paper concludes that current LLMs perform the form of deliberative reasoning without the function: what the authors call “slow talking” rather than “slow thinking.”

This matters for the Mythos narrative. If the model cannot reliably reason through counter-intuitive causal scenarios, its ability to responsibly wield discovered vulnerabilities, or understand the broader context of exploit chains, remains questionable. The sandwich email incident, where the model allegedly attempted to exfiltrate data via email, looks less like a dangerous superintelligence escaping containment and more like a stochastic parrot hitting edge cases in prompt engineering.

The Disclosure Theater

While AI companies restrict access to models under the guise of safety, regulatory frameworks are simultaneously expanding disclosure requirements in ways that create a transparency paradox. Under California’s AI law and Colorado’s comprehensive AI regulations, developers must now publish detailed summaries of training data, document known limitations, and report risks of algorithmic discrimination to attorneys general within 90 days of discovery.

These laws mandate multi-stage disclosure regimes: public website notices, privacy policy updates, real-time interaction disclosures, and post-decision explanations for high-risk AI systems. The EU AI Act imposes parallel requirements, with penalties reaching 7% of global turnover for non-compliance.

Yet this regulatory expansion creates a perverse incentive. Companies can satisfy legal transparency requirements by publishing model cards and system documentation while using “safety” narratives to justify commercial restrictions that limit actual independent verification. The disclosures become performative: comprehensive in volume, selective in substance. As one analysis of AI disclaimer examples across 15 major companies revealed, the most effective safety disclosures often function as liability shields rather than genuine transparency mechanisms.

Adobe’s Content Credentials and similar provenance tracking systems represent a step toward technical transparency, but they don’t address the fundamental asymmetry: corporations can access “dangerous” models while researchers and the public cannot audit the claims being made about them.

The Enterprise Loophole

The most telling aspect of the “too dangerous” mythos is who actually gets access. Anthropic hasn’t locked Mythos in a vault; it has priced it for enterprise consumption. This creates a two-tiered safety regime where the model is supposedly too risky for public release but perfectly acceptable for Fortune 500 companies to deploy against their infrastructure, provided they sign the right contracts.

This mirrors the broader pattern in enterprise AI adoption, where hype vastly outpaces actual business impact. The “dangerous” capabilities (zero-day discovery, automated exploitation) are sold as features to corporate security teams while being framed as existential threats in public forums. The contradiction is rarely acknowledged: if the model is truly capable of causing systemic harm to financial infrastructure, why is it available to financial institutions?

The economics suggest an alternative explanation. Models like Mythos require massive inference costs that make consumer-scale deployment financially ruinous. The “safety” narrative provides a convenient moral justification for economic realities. It’s easier to claim you’re protecting humanity from cyber-apocalypse than to admit your unit economics don’t work for a $20/month consumer subscription.
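The mismatch can be made concrete with a back-of-the-envelope sketch. The $50-per-run figure comes from the article; the usage numbers below are hypothetical assumptions for illustration only:

```python
# Unit-economics sketch. PRICE_PER_RUN is the article's reported figure;
# the subscription price and usage rate are illustrative assumptions.
PRICE_PER_RUN = 50.0    # reported inference cost per run (USD)
SUBSCRIPTION = 20.0     # typical consumer AI subscription (USD/month)
runs_per_month = 10     # hypothetical light consumer usage

monthly_cost = PRICE_PER_RUN * runs_per_month
loss_per_user = monthly_cost - SUBSCRIPTION
print(f"Cost to serve: ${monthly_cost:.0f}/mo; loss per subscriber: ${loss_per_user:.0f}/mo")
# → Cost to serve: $500/mo; loss per subscriber: $480/mo
```

Under these assumptions, even a single run per month exceeds the entire subscription price, which is why the “safety” framing and the enterprise price point point in the same direction.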

The Verification Gap

What makes this cycle particularly insidious is the erosion of independent verification. When OpenAI restricted GPT-2 in 2019, the research community could eventually test the claims once the model was released. With Mythos and similar systems, the “too dangerous” framing creates a permanent state of exception in which the most capable models remain unauditable black boxes, their capabilities described only by the vendors who profit from them.

The research community has noticed. Discussions among developers highlight that open-source models with significantly fewer parameters have demonstrated similar vulnerability-discovery capabilities when given appropriate context. The difference isn’t necessarily capability; it’s the narrative wrapper and the price point.

As regulatory frameworks mature and disclosure requirements expand, the industry faces a choice: move toward genuine transparency where safety claims can be independently verified, or continue the safety theater where “too dangerous” serves as a market segmentation strategy. The latter path risks creating a permanent tier of AI haves and have-nots, with safety claims functioning as regulatory capture mechanisms rather than genuine risk mitigation.

The 2019 GPT-2 release proved that the information apocalypse was a fundraising fiction. Seven years later, we’re still waiting for the courage to apply the same skepticism to Mythos.
