Posted by rarisma 13 hours ago
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
They are clear about the reasons for guardrails: prevent their models from doing harm in dual-use contexts including CBRN or by accelerating research in authoritarian-backed AI labs.
What is the critique against that? It seems pretty reasonable to me. You want AI-accelerated biological or radiological experiments running in your neighbors backyard? You want PRC-backed labs to continue to steal Anthropic's models via distillation?
Mitigating the harms of dual-use tech is notoriously difficult and fraught with trade offs. What I would want to see is cautious rollout and quick response, which is EXACTLY what they're doing.
Instead, this thread is full of bad-faith arguments about Anthropic being dishonest, making a "useless" model, or "the power is going to their heads." You can't read Anthropic's System Cards and come away with any of these impressions. Quite the opposite, in fact. They are honest to a fault, acknowledging problems they discovered even when it hurts them.
If your harmless request was downgraded to Opus, you're billed for Opus. They were 100% clear about that. I'd much rather have a Mythos-class model that falls back to Opus 10% of the time than be capped to Opus 100% of the time. If that doesn't work for you, then make a suggestion for something better!
If you are a white-hat security engineer hitting guardrails, I don't think you have standing to complain. I really don't. Their Glasswing program actually got banks and the industrial sector to take action to fix security vulnerabilities. Do you realize how special that is? A huge portion of the economy runs on vulnerable code and has for decades, despite security experts testifying to Congress, begging business leaders, pleading for intervention-- with no results. But suddenly they're all enrolled in a program that will find *and fix* vulnerabilities! White-hat security people should be rejoicing. Instead some of them are throwing rocks. Unbelievable. Shameful.
Meanwhile, society is screaming at the AI labs to be more conscientious about potential harms of AI. Legislatures are passing laws limiting data center construction. There are protests. And you, the HN community, the vanguard of our profession, have the temerity to demand "NO GUARDRAILS!" "HOW DARE YOU TRY TO PROTECT DEMOCRACY!" "MY SOFTWARE PROJECT IS MORE IMPORTANT THAN KEEPING NUKES AWAY FROM THE BAD GUYS!"
Go ahead HN, downvote me. It'd be an honor.
"Distillation involves training less capable models on more advanced ones’ output, and can be used illicitly to acquire powerful capabilities cheaply. The AI startup accused China’s DeepSeek, MiniMax, and Moonshot of generating 'over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts,'"
https://www.semafor.com/article/02/24/2026/anthropic-accuses...
After reading their posts and watching interviews with Dario it's abundantly clear that they view Chinese-lab distillation of US frontier models as a threat to US national security. You can argue with them about whether that is true, but not whether distillation is real.
What accounts for the difference between your attitude that distillation is no big deal, "common practice," yet Anthropic sees as it as a huge threat?
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
It’s an act/theatre/phony today that regulating output makes any difference at all to security.
The LLM vendors should simply say that they make no judgement and that open systems help defenders better defend against attackers, which is true.
Companies do this sort of stuff when they think their customers have no choice. It’s sad Claude so quickly exploited its success to enshittify itself.
"This information is too dangerous for you, so we'll just hold on to it.."
Thanks big brother, super anthropic of you!
The internet of '95 is looking back at us, with tears in its eyes.
And Fable is cracked. Way better than anything, and the biggest improvements are on the scariest subjects.
So given the state of the world at the moment, and the number of software patches we're barely keeping up with... I'm thankful that they're not making it worse.
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information prevent abuse.
However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)