Anthropic apologizes for invisible Claude Fable guardrails

Posted by rarisma 7 hours ago

Anthropic apologizes for invisible Claude Fable guardrails (www.theverge.com)

https://web.archive.org/web/20260611122253/https://www.theve..., https://archive.ph/y4V4k

171 points | 175 comments | page 2

CSMastermind 1 hour ago

They should apologize for their visible guardrails. I don't think I've had a conversation that hasn't been downgraded to Opus for completely inexplicable reasons.

accelbred 2 hours ago

I don't think they can convince me they have actually reversed course on this. It's invisible, so we wouldn't know if they kept on doing it secretly. It required building out technical capability, which is unlikely to remain forever unused while conveniently available to them.

They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.

stevefan1999 2 hours ago

Then reset the quotas as an atonement ;p

Seriously though, Fable was not that great when facing a greenfield subject. It is excellent at one-shotting some math problems, but if you want it to do cutting-edge tech stuff, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available, I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Fable's adaptiveness is still not great, and I think this is a well-known problem for Anthropic, although all LLMs struggle with subjects they don't know and hallucinate very frequently.

umvi 1 hour ago

They make great models, but the sanctimony and paternalism are getting old real fast, and I will gladly ditch them in the future when the model playing field has (hopefully) mostly equalized.

rdtsc 1 hour ago

The power is getting to their heads it seems.

With the guardrails, explicit or implicit, do they refund the tokens after you've hit them? I guess they don't. They could then throttle you just to save money. You may be paying Fable prices but getting Haiku results, with some excuse that, well, this coding issue sounds like a security bug.

I don't know, I'd rather have something less powerful but more predictable.

hatthew 1 hour ago

Part of the premise of the article is blatantly wrong. Distillation prevention was always visible. The only invisible safeguard was against frontier model development, such as the development of training pipelines. This doesn't change the general idea that invisible degradation is bad and has been reverted, but the article changes the framing of the original issue from "preventing accelerating AI in the future" to "preventing cheaper AI right now".

highfrequency 1 hour ago

I wish it were ok for companies to bluntly say: “we made these decisions for competitive reasons, but the public backlash outweighed that so we are reversing course.”

I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that frame these decisions as something chosen for the good of others a little tedious.

jarjoura 2 hours ago

Can anyone help me understand why this particular issue is any different than Anthropic training its models with its brand of moral judgement since day one? I've always been turned off by their particular stances on things they bake into their models that steer users in directions.

Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?

Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.

urbnspacecowboy 5 minutes ago

> Can anyone help me understand why this particular issue is any different than...

Questions like this are basically whataboutism, in effect even if not intent. https://en.wikipedia.org/wiki/Whataboutism

The question essentially assumes the premise that nobody complained about Anthropic's previous actions. In case you can't tell, I strongly reject this premise. People have been criticizing "safety" rhetoric from Anthropic and other LLM providers practically since the start. Remember Goody-2, the parody of excessively safety-tuned LLMs that refuses to do anything ever? That was released in February 2024, two years ago! (And it's still running, amazing. https://www.goody2.ai/chat )
