Anthropic apologizes for invisible Claude Fable guardrails

Posted by rarisma 10 hours ago

Anthropic apologizes for invisible Claude Fable guardrails (www.theverge.com)

https://web.archive.org/web/20260611122253/https://www.theve..., https://archive.ph/y4V4k

222 points | 250 comments | page 4

tornikeo 5 hours ago|

I moved off Claude Code 3 months ago.

That decision keeps getting better and better as time goes on.

hatthew 4 hours ago||

Part of the premise of the article is blatantly wrong. Distillation prevention was always visible. The only invisible safeguard was against frontier model development, such as the development of training pipelines. This doesn't change the general idea that invisible degradation is bad and has been reverted, but the article changes the framing of the original issue from "preventing accelerating AI in the future" to "preventing cheaper AI right now".

whatever1 6 hours ago||

Boobytrapping is illegal. Anthropic wanted to poison its customers on the suspicion of them misusing their services.

umvi 4 hours ago||

They make great models, but the sanctimony and paternalism are getting old real fast and I will gladly ditch them in the future when the model playing field has (hopefully) mostly equalized.

ChrisArchitect 2 hours ago||

[dupe] We already started a thread on this 12 hours ago. With added comments in the active Cybersecurity... thread. Why did we need this Verge one?

https://news.ycombinator.com/item?id=48485958

nrmitchi 2 hours ago||

I just _know_ there is a (probably fairly large) group of people at Anthropic trying very hard to not say "I told you so" today

prodigycorp 6 hours ago||

Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statement otherwise is just PR.

The beliefs of these people, and how they manifest, are deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.

3fffa 5 hours ago||

The demand for Google's products and open source just shifted.

Neither OAI nor Anthropic can be trusted.

behnamoh 6 hours ago||

They didn't apologize for doing it, they are sorry they were caught doing it. They still nerf the model if your request is about AI development.

Someone1234 6 hours ago|

They didn't get "caught." It was published, by them, when they released Fable a few days ago. They were very clear about it.

It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.

SilverElfin 6 hours ago||

No, it was not clear. No one expects a tool they pay for and use professionally to purposefully sabotage their work. You’re excusing their unhinged behavior.

https://xcancel.com/hammer_mt/status/2064839924398825798

whimsicalism 5 hours ago|||

Excusing? Their comment is factually correct and the parent is factually wrong.

ryandrake 6 hours ago|||

Making excuses for billion+ dollar companies' behavior is one of the most common HN comment section pastimes.

joxdosba 5 hours ago|||

Only second to making intellectually dishonest criticisms of perceived behaviours

behnamoh 5 hours ago|||

I think your comment refers to @Someone1234.

ryandrake 5 hours ago||

It's a very generalized observation. I sometimes think of the HN comment section as the Billionaire's Defense League.

ben_w 41 minutes ago||

Hardly unique to us, but mostly fair.

(Only "mostly" because if you're here at the right time of day, you can also see support for actual communism).

rodrigodlu 5 hours ago|

The same week that they will move the goalposts by blocking 3rd-party harnesses on Claude Code. Nice.

I was a happy Max user.
