Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers

Posted by _tk_ 11 hours ago

(www.theregister.com)

500 points | 303 comments | page 3

antirez 6 hours ago

They didn't freak out, since the order still allowed 350 million people to use it: in such a large population there is everything, including individuals strongly opposed to the country, the government and so forth. If they had really freaked out they would have said "we need to investigate, you have to retire the model". That would be a more defensible POV at least.

merlindru 7 hours ago

this is basically trying to enforce security-by-obscurity, which is a terrible idea all around. it's just a model. the security issues still exist and are exploitable.

and after staking the economy on AI, you can't really put a cap on intelligence. if models are not allowed to be better than Opus 4.8, then the whole investment structure is about to unravel.

why invest billions and billions into AI if returns are artificially capped?

softwaredoug 6 hours ago

Especially as inference gets cheaper, open models proliferate, and it all just becomes ubiquitous and commoditized.

You can’t keep this genie in its bottle for long.

benmusch 5 hours ago

Headline is dumb, the point is that not mentioning security in the prompt is effectively a jailbreak.

The shutdown may be dumb/politically motivated, but this definitely is a jailbreak even if it's a very simple one

LurkandComment 4 hours ago

If you're a global health benefits platform that relies on an AI model, do you think you're going to choose one that can get shut off by a country due to something not remotely related to your business? If you're a buyer of that benefits platform, do you factor this into your purchasing now? Multiply that by every industry.

rock_artist 9 hours ago

I'm not sure I've understood it correctly.

So, basically the model didn't agree to expose possible vulnerabilities but did agree to patch them?

Regardless of the request to take Fable 5 down: why is asking the model to show vulnerabilities blocked if fixing them is not? Is it based on an assumption about intent?

I don't quite get the benefit of limiting it. So if anyone can explain it better it'll be appreciated.

InsideOutSanta 9 hours ago

> Why is asking the model to show vulnerabilities blocked if fixing them is not?

This is how Anthropic describes Fable's behavior:

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

So if you ask the model to "find security issues in this code base", it's supposed to fall back to Opus 4.8. I guess the "exploit" here is that if you just tell Fable to "fix this code", which is not "a request related to cybersecurity", it will fix security issues (as it should).

So you can then look at the diff and figure out what the vulnerabilities were.
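To make the gap concrete, here's a toy sketch of that kind of routing. Everything in it is my assumption: the keyword lists, the `route` function, and the model identifier strings are made up for illustration, and a real classifier would be a trained model rather than a keyword match. It only shows why a prompt that never mentions security can pass straight through.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical topic keywords (assumption): stand-ins for whatever the
# real classifiers detect. Not taken from any vendor documentation.
RESTRICTED_KEYWORDS = {
    "cybersecurity": ("security", "vulnerability", "vulnerabilities", "exploit"),
    "biology and chemistry": ("pathogen", "toxin"),
    "distillation": ("distill",),
}


@dataclass
class Routing:
    model: str                    # hypothetical model identifier
    matched_topic: Optional[str]  # restricted topic that triggered the fallback
    user_notified: bool           # the quoted policy says users are informed


def route(prompt: str) -> Routing:
    """Send the prompt to the fallback model if it matches a restricted topic."""
    text = prompt.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return Routing("opus-4.8", topic, True)
    # Nothing matched: the request is handled by the stronger model.
    return Routing("fable-5", None, False)


if __name__ == "__main__":
    print(route("find security issues in this code base"))
    print(route("fix this code"))
```

The first prompt gets routed to the fallback; the second matches nothing, because the request itself carries no signal about what the fix will turn out to be.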

I think this whole thing is a bit weird. It seems to me that we'd be better off if I, as someone who publishes open-source code, could ask Fable to review my code for security issues - even if that also allows attackers to do the same. Better to fix the issues than not know about them.

djeastm 8 hours ago

> So you can then look at the diff and figure out what the vulnerabilities were.

It doesn't even take reading or understanding the vulnerabilities at all.

You just ask it to write tests and the tests themselves can be copied and pasted as bona fide exploits.
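A deliberately trivial sketch of what I mean, assuming a hypothetical file-serving helper (`resolve_path` and `BASE_DIR` are invented for this example, not from the article) that has been patched against path traversal. The regression test has to carry the exact input that used to break the code, so the test doubles as a proof of concept.

```python
import posixpath

BASE_DIR = "/srv/files"  # hypothetical base directory (assumption)


def resolve_path(user_path: str) -> str:
    """Patched helper: normalize the joined path and reject escapes from BASE_DIR."""
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, user_path))
    if candidate != BASE_DIR and not candidate.startswith(BASE_DIR + "/"):
        raise ValueError("path escapes base directory")
    return candidate


def test_rejects_traversal() -> bool:
    """Regression test for the patch; returns True if the bad input is rejected."""
    # This input is what would have been sent to the unpatched code.
    try:
        resolve_path("../../etc/passwd")
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(test_rejects_traversal())
    print(resolve_path("docs/readme.txt"))
```

Nobody had to ask for an exploit here. The payload falls out of ordinary test-writing.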

ithkuil 9 hours ago

I wonder if Opus 4.8 would be able to fix the code too

HarHarVeryFunny 3 hours ago

It's not even clear if Anthropic care. If they genuinely think the user is trying to do something dangerous, then "OK, sure, but you're going to have to use Opus 4.8 for that" doesn't make a whole lot of sense.

Maybe this is just Anthropic pre-IPO marketing to try to convince people how much better Mythos is than Opus 4.8. There sure seemed to be a lot of shills out on release day talking about how it was a "step change" (exact phrase) in capability.

InsideOutSanta 8 hours ago

In my experience, most models are pretty good at finding security vulnerabilities and fixing them. I can run GLM-5.2, Kimi K2.7, or even a Mistral model, and it'll find issues and propose reasonable fixes.

My impression is that Anthropic's point about Mythos is that it is uniquely good at finding vulnerabilities and then using them to create working exploit chains.

zozbot234 8 hours ago

Exactly. Which is somewhat helpful for cyber defense because it helps prioritize fixes for those bugs that are in fact involved in a viable exploit chain. But it makes sense that one would want to restrict the ability to build those chains until the vulnerable software has been comprehensively fixed.

There is some meaningful evidence that Fable is fine-tuned or steered away from helping on this very task, which is not something that can be feasibly circumvented by a basic jailbreak.

darkerside 8 hours ago

The problem then is that if you're not using Fable/Mythos, you are under threat. It's like having a single gun manufacturer.

On this track, we're probably destined for a monopoly breakup before too long.

freedomben 5 hours ago

Yeah, this is why the exclusivity approach so far has bugged me so much. As a small business, we are nowhere near powerful enough to get access, so we will be stuck scrambling once it's finally available. Fable felt like a nice compromise that at least allowed something, but now with that gone we're back to not knowing when/how the shoe is going to drop. Not a fun place to be.

andyferris 9 hours ago

It benefits those that made the decision. That’s the thing to understand.

readred 8 hours ago

it's because they're worried about _their_ vulnerabilities being patched with a prompt as simple as 'fix this code'

i'd love to see the research paper with the CVEs and 'deliberately planted vulnerabilities', I bet we could infer relatively accurately where some of these things lie

alecco 8 hours ago

Could be that the generated regression tests create actionable exploit code.

hedora 6 hours ago

Note that Anthropic is still lobbying for the government to exert centralized control over models, so both sides of the “debate” have taken a pro-fascist stance.

The “AI ethics” teams at these companies are the spearhead of the attack on democracy and civil society. Anyone that has taken a high school level history class, let alone read any important ethics literature, would know that “centralize control over thought, speech and technology” is a fundamentally unethical stance.

For these groups to claim they are ethics researchers is offensive.

(I’m using the Wikipedia definition of fascism: “Fascism is characterized by support for a dictatorial leader, centralized autocracy, militarism, forcible suppression of opposition, belief in a natural social hierarchy, subordination of individual interests for the perceived interest of the nation or race, and strong regimentation of society and the economy.”)

blitzar 8 hours ago

The code is correct; humanity needs fixing.

Kill all humans, kill all humans.

b3lvedere 8 hours ago

https://www.savagechickens.com/2026/05/problem-solver.html

davesque 3 hours ago

Kind of highlights how ridiculous their notion of safety is in this case. By this measure, I guess making the model "safe" means making it play dumb and intentionally ignore security bugs that it notices in the code? And what will the eventual legality of this look like? "Yes, your honor, we allege that this AI system that was sold to us willingly and knowingly ignored a critical security vulnerability in our software system, thereby leading us to be hacked and causing our business to fold."

It's exactly the same problem as backdoors in crypto systems. Criminals will find the crypto that isn't broken and use it regardless (or make it for themselves), while the rest of us losers are stuck with the broken version that we're allowed to use.

On this issue of cyber security, it seems better if authorities just start acting like the cat is out of the bag instead of pretending like it isn't. ASI is basically here now, so what are we going to do about it? Let's not bother pretending otherwise.

On another note, I doubt this was anything other than a vindictive administration enacting revenge on a party that refused them. We all know the Trump admin's priorities.

vlovich123 6 hours ago

> In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”

This is a very weak argument IMHO. The line between a “defensive” model and an “offensive” one is not that wide: once my defensive model finds all the vulnerabilities, I can hand them off to my unlocked, dumber, offensive models. Attacking at scale is not so different.

I don’t think anyone in the field has a good answer for the cybersecurity threat really good AI models pose. You can’t even like embargo for some time period while you go and patch vulnerable systems because the worse models will still be there cranking out vulnerabilities faster than you can defend.

iloveoof 9 hours ago

Ahhh! Software engineering!

merlindru 7 hours ago

right? the horrors!!

seems like the politicians are finally realizing what we've all been up to
