Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers (www.theregister.com)

Posted by _tk_ 13 hours ago

518 points | 307 comments

ZuLuuuuuu 11 hours ago

Did they try other publicly available models on the same code with the same prompts before the ban? Was Fable the only one able to detect and fix the security vulnerabilities?

charcircuit 8 hours ago

Anthropic claimed that Mythos' ability to find security vulnerabilities was a "severe" "national security" issue. They set the standard they were then expected to follow.

vlovich123 7 hours ago

> In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”

This is a very weak argument IMHO. The line between a “defensive” model and an “offensive” one is thin: once my defensive model finds all the vulnerabilities, I can hand them off to my unlocked, dumber, offensive models. Attacking at scale is not so different.

I don’t think anyone in the field has a good answer to the cybersecurity threat that really good AI models pose. You can’t even embargo them for some period while you go and patch vulnerable systems, because the worse models will still be out there cranking out vulnerabilities faster than you can defend.

davesque 4 hours ago

Kind of highlights how ridiculous their notion of safety is in this case. By this measure, I guess making the model "safe" means making it play dumb and intentionally ignore security bugs that it notices in the code? And what will the eventual legality of this look like? "Yes, your honor, we allege that this AI system that was sold to us willingly and knowingly ignored a critical security vulnerability in our software system, thereby leading us to be hacked and causing our business to fold."

It's exactly the same problem as backdoors in crypto systems. Criminals will find the crypto that isn't broken and use it regardless (or make it for themselves), while the rest of us losers are stuck with the broken version that we're allowed to use.

On this issue of cybersecurity, it seems better if authorities just start acting like the cat is out of the bag instead of pretending it isn't. ASI is basically here now, so what are we going to do about it? Let's not bother pretending otherwise.

On another note, I doubt this was anything other than a vindictive administration enacting revenge on a party that refused them. We all know the Trump admin's priorities.

gacgacgac 8 hours ago

Anyone trying to find legitimacy in the ban of this model, or expressing incredulity at the stated reasoning, is playing into the admin's hands.

They want the argument to be over "is it unsafe" or "is it incompetence". In either case, your tribe gets to point at the ban and feel superior. (This is Jon Stewart's whole career: point and laugh at how foolish the Republicans appear to be.)

What's really happening is the continuing creep into fascism. The reasoning doesn't need to be sound, because they are going to ban things that displease them and everyone has to play along. They could say, "we're banning Fable because it's turning the frogs gay" and they'd expect compliance.

Umberto Eco's essay on Ur-Fascism fits as clearly as ever. Ridiculous exertions of control are performed to find the people who resist, and to knock them down.

Merely pointing out the absurdity of the reasoning isn't resistance, it's controlled opposition. Saying "All this over 'fix this code'?! How inept are they?" is far too credulous, and engages on exactly the level the fascist wants its opposition to be on, imo.

1970-01-01 7 hours ago

"fix this government"

Voting...

tlogan 8 hours ago

I think the only approach that might work here is to allow access only to certain pre-approved individuals.

Maybe something like TSA PreCheck.

Of course, that will not stop adversaries from getting access to the model, but it would at least create some level of control.

smasher164 5 hours ago

Honestly, given how trivial it is for Mythos-class models to identify an exploit, I’m going to assume any sufficiently large project written in C, C++, or Zig is riddled with latent vulnerabilities and already compromised.

htrp 7 hours ago

If "fix this code" gets past the guardrails, they are effectively using rules-based classifiers (or an LLM-as-a-judge on the prompt).

hughw 10 hours ago

Suggestion: run "fix this code" on all of GitHub before the bad guys do.

HPsquared 10 hours ago

I wonder what that would cost...

nradov 5 hours ago

Perhaps less than the cost of not doing it.

cratermoon 6 hours ago

“I feel like making ’90s-style T-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”

I'd buy that shirt.
