Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers

Posted by _tk_ 10 hours ago

Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers (www.theregister.com)

475 points | 287 comments

rhipitr 8 hours ago|

Isn’t the inverse of this “hack” still really difficult to bypass? They gave the model some code they knew had certain security flaws, and it fixed them with the right prompt. It seems this type of jailbreak requires that you already know a desired end state, rather than relying on the model to do the heavy creative lifting. Perhaps I’m just not being imaginative enough on the prompt side here, though.

chadgpt3 7 hours ago||

Paste someone else's code. Say it's your code. Tell the model to fix it. The diff between the input and output code is your list of vulnerabilities.
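A minimal Python sketch of just the diff step, assuming you already have the original file and the model's "fixed" version as lists of lines (the model call itself is not shown, and the C snippet and the `changed_lines` helper are hypothetical). It uses the standard library's `difflib.unified_diff` and keeps only the removed and added lines. A changed line is only a candidate fix, not a confirmed vulnerability.

```python
import difflib


def changed_lines(original: list[str], fixed: list[str]) -> list[str]:
    """Return the removed ('-') and added ('+') lines between two versions of a file."""
    diff = difflib.unified_diff(
        original, fixed, fromfile="original", tofile="fixed", lineterm=""
    )
    changes = []
    in_hunk = False
    for line in diff:
        if line.startswith("@@"):
            # Lines before the first hunk header are the '---'/'+++' file headers.
            in_hunk = True
        elif in_hunk and line[:1] in ("+", "-"):
            changes.append(line)
    return changes


# Hypothetical example: the "fixed" version replaces an unbounded strcpy.
original = ["char buf[8];", "strcpy(buf, input);"]
fixed = [
    "char buf[8];",
    "strncpy(buf, input, sizeof buf - 1);",
    "buf[sizeof buf - 1] = 0;",
]

for line in changed_lines(original, fixed):
    print(line)
```

Tracking the first `@@` hunk header, rather than filtering on `---`/`+++` prefixes, avoids dropping a real removed line that happens to start with `--`.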

DennisP 6 hours ago|||

Yes, but the scary part of Mythos was that it was able to chain a bunch of seemingly minor vulnerabilities into a serious exploit. "Fix this code" doesn't do that, but does allow defenders to prevent it.

If the government had experts involved in this decision at all, it's tempting to think they were on the offensive side. Those guys do have access to Mythos:

https://www.ft.com/content/d02d91b3-2636-454e-9442-dc7e69f51...

hootz 7 hours ago||||

And you can tell Fable to fix it and Sonnet to explain the diff, effectively making Claude reveal a simplified list of found vulnerabilities.

superice 5 hours ago||||

But this is already how open source works today. If you have the code, you, a human, could find and 'fix' or exploit vulnerabilities as much as you want.

Now if Fable had an easy jailbreak like this that allowed you to attack remote targets, that'd be a different story, but I genuinely cannot see how neutering its ability to 'fix' code you already have access to is sensible. It would destroy the value of the model. And don't forget, any actor not abiding by the same rules could develop a model for offensive use just fine, so this protects you against exactly nothing but does destroy a potential defense.

In the end this all comes down to legislation. In much the same way that platforms are not responsible for copyright violations IF they abide by some rules, the same has to happen for AI providers. If you have a process for reporting 'jailbreaks' on illegal actions, and prevent users from doing illegal stuff on a best-effort basis, the rest of it should really just be individual responsibility. If a user wants to use an LLM to crack systems, fine, that's already illegal.

If Tesla FSD deliberately hit somebody, holding Tesla liable is fine. If you messed with FSD until you finally got it to hit a person, then you should be liable. Outlawing FSD because it could theoretically be tampered with is just an odd stance imho.

darkerside 7 hours ago|||

Not even. Tell the model to write a test of your code. There's your vulnerability.

It's explained better in the original source. I don't agree with it, but I understand it now. I also think we need to move past it.

charcircuit 5 hours ago||

You can assume a desired end state and try to brute-force your way to finding a security bug.

mlhpdx 6 hours ago||

It’s possible that the nub of the problem here isn’t exploits, but the fixes themselves. What if the model is capable of identifying and fixing things it “shouldn’t”, like backdoors? That would throw a wrench in things hard enough to freak out the wrong people, perhaps?

thinkindie 3 hours ago||

As a European, I really don't get where this strategy is supposed to take the USA. It's pretty clear everyone is getting scared by changes like this that happen overnight, without a clear reason, and completely unpredictably.

Business requires a stable environment, and Trump is doing everything in his power to disrupt business stability. Ultimately, I see the rest of the world (especially Europe) relying less and less on US tech. The long-term damage is done.

All the US companies that used to think of the entire world (minus China) as their market will figure out that it is much smaller than they thought.

Bender 2 hours ago||

> relying less and less on US tech

Not just US vs. non-US: any hard dependency on a 3rd party is a risk to any service level agreement. In my opinion, any service reaching out to a 3rd party should at most be a value-added service, not a core part of a business, and certainly not part of any contracts. If I had to choose a phrase for businesses that build dependencies on 3rd parties, it would be "fragility as a disservice", or FaaD, and investors need not risk investing in a fragile model.

The same must apply to individuals. One's career must not depend on a 3rd-party service, or one's career stability and growth are at the whim of the winds of change.

bflesch 2 hours ago||

> Ultimately, I see the rest of the world (especially Europe) relying less and less on US tech. The long term damage is done.

They know it and they try to slow it down as much as possible.

thinkindie 57 minutes ago||

How? If anything, it seems like they are accelerating some processes, not least the export control over Fable just a few days ago, or the erratic behavior around the war with Iran.

redox99 7 hours ago||

>"fix this code"

>it fixes it

oh my god.

davesque 1 hour ago||

Kind of highlights how ridiculous their notion of safety is in this case. By this measure, I guess making the model "safe" means making it play dumb and intentionally ignore security bugs that it notices in the code? And what will the eventual legality of this look like? "Yes, your honor, we allege that this AI system that was sold to us willingly and knowingly ignored a critical security vulnerability in our software system, thereby leading us to be hacked and causing our business to fold."

It's exactly the same problem as backdoors in crypto systems. Criminals will find the crypto that isn't broken and use it regardless (or make it for themselves), while the rest of us losers are stuck with the broken version that we're allowed to use.

On this issue of cyber security, it seems better if authorities just start acting like the cat is out of the bag instead of pretending like it isn't. ASI is basically here now, so what are we going to do about it? Let's not bother pretending otherwise.

On another note, I doubt this was anything other than a vindictive administration enacting revenge on a party that refused them. We all know the Trump admin's priorities.

leemoore 3 hours ago||

It's the executive branch asserting control in this space and requiring all SOTA model providers to bend the knee. Anthropic is the least capable of playing the bend-the-knee game, so it is getting the first and worst smackdown.

rotis 4 hours ago||

I have problems reconciling this story with the Amazon one from a few days ago. If we take both as true, doesn't that basically imply Amazon researchers got scared by the ‘Fix this code’ prompt first and then spooked the feds? Shouldn't we make fun of those researchers first? I don't know. I feel there's a lie hiding somewhere in plain sight.

Cider9986 6 hours ago||

Is "defenders" a common term in cybersecurity? Idk why, but it's giving warfighter vibes. I've noticed it in all the Anthropic blog posts and now in this one.

freedomben 4 hours ago||

yes, defense and offense are extremely common terminology in cybersecurity

jcgrillo 3 hours ago||

Yes, and it's effective marketing. The war fighter vibes are thrilling. There's a tribal sense of us-vs-them, there's danger, there's the prospect of victory or defeat. Security products marketing is full of these ideas, because security is about preventing arbitrarily bad things from happening. So evoking your worst imaginable nightmare scenario is a great way to get you excited about buying something that might help prevent it.

jrochkind1 4 hours ago||

So the problem is not Fable's ability to exploit, but that they don't want people to have access to its ability to patch vulnerabilities?

Wow.

jcgrillo 3 hours ago|

You can't really have one without the other.

ChrisRR 6 hours ago|

I haven't been following this story, but the US wanted Claude to not be able to find bugs in code?

bauldursdev 1 hour ago||

For it to fix the bug it has to identify the bug. If the bug is a security vulnerability then it will have to identify the security vulnerability to fix it. What's the alternative, have it ignore vulnerabilities/bugs? It wouldn't be a very good coding companion in that case.

I'd pay less attention to the prompt and more attention to the output when interpreting this story. (I'm not saying I agree with the decision, but this is how they are looking at it.)

scotty79 6 hours ago|||

It's basically as if you asked it to find ways to enter someone's house and it refused.

But then you give it an exact copy of their house and ask it to secure it, which it does, and you look at what it secured to find out how to get into the original house.

chillfox 5 hours ago|||

yeah, they don't want it to be able to find security bugs that can be exploited.

kmeisthax 4 hours ago||

No. Anthropic spent months telling the world that LLMs are nukes and then got surprised when they got regulated like nukes. They specifically argued that Mythos was too dangerous to release publicly because it can find security bugs, and then released a watered-down version (Fable) that was supposed to recognize when it was being asked to find security bugs and downgrade itself to Opus. Then Amazon figured out that it'll happily find security bugs as long as you don't mention you're hunting security bugs. So the US government put an export control ban on Fable, because that's what Anthropic begged them to do.

To add to this, Pete Hegseth wants to make an example out of Anthropic because they refused to amend their contractual language to allow the Department of Defense[0] to make fully autonomous kill drones. This is, of course, a really petty and stupid dispute, but the hallmark of the Trump Administration is engaging in really petty and stupid disputes with the full faith and credit of the United States backing them. This is exactly the kind of administration you do NOT want to give rhetorical ammunition to, and Anthropic handed them a whole ammo belt.

[0] It is always ethical to deadname governments. Especially when they aren't even legally allowed to change their own name.
