Opus 4.6 uncovers 500 zero-day flaws in open-source code

Posted by speckx 1 day ago

Opus 4.6 uncovers 500 zero-day flaws in open-source code (www.axios.com)

209 points | 140 comments | page 2

HAL3000 1 day ago

I honestly wonder how many of these are written by LLMs. Without code review, Opus would have introduced multiple zero-day vulnerabilities into our codebases. The funniest one: it was meant to rate-limit brute-force attempts, but on a failed check it returned early and triggered a rollback. That rollback also undid the increment of the attempt counter, so attackers effectively got unlimited attempts.
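
A minimal sketch of the kind of bug described above, assuming the attempt counter lives in a SQL table and is updated in the same transaction as the credential check. This is not the commenter's actual code: the table layout, the function names (`make_db`, `attempts`, `login_buggy`, `login_fixed`), and the `MAX_ATTEMPTS` limit are all hypothetical, and the password check is reduced to a boolean.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical lockout threshold


def make_db():
    """Create an in-memory database with one user and a zeroed counter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attempts (user TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO attempts VALUES ('alice', 0)")
    conn.commit()
    return conn


def attempts(conn, user):
    """Return the stored attempt count for a user."""
    row = conn.execute(
        "SELECT count FROM attempts WHERE user = ?", (user,)
    ).fetchone()
    return row[0]


def login_buggy(conn, user, password_ok):
    """Increment and check in one transaction; failure rolls back everything."""
    if attempts(conn, user) >= MAX_ATTEMPTS:
        return "locked"
    conn.execute("UPDATE attempts SET count = count + 1 WHERE user = ?", (user,))
    if not password_ok:
        conn.rollback()  # bug: this also discards the increment above
        return "denied"
    conn.execute("UPDATE attempts SET count = 0 WHERE user = ?", (user,))
    conn.commit()
    return "ok"


def login_fixed(conn, user, password_ok):
    """Commit the increment on its own before the credential check."""
    if attempts(conn, user) >= MAX_ATTEMPTS:
        return "locked"
    conn.execute("UPDATE attempts SET count = count + 1 WHERE user = ?", (user,))
    conn.commit()  # the attempt is now durable whatever happens next
    if not password_ok:
        conn.rollback()  # nothing pending, so the counter is unaffected
        return "denied"
    conn.execute("UPDATE attempts SET count = 0 WHERE user = ?", (user,))
    conn.commit()
    return "ok"
```

In this sketch, repeated failed calls to `login_buggy` never reach the lockout because the counter is rolled back to its previous value each time, while `login_fixed` locks the account once `MAX_ATTEMPTS` failures have been recorded.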

ChrisMarshallNY 1 day ago

When I read stuff like this, I have to assume that the blackhats have already been doing this, for some time.

kibibu 1 day ago

Not with Opus 4.6 they haven't

ChrisMarshallNY 1 day ago

Good point. I suspect that they'll be addressing that, quickly...

bastard_op 1 day ago

It's not really worth much when it doesn't work most of the time though:

https://github.com/anthropics/claude-code/issues/18866

https://updog.ai/status/anthropic

tptacek 1 day ago

It's a machine that spits out sev:hi vulnerabilities by the dozen and the complaint is the uptime isn't consistent enough?

bastard_op 1 day ago

If I'm attempting to use it as a service to do continuous checks on things and it fails 50% of the time, I'd say yes, wouldn't you?

tptacek 1 day ago

If you had a machine with a lever, and 7 times out of 10 when you pulled that lever nothing happened, and the other 3 times it spat a $5 bill at you, would your immediate next step be:

(1) throw the machine away

(2) put it aside and call a service rep to come find out what's wrong with it

(3) pull the lever incessantly

I only have one undergrad psych credit (it's one of my two college credits), but it had something to say about this particular thought experiment.

candiddevmike 1 day ago

You're leaving out how much it costs to pull the lever, both in time and money.

Dylan16807 1 day ago

If we're making a reasonable analogy, then successful pulls cost much less than $5 of time and money.

If the analogy is comparing to downtime, then unsuccessful pulls cost basically nothing.
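
The lever analogy and the cost objection can be combined into a rough expected-value calculation. The 3-in-10 payout rate and the $5 bill come from the thought experiment above; the symbols $c_s$ and $c_f$ (the cost of a successful and an unsuccessful pull) are assumptions introduced here for illustration, not figures from the thread.

```latex
\begin{align*}
E[\text{payout per pull}] &= 0.3 \times \$5 = \$1.50 \\
E[\text{net per pull}]    &= 0.3\,(\$5 - c_s) - 0.7\,c_f
\end{align*}
```

Under these assumptions, pulling stays worthwhile as long as $0.3\,c_s + 0.7\,c_f < \$1.50$, which holds easily if unsuccessful pulls cost close to nothing.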

jsnell 1 day ago

But it's not failing 50% of the time. Their status page[0] shows about 99.6% availability for both the API and Claude Code. And specifically for the vulnerability-finding use case that the article was about and you're dismissing as "not worth much", why in the world would you need continuous checks to produce value?

[0] https://status.claude.com/

anhner 1 day ago

updog? what's updog?

bastard_op 14 hours ago

It's an uptime service from DataDog, an enterprise event/log/SIEM/monitoring/APM company, like Splunk. So what they do is watch uptime stuff for your favorite large business.

bxguff 1 day ago

As far as model use cases go, I don't mind them throwing their heads against the wall in sandboxes to find vulnerabilities, but why would it do that without specific prompting? Is Anthropic fine with Claude setting its own agendas in red-teaming? That's like the complete opposite of sanitizing inputs.

garbawarb 1 day ago

Have they been verified?

siva7 1 day ago

Wasn't this Opus thing released like 30 minutes ago?

Topfi 1 day ago

I understand the confusion; this was done by Anthropic's internal red team as part of model testing prior to release.

jjice 1 day ago

A bunch of companies get early access.

input_sh 1 day ago

Yes, you just need to be on a Claude++ plan!

tintor 1 day ago

Singularity

blinding-streak 1 day ago

Opus 4.6 uses time travel.

ains 1 day ago

https://archive.is/N6In9

thisisauserid 1 day ago

Well, I guess I know what I'm doing for the first hour when 4.7 comes out.

maxclark 1 day ago

Did they submit 500 patches?

Bridged7756 1 day ago

How can an LLM uncover 500 zero-day flaws in open source? It puts them there in the first place.
