Posted by dominicq 12 hours ago

Small models also found the vulnerabilities that Mythos found(aisle.com)
893 points | 251 comments
rvnx 11 hours ago|
Where are all the people here who claim that LLMs are just useless stochastic parrots? Did they lose internet access?
SoftTalker 11 hours ago|
The patterns of buggy code are well represented in the training data.
eiens 9 hours ago||
The bigger point is that the enterprise value accrues to assets associated with software production.

What happened to all that nonsense about LLMs solving physics, science, etc.? Lmao, that certainly is not happening.

The natural home of LLMs is software production.

The question is whether Anthropic and OAI can survive. If OAI can't make its entry into the ad business work, then the two will fight over the same territory, meaning both of their chances of survival drop: Google, which is a monster in software production, will not only seek to kill them but also buy their GPUs at a discounted price.

ares623 5 hours ago||
Once again, it would've been so easy and simple to remove all doubt from their claims: release all the tools and harnesses they used to do it and allow 3rd parties to try and replicate their results using different models. If Mythos itself is as big a moat as they claim it is, then there shouldn't be any problem here.

They did the same stunt with the C compiler. They could've released a tool to let others replicate it, but they didn't.

ctoth 11 hours ago||
> They recovered much of the same analysis

Really?

> We isolated the vulnerable vc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities.

No.
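For readers unfamiliar with the bug class the quoted setup describes: a length field parsed from an untrusted packet drives a copy into a fixed-size buffer. A minimal hypothetical sketch of that pattern, with invented names and sizes (this is not the actual FreeBSD `vc_rpc_gss_validate` code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: mimics the described shape where oa_length comes
 * straight off the wire and controls how much data is copied. */
#define GSS_TOKEN_MAX 128

typedef struct {
    uint32_t oa_length;       /* attacker-controlled, parsed from the packet */
    const uint8_t *oa_base;   /* points into the packet payload */
} opaque_auth_t;

/* Returns 0 on success, -1 if the length is rejected.
 * Omitting the bounds check below is exactly the vulnerability pattern
 * the experiment asked the models to find. */
int copy_token(uint8_t dst[GSS_TOKEN_MAX], const opaque_auth_t *oa) {
    if (oa->oa_length > GSS_TOKEN_MAX)   /* the check whose absence = the bug */
        return -1;
    memcpy(dst, oa->oa_base, oa->oa_length);
    return 0;
}
```

Given a function isolated like this, plus the hint that `oa_length` is packet-derived, spotting the missing (or present) bounds check is a much narrower task than finding the bug in a whole codebase, which is ctoth's point.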

nfcampos 8 hours ago|
Anthropic marketing (and even supposedly technical write-ups) has sadly become more hyperbole and less substance over time, imo. This technology is so impressive on its own that it really feels like shooting themselves in the foot in the long run, but what do I know.

Case in point: they conveniently fail to report the false positive rate, while also admitting that without Address Sanitizer discarding all the false positives, this system would have been next to useless.
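The filtering step being referred to is mechanical: candidate findings only count if a proof-of-concept actually reproduces a crash under an instrumented build. A hedged sketch of such a triage loop, with the validator abstracted as a callback (a real pipeline would fork/exec an ASan-compiled binary; everything here is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero = the PoC reproduced a crash under the sanitized build. */
typedef int (*validator_fn)(const char *poc);

/* Keep only candidates whose PoC is independently confirmed; the rest
 * are the false positives the sanitizer step silently discards. */
size_t triage(const char *candidates[], size_t n,
              validator_fn validate, const char *confirmed[]) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (validate(candidates[i]))
            confirmed[kept++] = candidates[i];
    return kept;
}

/* Stand-in validator for demonstration only: treats a PoC string
 * beginning with '!' as one that reproduced a crash. */
static int demo_validate(const char *poc) { return poc[0] == '!'; }
```

The ratio of candidates in to findings out of this loop is precisely the false positive rate nfcampos says went unreported.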

decidu0us9034 7 hours ago|
Right now we accept false positives as long as you can sort them out; it's pretty typical that >99% of fuzzer runs don't produce new coverage. Fuzzers are far from useless without feedback, of course, but it's better to have it if you can. The question is whether the LLM approach has lower costs for validation and triage than fuzzing alone; that's unclear to me. Anthropic would like people to believe automation is this scary new unknown.
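The cost comparison raised above reduces to simple arithmetic: cost per confirmed bug is the cost of validating one report divided by the fraction of reports that are real. A back-of-envelope sketch; all numbers in the usage below are invented for illustration, not figures from the article:

```c
#include <assert.h>
#include <math.h>

/* Expected cost to obtain one confirmed finding, given the cost of
 * validating a single report and the pipeline's precision
 * (true positives / all reports). Purely illustrative model. */
double cost_per_confirmed(double validation_cost, double precision) {
    return validation_cost / precision;
}
```

Under this model, a fuzzer emitting very cheap reports at very low precision (say cost 0.01, precision 0.001 → 10.0 per confirmed bug) can still lose to an expensive-but-precise pipeline (cost 1.0, precision 0.2 → 5.0), which is why the unreported false positive rate matters for the comparison.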