Posted by HieronymusBosch 1 day ago

Hardening Firefox with Claude Mythos Preview(hacks.mozilla.org)
https://arstechnica.com/information-technology/2026/05/mozil...
355 points | 159 comments
danieltanfh95 1 day ago|
Really, the issue was not that Opus couldn't do all of this; there was just no incentive to fix bugs. Mythos represented a real marketing use case, so yes, thanks for spending money to fix this, but it's not sustainable.
crummy 1 day ago||
Curious if people think LLMs will lead to more secure or less secure software in five years.
jillesvangurp 1 day ago||
It will probably wipe out a few categories of issues, which is probably a good thing. And the things that are still insecure can also be translated to some other language.

Translating things to Rust manually was already a thing before LLMs came into the picture. Now, with LLMs, that's only going to get easier and faster. The long-term value is going to come from getting on top of the mountain of technical debt in the form of existing C/C++ code bases, which are responsible for the vast majority of memory exploits, buffer overflows, and other issues that, despite decades of attention, are still being found across major code bases on a regular basis.

Mozilla finding these issues comes on the back of a quarter century of some very competent engineers trying to do the right thing and using all the tools at their disposal to prevent these issues from happening. I have a lot of respect for that team and the contributions it has made over the years to improve tools, testing/verification practices, etc. The issue is not their effort or competence.

The job of taking an existing system that is well covered by tests, well documented/specified, etc. and producing a new one that can function as a drop-in replacement is now something that can be considered. A few years ago that would have translated into absolutely massive project cost and risk. Now it's something you can kick off on a Friday afternoon. Worst case it doesn't work, best case you end up with a much better implementation.

It's still early days. There are still a lot of quality issues with LLM generated code. But the success/fail rate will probably improve over time.

int32_64 1 day ago|||
Both. The skilled will use them to find problems, the unskilled will use them to slopcode insecure software the skilled will have to fix.
mc3301 1 day ago|||
Kinda like how home-improvement stores, power tools, easily available hardware, and YouTube tutorials led to both incredibly amazing, durable furniture and janky, ugly, even dangerous furniture.

More tools for more people means more stuff being made, across a wider range of quality.

FeepingCreature 1 day ago|||
More secure software, but in the same way that the population is net healthier after a plague.
data-ottawa 1 day ago|||
I’m just happy we’re talking about security.

That will make software safer alone.

vga1 1 day ago|||
More secure, at least in the cases where the tools are properly applied.

But it also gives blackhats more easily available opportunities to exploit the projects where these tools were not being applied.

2ndorderthought 1 day ago|||
Less secure because of all the ways attacks can scale out and hackers can contribute vulnerabilities to active projects.
bawolff 1 day ago|||
One of the biggest issues in security historically, imo, is vendors who think, "well, nobody will ever find this bug, so we can deprioritize fixing it." LLMs will prevent vendors from lying to themselves, which will lead to more secure software.
stavros 1 day ago|||
That depends on which side has more money.
UltraSane 1 day ago||
In 5 years attackers will have an advantage, but in the long run I think software will be more secure if developers use LLMs to find and fix the worst remotely exploitable bugs before release. LLMs are going to force devs to be much more security conscious.
canucker2016 11 hours ago||
I think it'll be a war of who has the better LLMs-as-security-scanner.

Ideally, you'd do a comprehensive scan of all source code (and the LLM-scanner finds everything during that scan), then fix all the reported defects.

Afterwards, any dev that commits code will run the LLM-scanner on the modified code (and affected areas) and fix any reported defects.

So the black-hat hacker would be shut out unless they get access to an LLM-scanner with better analysis than what the target project is using.

Major LLM-scanner vendors could give major projects priority access to new versions of their scanners, so any defects in the current source code are found before any other party could use the reported defects against the project or its users.

So black-hat hackers would be left with developing their own LLM-scanner better/more efficient than existing major LLM-scanners.

Given enough incentive, they might develop such a tool. Look at the market for zero-day vulnerabilities for smartphones, especially iPhones.

lschueller 1 day ago||
Let's see how this will improve daily SOC work. I still don't see what the big difference between Mythos and Opus is, security-wise. I'm confident that this kind of vulnerability detection is a long-term improvement. But does Mythos specifically make such a big difference compared to "normal" models? I would love to see what the actual difference is.
mccr8 1 day ago||
Quantifying the abilities of an LLM is a hard research problem, so I'm not sure if I can describe it in any great way, but Mythos did seem to be fairly clever about putting together things from different domains to find problems.

For instance, in one of the included bugs (2022034) it figured out that a floating point value being sent over IPC could be modified by an attacker in such a way that it would be interpreted by the JS engine as an arbitrary pointer, due to the way the JS engine uses a clever representation of values called NaN-boxing. This is not beyond the realm of a human researcher to find, but it did nicely combine different domains of security.

As the person responsible for accidentally introducing that security problem (and then fixing it after the Mythos report), while I am aware of NaN-boxing (despite not being a JS engine expert), I was focused more on the other more complex parts of this IPC deserialization code so I hadn't really thought about the potential problems in this context. It is just a floating point value, what could go wrong?
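
For anyone who hasn't run into NaN-boxing: roughly, the engine packs every JS value into 64 bits, storing real doubles directly and hiding pointers and other types inside otherwise-unused NaN bit patterns. Here's a minimal, made-up sketch of the failure mode (this is not SpiderMonkey's actual layout or the real Firefox IPC code; the masks and names are invented for illustration):

    #include <cstdint>
    #include <cstring>
    #include <cstdio>
    #include <limits>

    // Toy NaN-boxing scheme: if the top 17 bits are all set, the value is
    // treated as a tagged object pointer; otherwise it is a plain double.
    constexpr uint64_t kTagMask   = 0xFFFF800000000000ULL;
    constexpr uint64_t kObjectTag = 0xFFFF800000000000ULL;

    struct BoxedValue {
      uint64_t bits;
      bool isDouble() const { return (bits & kTagMask) != kObjectTag; }
      void* asPointer() const { return reinterpret_cast<void*>(bits & ~kTagMask); }
    };

    // Unsafe pattern: store the raw bits of a double that arrived over IPC.
    // An attacker-chosen NaN payload can land in the pointer range.
    BoxedValue deserializeUnsafe(double fromIPC) {
      BoxedValue v;
      std::memcpy(&v.bits, &fromIPC, sizeof v.bits);
      return v;
    }

    // Safe pattern: canonicalize NaNs first so no incoming bit pattern can
    // masquerade as a tagged pointer.
    BoxedValue deserializeSafe(double fromIPC) {
      if (fromIPC != fromIPC) {  // true only for NaN
        fromIPC = std::numeric_limits<double>::quiet_NaN();
      }
      return deserializeUnsafe(fromIPC);
    }

    int main() {
      // A NaN whose payload encodes an attacker-chosen address.
      uint64_t evilBits = kObjectTag | 0x0000414141414141ULL;
      double evil;
      std::memcpy(&evil, &evilBits, sizeof evil);

      BoxedValue v = deserializeUnsafe(evil);
      if (!v.isDouble()) {
        std::printf("treated as pointer %p\n", v.asPointer());
      }
      BoxedValue w = deserializeSafe(evil);
      std::printf("after canonicalization, isDouble = %d\n", w.isDouble());
      return 0;
    }

The fix in the sketch is the generic one: collapse every incoming NaN to a single canonical NaN before the value can ever be boxed.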

lschueller 1 day ago||
Okay, so far it makes sense to me. But was this issue with JS and floating point values, which isn't something super special or super rare, really only detected and identified by Mythos, while Opus wouldn't get to this point?
IainIreland 1 day ago|||
There doesn't have to be a huge qualitative discontinuity between Opus and Mythos. It's just that Mythos has reached a threshold where it's finally smart enough that putting it in a loop and asking it to find bugs is suddenly really effective. Especially at the beginning, Mozilla wasn't doing anything particularly clever with prompts. Mythos is just smart enough that the hit rate on obvious prompts is high enough to matter. (Maybe you can get similar performance out of Opus 4.6 with really smart prompts, but AFAICT nobody had managed it until Mythos.)
JoshTriplett 1 day ago|||
Among other things, Mythos seems better at "let me find, weaponize, and stack vulnerabilities until I get end-to-end from untrusted content to root", rather than just finding one thing in a specific identified area.
Havoc 1 day ago|||
Results similar to Mythos's have been duplicated by weaker models.

Think it's more a case of Mythos raising widespread awareness that tireless LLMs can be weaponized to dig through code and find that one tiny flaw nobody spotted.

sfink 1 day ago||
The report I saw kind of seemed to be pointing at a flaw and asking "do you see it?" which is not the same thing. I felt a pretty large difference between Opus 4.6's results and Mythos's, so I would be surprised if even weaker models did anywhere near as well. I'd like to see these results, if they are using a decent methodology.

Of course, even the reports with flawed methodology could be suggesting that a great harness + weak model might achieve a similar level of results as a mediocre harness + strong model. But I'd want to see solid evidence for that.

empath75 1 day ago||
There is a phase transition where LLMs match or exceed humans' ability to do something, and from that point on, even if the difference from the previous version is small, a model will go from something people use rarely to something people use all the time.

There was a time when the entire transportation infrastructure in the US was built around horses. Even after cars were invented, the cars weren't obviously better than horses for most people, especially because there wasn't any infrastructure to support them, but the infrastructure and the cars kept improving to the point where it was better for some people at some things, then suddenly it was better at most things, and then people stopped using horses, and we re-organized our entire transportation network around cars.

But there was never a revolutionary technological change. The technology of cars in the 1930s was the same fundamental technology as the cars in the 1890s. Just at some point it became "good enough" and that was it.

I think when people say that AI is a bubble, they are assuming that anything economically useful that LLMs cannot perform today is _qualitatively_ different from what LLMs can do right now, and that LLMs cannot do it even in theory without some major technological innovation. But I have a suspicion that there are a large number of valuable things that, once LLMs advance just a little bit more and the harnesses and infra around them improve a little bit more, will just be completely taken over by LLMs.

MetaverseClub 1 day ago||
I'm curious how Mozilla did bug finding before Mythos. Did they use any non-AI bug finding tools?
mccr8 1 day ago||
The usual sorts of fuzzing and static analysis, using AddressSanitizer and ThreadSanitizer. Also a bug bounty program to encourage external researchers to report issues. (I work on Firefox security; I also fixed 2 of the bugs linked in the blog post.)
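
To give a concrete (toy, non-Firefox) picture of the sanitizer side: a fuzzer drives execution into a path like the one below, and AddressSanitizer turns the otherwise silent out-of-bounds write into an immediate crash report.

    // overflow.cpp -- build with: clang++ -g -fsanitize=address overflow.cpp
    #include <cstdio>

    int main() {
      int* buf = new int[4];
      buf[4] = 7;   // one element past the end; ASan aborts with a
                    // heap-buffer-overflow report pointing at this line
      std::printf("%d\n", buf[0]);
      delete[] buf;
      return 0;
    }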
canucker2016 1 day ago||
Coverity (similar to lint) scans various open source software products for vulnerabilities.

see https://www.blackduck.com/static-analysis-tools-sast/coverit...

and for Firefox-related alleged defects, see https://scan.coverity.com/projects/firefox

You have to create an account to view the actual reported defects.

There are just over 5000 reported defects still outstanding. I don't know how many overlap with the 271 Mythos-reported defects.

rockdoe 1 day ago|||
How many of those are false positives though? Probably just over 5000?

You get bug bounties if you report the kind of bugs Mythos identified. There's a reason no-one collected bounties from the "5000 defects" Coverity identified.

The Mythos reports have several examples of chaining a whole bunch of logic in different parts of the program together to exploit something very subtle. The Coverity reports aren't anything like that. These tools aren't remotely in the same league or even universe.

IainIreland 1 day ago|||
Yeah, fuzzing, sanitizers, and bug bounties were our main pre-AI tools for finding bugs.
MetaverseClub 1 day ago|||
It's just sad that Coverity represents the best working C++ static analysis tool.
canucker2016 11 hours ago|||
There's also PVS-Studio. They also scan open source projects - see https://pvs-studio.com/en/blog/inspections/

It's hard to convince managers to spend money on static analysis tools (or any development tool).

Unless your company just got bad publicity for a bug and your devs come to you and demonstrate that a certain static analysis tool would have flagged that particular piece of code, most managers will let the bean-counter facet dominate the decision-making process.

sfink 1 day ago|||
The best general purpose one, anyway. Specialty tools can be much better for their niches. Heck, compiler warnings are one such niche tool, and some of them are quite good.
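
As a trivial, made-up example of that niche: with -Wall, both gcc and clang flag the classic assignment-in-condition typo below.

    #include <cstdio>

    void report(int errorCount) {
      if (errorCount = 0) {   // -Wall warns: assignment used as a condition;
                              // almost certainly meant '=='
        std::printf("clean build\n");
      }
    }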
mccr8 1 day ago|||
Firefox developers do fix issues found by Coverity. I haven't looked at the results in over a decade, but the last time I did, there were a few code patterns we used in a lot of places that Coverity didn't like (but that were actually okay the way we were doing them), which resulted in a colossal number of false positives.
qsera 1 day ago||
The flipside of this is that with AI and prompt injection attacks, you don't need a browser vulnerability to be pwned!
gnabgib 1 day ago||
16-day-old story

Wired: Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox (41 points, 18 comments) https://news.ycombinator.com/item?id=47853649

Ars: Mozilla: Anthropic's Mythos found 271 security vulnerabilities in Firefox 150 (33 points, 8 comments) https://news.ycombinator.com/item?id=47855384

mozdeco 1 day ago|
No, it's a new post, see also

https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

gnabgib 1 day ago||
That's this post... and while it's more detail on the same headline (271 bugs), these discussions look the same as 2 weeks ago (and the same as what all the bloggers and podcasters discussed).
languagehacker 21 hours ago||
I'm having more problems with Firefox 150 than I have had with any other browser update in years. I think I might be the only one though?
isatis 21 hours ago|
On both desktop and Android, I've been getting HTTPS errors that require two Refresh clicks before the actual page loads.
deferredgrant 1 day ago||
A vuln finder is useful only if it respects the humans on the other end. Every bogus report taxes the same scarce attention needed for the real bugs.
xacky 1 day ago||
I just hope they don't start ignoring human created bug reports, as there are still many that haven't been fixed for years.
nnm 1 day ago|
I still don't know the exploit count for Mythos. Is it zero, one, or more?
sfink 1 day ago|
More, many more. See the bug reports linked in the post. I checked a few, and all of them had an exploit in them, and there are definitely more than one bug listed.

Well, depending on how you define "exploit": some might only read arbitrary pointers or just read out of bounds. Those would be useful primitives in a chain of vulnerabilities, not exploits themselves. You'll have to read through the first comments yourself, but if you're hoping that this is all nonsense and ignorable hype, you're going to be disappointed.
