We tasked Opus 4.6 using agent teams to build a C Compiler

Posted by modeless 14 hours ago

We tasked Opus 4.6 using agent teams to build a C Compiler(www.anthropic.com)

512 points | 478 commentspage 6

jcalvinowens 14 hours ago||

How much of this result is effectively plagiarized open source compiler code? I don't understand how this is compelling at all: obviously it can regurgitate things that are nearly identical in capability to already existing code it was explicitly trained on...

It's very telling how all these examples are all "look, we made it recreate a shitter version of a thing that already exists in the training set".

jeroenhd 13 hours ago||

The fact it couldn't actually stick to the 16 bit ABI so it had to cheat and call out to GCC to get the system to boot says a lot.

Without enough examples to copy from (despite CPU manuals being available in the training set) the approach failed. I wonder how well it'll do when you throw it a new/imaginary instruction set/CPU architecture; I bet it'll fail in similar ways.

jsnell 13 hours ago|||

"Couldn't stick to the ABI ... despite CPU manuals being available" is a bizarre interpretation. What the article describes is the generated code being too large. That's an optimization problem, not a "couldn't follow the documentation" problem.

And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless, all the rewards come from getting to 32kB.

jcalvinowens 13 hours ago|||

IMHO a new architecture doesn't really make it any more interesting: there's too many examples of adding new architectures in the existing codebases. Maybe if the new machine had some bizarre novel property, I suppose, but I can't come up with a good example.

If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.

Philpax 14 hours ago|||

What Rust-based compiler is it plagiarising from?

lossolo 13 hours ago|||

Language doesn't really matter, it's not how things are mapped in the latent space. It only needs to know how to do it in one language.

HDThoreaun 11 hours ago||

Ok you can say this about literally any compiler though. The authors of every compiler have intimate knowledge of other compilers, how is this different?

eggn00dles 8 hours ago||

grace hopper spinning in her grave rn

rubymamis 13 hours ago||||

There are many, here's a simple Google search:

https://github.com/jyn514/saltwater

https://github.com/ClementTsang/rustcc

https://github.com/maekawatoshiki/rucc

jsnell 13 hours ago|||

Did you actually look at these?

> https://github.com/jyn514/saltwater

This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...

> https://github.com/ClementTsang/rustcc

This will compile basically no real-world code. The only supported data type is "int".

> https://github.com/maekawatoshiki/rucc

This is just a frontend. It uses LLVM as the backend.

Philpax 13 hours ago||||

Look at what those compilers are capable of compiling and to which targets, and compare it to what this compiler can do. Those are wonderful, and I have nothing but respect for them, but they aren't going to be compiling the Linux kernel.

rubymamis 13 hours ago||

I just did a quick Google search only on GitHub, maybe there are better ones out there on the internet?

luke5441 9 hours ago||||

Another one:

https://github.com/rustcoreutils/posixutils-rs/tree/main/cc

Philpax 9 hours ago||

Can't compile the Linux kernel, and ironically, also partly written by Claude.

chilipepperhott 12 hours ago|||

I found this one too: https://github.com/PhilippRados/wrecc

Philpax 9 hours ago||

A genuinely impressive effort, but alas, still missing some pretty critical features (const, floating point, bools, inline, anonymous structs in function args).

jcalvinowens 14 hours ago|||

Being written in rust is meaningless IMHO. There is absolutely zero inherent value to something being written in rust. Sometimes it's the right tool for the job, sometimes it isn't.

modeless 14 hours ago|||

It means that it's not directly copying existing C compiler code which is overwhelmingly not written in Rust. Even if your argument is that it is plagiarizing C code and doing a direct translation to Rust, that's a pretty interesting capability for it to have.

seba_dos1 11 hours ago|||

Translating things between languages is probably one of the least interesting capabilities of LLMs - it's the one thing that they're pretty much meant to do well by design.

jcalvinowens 13 hours ago|||

Surely you agree that directly copying existing code into a different language is still plagiarism?

I completely agree that "reweite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.

Philpax 14 hours ago||||

Please don't open a bridge to the Rust flamewar from the AI flamewar :-)

jcalvinowens 14 hours ago||

Hahaha, fair enough, but I refuse to be shy about having this opinion :)

anematode 14 hours ago||

Honestly, probably not a lot. Not that many C compilers are compatible with all of GCC's weird features, and the ones that are, I don't think are written in Rust. Hell, even clang couldn't compile the Linux kernel until ~10 years ago. This is a very impressive project.

falloutx 14 hours ago||

So it copied one of the C compilers? This was always possible but now you need to pay $1000 in API costs to Anthropic

Rudybega 12 hours ago||

It wrote the compiler in Rust. As far as I know, there aren't any Rust based C compilers with the same capabilities. If you can find one that can compile the Linux kernel or get 99% on the GCC torture test suite, I would be quite surprised. I couldn't in a search.

Maybe read the article before being so dismissive.

falloutx 12 hours ago|||

Why does language of the compiler matter? Its a solved problem and since other implementations are already available anyone can already transpile them to rust.

Rudybega 12 hours ago||

Direct transpilation would create a ton of unsafe code (this repo doesn't have any) and fixing that would require a lot of manual fixes from the model. Even that would be a massive achievement, but it's not how this was created.

f311a 2 hours ago||

They are trained pretty hard to transpile the code between languages and do this pretty well because this can be done using RL.

You can force the agent not to use unsafe, this is why it burned $20000. Thousands of attempts against good tests with good boundaries set.

hgs3 12 hours ago|||

> As far as I know, there aren't any Rust based C compilers with the same capabilities.

If you trained on a neutral representation like an AST or IR, then the source language shouldn't matter. *

* I'm not familiar with how Anthropic builds their models, but training this way should nullify PL differences.

chucksta 14 hours ago||

Add a 0 and double it

|Over nearly 2,000 Claude Code sessions and $20,000 in API cost

lossyalgo 11 hours ago||

One more reason RAM prices will continue to go up.

davemp 10 hours ago||

Brute forcing a problem with a perfect test oracle and a really good heuristic (how many c compilers are in the training data) is not enough to justify the hype imo.

Yes this is cool. I actually have worked on a similar project with a slightly worse test oracle and would gladly never have to do that sort of work again. Just tedious unfulfilling work. Though we caught issues with both the specifications/test oracle when doing the work. Also many of the team members learned and are now SMEs for related systems.

Is this evidence that knowledge work is dead or AGI is coming? Absolutely not. I think you’d be pretty ignorant with respect to the field to suggest such a thing.

jhallenworld 11 hours ago||

Does it make a conforming preprocessor?

casey2 5 hours ago||

Interesting that they are still going with a testing strategy despite the wasted time. I think in the long run model checking and proofs are more scale-able.

I guess it makes as agents can generate tests, since you are taking this route I'd like to see agents that act as a users, that can only access docs, textbooks, user forums and builds.

stephc_int13 12 hours ago||

It means that if you already have or a willing to build very robust test suite and the task is a complicated but already solved problem, you can get a sub-par implementation for a semi-reasonable amount of money.

This is not entirely ridiculous.

7734128 14 hours ago||

I'm sure this is impressive, but it's probably not the best test case given how many C compilers there are out there and how they presumably have been featured in the training data.

This is almost like asking me to invent a path finding algorithm when I've been thought Dijkstra's and A*.

NitpickLawyer 14 hours ago|

It's a bit disappointing that people are still re-hashing the same "it's in the training data" old thing from 3 years ago. It's not like any LLM could 1for1 regurgitate millions of LoC from any training set... This is not how it works.

A pertinent quote from the article (which is a really nice read, I'd recommend reading it fully at least once):

> Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects. My goal with Opus 4.6 was to again test the limits.

f311a 2 hours ago|||

That's because they still struggle hard with out-of-distribution tasks even though some of them can be solved using existing training data pretty well. Focusing on out-of-distribution will probably lower scores for benchmarks. They focus too much on common tasks.

And keep in mind, the original creators of the first compiler had to come up with everything: lexical analysis -> parsing -> IR -> codegen -> optimization. LLMs are not yet capable of producing a lot of novelty. There are many areas in compilers that can be optimized right now, but LLMs can't help with that.

wmf 14 hours ago||||

In this case it's not reproducing training data verbatim but it probably is using algorithms and data structures that were learned from existing C compilers. On one hand it's good to reuse existing knowledge but such knowledge won't be available if you ask Claude to develop novel software.

ofrzeta 4 hours ago|||

You mean like ... a compiler engineer that has learned from books and code samples?

RobMurray 13 hours ago|||

How often do you need to invent novel algorithms or data structures? Most human written code is just rehashing existing ideas as well.

notnullorvoid 11 hours ago|||

I wouldn't say I need to invent much that is strictly novel, though I often iterate on what exists and delve into novel-ish territory. That being said I'm definitely in a minority where I have the luxury/opportunity to work outside the monotony of average programming.

The part I find concerning is that I wouldn't be in the place I am today without spending a fair amount of time in that monotony and really delving in to understand it and slowly push outside it's boundary. If I was starting programming today I can confidently say I would've given up.

lossolo 13 hours ago|||

They're very good at reiterating, that's true. The issue is that without the people outside of "most humans" there would be no code and no civilization. We'd still be sitting in trees. That is real intelligence.

ben_w 12 hours ago||

Why's that the issue?

"This AI can do 99.99%* of all human endeavours, but without that last 0.01% we'd still be in the trees", doesn't stop that 99.99% getting made redundant by the AI.

* vary as desired for your preference of argument, regarding how competent the AI actually is vs. how few people really show "true intelligence". Personally I think there's a big gap between them: paradigm-shifting inventiveness is necessarily rare, and AI can't fill in all the gaps under it yet. But I am very uncomfortable with how much AI can fill in for.

notnullorvoid 9 hours ago||

Here's a potentially more uncomfortable thought, if all people through history with potential for "true intelligence" had a tool that did 99% of everything do you think they would've had motivation to learn enough of that 99% to give insight into the yet discovered.

simonw 12 hours ago||||

This is a good rebuttal to the "it was in the training data" argument - if that's how this stuff works, why couldn't Opus 4.5 or any of the other previous models achieve the same thing?

lossolo 13 hours ago||||

They couldn't do it because they weren't fine-tuned for multi-agent workflows, which basically means they were constrained by their context window.

How many agents did they use with previous Opus? 3?

You've chosen an argument that works against you, because they actually could do that if they were trained to.

Give them the same post-training (recipes/steering) and the same datasets, and voila, they'll be capable of the same thing. What do you think is happening there? Did Anthropic inject magic ponies?

fatherwavelet 10 hours ago||||

At some point it becomes like someone playing a nice song on piano and then someone countering with "that is great but play a song you don't know!".

Then they start improvising and the same person counters with "what a bunch of slop, just making things up!"

falloutx 13 hours ago||||

They can literally print out entire books line by line.

calebhwin 14 hours ago||||

[dead]

zephen 14 hours ago||||

> It's a bit disappointing that people are still re-hashing the same "it's in the training data" old thing from 3 years ago.

They only have to keep reiterating this because people are still pretending the training data doesn't contain all the information that it does.

> It's not like any LLM could 1for1 regurgitate millions of LoC from any training set... This is not how it works.

Maybe not any old LLM, but Claude gets really close.

https://arxiv.org/pdf/2601.02671v1

skydhash 14 hours ago||||

Because for all those projects, the effective solution is to just use the existing implementation and not launder code through an LLM. We would rather see a stab at fixing CVEs or implementing features in open source projects. Like the wifi situation in FreeBSD.

Philpax 14 hours ago|||

As you wish: https://www.axios.com/2026/02/05/anthropic-claude-opus-46-so...

modeless 14 hours ago|||

They are doing that too. https://red.anthropic.com/2026/zero-days/

lunar_mycroft 14 hours ago|||

LLMs can regurgitate almost all of the Harry Potter books, among others [0]. Clearly, these models can actually regurgitate large amounts of their training data, and reconstructing any gaps would be a lot less impressive than implementing the project truly from scratch.

(I'm not claiming this is what actually happened here, just pointing out that memorization is a lot more plausible/significant than you say)

[0] https://www.theregister.com/2026/01/09/boffins_probe_commerc...

StilesCrisis 13 hours ago||

The training data doesn't contain a Rust based C compiler that can build Linux, though.

IshKebab 12 hours ago|

> I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

This has been my experience of vibe coding too. Good for getting started, but you quickly reach the point where fixing one thing breaks another and you have to finish the project yourself.

More comments...