Posted by modeless 13 hours ago
It's funny because, by (most) definitions, it is not an artifact:
> a usually simple object (such as a tool or ornament) showing human workmanship or modification as distinguished from a natural object
I need to rethink what my vision of the future looks like.
It's very telling how all these examples are "look, we made it recreate a shittier version of a thing that already exists in the training set".
Without enough examples to copy from (despite CPU manuals being available in the training set), the approach failed. I wonder how well it'll do when you throw a new or imaginary instruction set / CPU architecture at it; I bet it'll fail in similar ways.
And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless; all the reward comes from getting to 32kB.
If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.
https://github.com/jyn514/saltwater
> https://github.com/jyn514/saltwater
This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...
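For context, here's the kind of perfectly ordinary C that a compiler without bitfield or variadic support would reject (my own illustration, not taken from the saltwater repo):

    #include <stdarg.h>
    #include <stdio.h>

    /* Bitfields: common in hardware register and protocol definitions. */
    struct flags {
        unsigned ready : 1;
        unsigned error : 1;
        unsigned count : 6;
    };

    /* Variadic function: printf-style logging is everywhere in real C code. */
    static void log_all(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vprintf(fmt, ap);
        va_end(ap);
    }

    int main(void)
    {
        struct flags f = {1, 0, 42};
        log_all("ready=%u count=%u\n", (unsigned)f.ready, (unsigned)f.count);
        return 0;
    }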
> https://github.com/ClementTsang/rustcc
This will compile basically no real-world code. The only supported data type is "int".
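To be concrete (again my own example, not from that repo), even something this small goes beyond an int-only compiler:

    #include <stdio.h>

    /* Anything beyond plain int: doubles, pointers, structs. */
    struct point { double x, y; };

    int main(void)
    {
        struct point p = {1.5, 2.5};
        const char *label = "sum";
        printf("%s = %f\n", label, p.x + p.y);
        return 0;
    }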
> https://github.com/maekawatoshiki/rucc
This is just a frontend. It uses LLVM as the backend.
I completely agree that "rewrite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.
Maybe read the article before being so dismissive.
You can force the agent not to use unsafe; this is why it burned $20,000: thousands of attempts against good tests, with good boundaries set.
If you trained on a neutral representation like an AST or IR, then the source language shouldn't matter. *
* I'm not familiar with how Anthropic builds their models, but training this way should nullify PL differences.
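Rough illustration of the idea (hand-written and simplified, not how Anthropic necessarily trains anything): different surface languages lower to essentially the same IR, so a model trained at that level wouldn't care which one the programmer wrote.

    /* C source: */
    int add(int a, int b) { return a + b; }

    /*
     * Roughly the same thing in (simplified) LLVM-style IR, which is also
     * about what the equivalent Rust `fn add(a: i32, b: i32) -> i32 { a + b }`
     * lowers to:
     *
     *   define i32 @add(i32 %a, i32 %b) {
     *     %sum = add i32 %a, %b
     *     ret i32 %sum
     *   }
     */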
> Over nearly 2,000 Claude Code sessions and $20,000 in API cost
I guess it makes sense, since agents can generate tests. If you're taking this route, I'd like to see agents that act as users: ones that can only access docs, textbooks, user forums, and builds.
Yes, this is cool. I've actually worked on a similar project with a slightly worse test oracle and would gladly never do that sort of work again; just tedious, unfulfilling work. Though we did catch issues with both the specifications and the test oracle while doing it. Also, many of the team members learned a lot and are now SMEs for related systems.
Is this evidence that knowledge work is dead or AGI is coming? Absolutely not. I think you’d be pretty ignorant with respect to the field to suggest such a thing.