We tasked Opus 4.6 using agent teams to build a C Compiler

Posted by modeless 18 hours ago

Source: www.anthropic.com

558 points | 541 comments

almosthere 13 hours ago

This is like the 6th trending Claude story today. It must be obvious that they told everyone at Anthropic to upvote and comment.

davemp 13 hours ago

Brute-forcing a problem with a perfect test oracle and a really good heuristic (how many C compilers are in the training data) is not enough to justify the hype, IMO.

Yes, this is cool. I have actually worked on a similar project with a slightly worse test oracle, and I would gladly never do that sort of work again: just tedious, unfulfilling work. That said, we caught issues with both the specifications and the test oracle while doing the work. Also, many of the team members learned a lot and are now SMEs for related systems.

Is this evidence that knowledge work is dead or AGI is coming? Absolutely not. I think you’d be pretty ignorant with respect to the field to suggest such a thing.

sho_hn 18 hours ago

Nothing in the post about whether the compiled kernel boots.

chews 17 hours ago

The video does show it booting.

light_hue_1 17 hours ago

> This was a clean-room implementation (Claude did not have internet access at any point during its development);

This is absolutely false and I wish the people doing these demonstrations were more honest.

It had access to GCC! Not only that, using GCC as an oracle was critical and had to be built in by hand.

Like the web browser project, this shows how far you can get when you have a reference implementation, good benchmarks, and clear metrics. But that's not the real world for 99% of people; this is the easiest scenario for any ML setting.

rvz 15 hours ago

> This is absolutely false and I wish the people doing these demonstrations were more honest.

That's because the "testing" was not done independently, so anything could be made misleading. Hence:

> Written by Nicholas Carlini, a researcher on our Safeguards team.

gre 18 hours ago

There's a terrible bug where, once it compacts, it sometimes pulls in .o or other binary files and immediately fills your entire context. Then it compacts again... Within 10 minutes your token budget for the 5-hour period is gone. Edit: hooks that prevent it from reading binary files can't prevent this.

Please fix. :)

pshirshov 14 hours ago

Pfft, a C compiler.

Look at this: https://github.com/7mind/jopa

Havoc 16 hours ago

Cool project, but they really could have skipped the mention of a clean room. Something trained on every copyrighted thing known to mankind is the opposite of clean-room.

cheema33 15 hours ago

As others have pointed out, humans train on existing codebases as well. And then use that knowledge to build clean room implementations.

mxey 15 hours ago

That’s the opposite of clean-room. The whole point of clean-room design is that you have your software written by people who have not looked into the competing, existing implementation, to prevent any claim of plagiarism.

“Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”

HarHarVeryFunny 12 hours ago

True, but the human isn't allowed to bring 1TB of compressed data pertaining to what they are "redesigning from scratch/memory" into the clean room.

In fact, the idea of a "clean room" implementation is that all you have to go on is the interface spec of what you are trying to build a clean (non-copyright-violating) version of, e.g. the IBM PC BIOS API.

You can't have previously read the IBM PC BIOS source code, then claim to have created a "clean room" clone!

kelnos 14 hours ago

No they don't. One team meticulously documents and specs out what the original code does, and then a completely independent team, who has never seen the original source code, implements it.

Otherwise it's not clean-room, it's plagiarism.

regularfry 15 hours ago

What they don't do is read the product they're clean-rooming. That's kinda disqualifying. It's impossible to know whether the GCC source is in 4.6's training set, but it would be kinda weird if it wasn't.

pizlonator 15 hours ago

Not the same.

I have read nowhere near as much code (or anything) as Claude had to read to get to where it is.

And I can write an optimizing compiler whose output isn't slower than GCC -O0.

cermicelli 15 hours ago

If that's what clean room means to you, then AI can definitely replace you, as even ChatGPT does better than that.

(prompt: what does a clean room implementation mean?)

From ChatGPT without login BTW!

> A clean room implementation is a way of building something (usually software) without copying or being influenced by the original implementation, so you avoid copyright or IP issues.

> The core idea is separation.

> Here’s how it usually works:

> The basic setup

> Two teams (or two roles):

> Specification team (the “dirty room”)

> Looks at the original product, code, or behavior

> Documents what it does, not how it does it

> Produces specs, interfaces, test cases, and behavior descriptions

> Implementation team (the “clean room”)

> Never sees the original code

> Only reads the specs

> Writes a brand-new implementation from scratch

> Because the clean team never touches the original code, their work is considered independently created, even if the behavior matches.

> Why people do this

> Reverse-engineering legally

> Avoid copyright infringement

> Reimplement proprietary systems

> Create open-source replacements

> Build compatible software (file formats, APIs, protocols)

I really am starting to think we have achieved AGI. > Average (G)Human Intelligence

LMAO

benjiro 15 hours ago

Hot take:

If you try to reimplement something in a clean room, it's a step-by-step process, using your own accumulated knowledge as the basis. The knowledge you hold in your brain is, all too often, code that may be copyrighted by the companies you worked for.

Is it any different for a LLM?

The fact that the LLM is trained on more data does not change the fact that when you work for a company, leave it, and take that accumulated knowledge to a different company, you are by definition taking that knowledge (which may be copyrighted) and implementing it somewhere else. It's only an issue if you copy the code directly or do the implementation as a 1:1 copy. LLMs do not make 1:1 copies of the original.

At what point is an LLM trained on copyrighted data any different than a human trained on copyrighted data, when that knowledge gets reimplemented in a transformative way? The big difference is that the LLM can hold more data across more fields than a human, true... But if we look at specializations, doesn't it come back to the same thing?

Crestwave 12 hours ago

Clean-room design is extremely specific. Anyone who has so much as glanced at Windows source code[1] (or even ReactOS code![2]) is permanently banned from contributing to WINE.

This is 100% unambiguously not clean-room unless they can somehow prove it was never trained on any C compiler code (which they can't, because it most certainly was).

[1] https://gitlab.winehq.org/wine/wine/-/wikis/Developer-FAQ#wh...

[2] https://gitlab.winehq.org/wine/wine/-/wikis/Clean-Room-Guide...

cermicelli 15 hours ago

If you have worked on a related copyrighted work you can't work on a clean room implementation. You will be sued. There are lots of people who have tried and found out.

They weren't trillion-dollar AI companies able to bankroll the defense, sure. But bringing up clean room while using copyrighted stuff isn't even an argument; it's just nonsense, trying to prove something when no one asked.

dmitrygr 18 hours ago

> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

Worse than "-O0" takes skill...

So then, it produced something much worse than tcc (which is better than gcc -O0), an equivalent of which one man can produce in under two weeks. So even all those tokens and dollars did not equal one man's week of work.

Except the one man might explain such arbitrary and shitty code as this:

https://github.com/anthropics/claudes-c-compiler/blob/main/s...

Why x9? Who knows?!

Oh god, the more I look at this code, the happier I get. I can already feel the contracts coming to fix LLM slop like this when any company who takes this seriously needs it maintained and cannot...

ben_w 17 hours ago

I'm trying to recall a quote. Some war where all defeats were censored in the news; possibly Paris was losing to someone. It was something along the lines of "I can't help but notice how our great victories keep getting closer to home".

Last year I tried using an LLM to make a joke language; the source code was so bad I couldn't even compile the compiler. Before Christmas, with the same joke language, a previous version of Claude gave me something that worked. I wouldn't call it "good" (it was a joke language), but it did work.

So it sucks at writing a compiler? Yay. The gloriously indefatigable human mind wins another battle against the mediocre AI, but I can't help but notice how the battles keep getting closer to home.

sjsjsbsh 17 hours ago

> but I can't help but notice how the battles keep getting closer to home

This has been true for all of (known) human history. I’m gonna go ahead and make another bold prediction: tech will keep getting better.

The issue with this blog post is it’s mostly marketing.

sebzim4500 18 hours ago

Can one man really make a C compiler in one week that can compile linux, sqlite, etc.?

Maybe I'm underestimating the simplicity of the C language, but that doesn't sound very plausible to me.

dmitrygr 18 hours ago

Yes, if you do not care to optimize. Source: done it.

Philpax 18 hours ago

I would love to see the commit log on this.

rustystump 17 hours ago

Implementing just enough to conform to a language is not as difficult as it seems. Making it fast is hard.
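To make that point concrete, here is a minimal, hypothetical sketch (in Python, not taken from the Anthropic project or from any commenter's compiler) of the naive "stack machine" code generation that many small teaching compilers start with. It assumes a tiny expression language (integer `+`, `-`, `*`, unary minus and parentheses) and emits x86-64 AT&T-syntax instruction strings rather than a complete assembly file; `tokenize`, `Parser`, `gen` and `compile_expr` are invented names.

```python
import re

# Hypothetical toy compiler for integer expressions: +, -, *, unary minus, parens.
# Code generation is the naive "stack machine" strategy: every value ends up
# in %rax, and binary operands meet via a push/pop pair.
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<op>[-+*()])|(?P<bad>\S)")
ARITH = {"+": "add", "-": "sub", "*": "imul"}


def tokenize(src):
    """Split source text into ("num", int) and ("op", str) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "bad":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        value = int(m.group()) if m.lastgroup == "num" else m.group()
        tokens.append((m.lastgroup, value))
    return tokens


class Parser:
    """Recursive-descent parser that builds a tuple-based AST."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, op=None):
        tok = self.peek()
        if tok is None or (op is not None and tok != ("op", op)):
            raise SyntaxError(f"expected {op or 'a token'!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):  # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            node = (self.take()[1], node, self.term())
        return node

    def term(self):  # term := unary ("*" unary)*
        node = self.unary()
        while self.peek() == ("op", "*"):
            self.take()
            node = ("*", node, self.unary())
        return node

    def unary(self):  # unary := "-" unary | primary
        if self.peek() == ("op", "-"):
            self.take()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):  # primary := number | "(" expr ")"
        if self.peek() == ("op", "("):
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        kind, value = self.take()
        if kind != "num":
            raise SyntaxError(f"expected a number, got {value!r}")
        return ("num", value)


def gen(node, out):
    """Append x86-64 AT&T-syntax instructions leaving node's value in %rax."""
    if node[0] == "num":
        out.append(f"mov ${node[1]}, %rax")
    elif node[0] == "neg":
        gen(node[1], out)
        out.append("neg %rax")
    else:
        op, lhs, rhs = node
        gen(rhs, out)              # evaluate the right operand first
        out.append("push %rax")    # spill it to the stack
        gen(lhs, out)              # left operand lands in %rax
        out.append("pop %rdi")     # reload the right operand into %rdi
        out.append(f"{ARITH[op]} %rdi, %rax")  # %rax = %rax OP %rdi
    return out


def compile_expr(src):
    """Parse a whole expression and return its instruction list."""
    parser = Parser(tokenize(src))
    ast = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return gen(ast, [])


if __name__ == "__main__":
    for line in compile_expr("2 + 3 * 4"):
        print(line)
```

Running it on `2 + 3 * 4` yields nine instructions (five for `3 * 4`, then a push/pop pair around the `2`), whereas an optimizing compiler would constant-fold the whole expression into something like a single `mov $14, %rax`. The output is correct but slow; closing that gap with register allocation, constant folding and better instruction selection is the hard part.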

dmitrygr 18 hours ago

Did this before I knew git, back in college. Target was ARMv5.

Philpax 17 hours ago

Great. Did your compiler support three different architectures (four, if you include x86 in addition to x86-64) and compile and pass the test suite for all of this software?

> Projects that compile and pass their test suites include PostgreSQL (all 237 regression tests), SQLite, QuickJS, zlib, Lua, libsodium, libpng, jq, libjpeg-turbo, mbedTLS, libuv, Redis, libffi, musl, TCC, and DOOM — all using the fully standalone assembler and linker with no external toolchain. Over 150 additional projects have also been built successfully, including FFmpeg (all 7331 FATE checkasm tests on x86-64 and AArch64), GNU coreutils, Busybox, CPython, QEMU, and LuaJIT.

Writing a C compiler is not that difficult, I agree. Writing a C compiler that can compile a significant amount of real software across multiple architectures? That's far less trivial.

AshamedCaptain 2 hours ago

Frankly, I think you are exaggerating. My university had a course that required students to build a C compiler that could handle the C subset of SPECint (which includes frigging Perl), and it was the usual 3-month class, not one expected to take up 24 hours a day of your time. So I'd say 1 week sounds perfectly reasonable for someone already familiar. Good-enough C for a shitton of projects is barely more complicated than writing an assembler; in fact, that is one of C's strong points (which is also the source of most of its weaknesses).

bwfan123 15 hours ago

> I can already feel the contracts coming to fix LLM slop

First, the agents will attempt to fix issues on their own. Most easy problems will be fixed or worked around in this manner. The hard problems will require a deeper causal model of how things work. For these, the agents will give up. But by then the codebase will have evolved to a point where no one understands what's going on, including the agents and their human handlers. Expect your phone to ring at that point, and prepare to ask for a ransom.

small_model 18 hours ago

Claude is only a few years old, so we should compare it to a 3-year-old human's C compiler.

notnullorvoid 12 hours ago

Claude requires many lifetimes' worth of data to "learn". Evolution aside, humans don't require much data to learn, and our learning happens in real time in response to our environment.

Train Claude without the programming dataset, give it a dozen of the best programming books, and it'll have no chance of writing a compiler. Do the same for a human with an interest in learning to program and there's a good chance.

zephen 17 hours ago

Claude contains the entire wisdom of the internet, such as it is.

sjsjsbsh 18 hours ago

> I can already feel the contracts coming to fix LLM slop like this when any company who takes this seriously needs it maintained and cannot

Honest question: do you think it’d be easier to fix or to rewrite from scratch? In domains I’m intimately familiar with, I’ve come very close to simply throwing the LLM code out after using it to establish some key test cases.

dmitrygr 17 hours ago

Rewriting is what I’ve been doing so far in such cases. It takes fewer hours.
