Posted by modeless 8 hours ago
> [...] The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC, and only the remaining files with Claude's C Compiler. If the kernel worked, then the problem wasn’t in Claude’s subset of the files. If it broke, then it could further refine by re-compiling some of these files with GCC. This let each agent work in parallel
This is a remarkably creative solution! Nicely done.
I'll still work on it, of course. It just won't be so surprising.
- All prompts used
- The structure of the agent team (which agents / which roles)
- Any other material that went into the process
This would be a good source for learning, even though I'm not ready to spend 20k$ just for replicating the experiment.
GCC and Clang are worth much much more because they are battle-tested compilers that we understand and know work, even in a multitude of corner cases, over decades.
In future there's going to be lots and lots of basically worthless code, generated and regenerated over and over again. What will distinguish code that provides value? It's going to be code - however it was created, could be AI or human - that has actually been used and maintained in production for a long time, with a community or company behind it, bugs being triaged and fixed and so on.
If you had the knowledge that a transformer could pull this off in 2022. Even with all its flawed code. You would be floored.
Keep in mind that just a few years ago, the state of the art in what these LLMs could do was questions of this nature:
Suppose g(x) = f−1 (x), g(0) = 5, g(4) = 7, g(3) = 2, g(7) = 9, g(9) = 6 what is f(f(f(6)))?
The above is from the "sparks of AGI paper" on GPT-4, where they were floored that it could coherently reason through the 3 steps of inverting things (6 -> 9 -> 7 -> 4) while GPT 3.5 was still spitting out a nonsense argument of this form:
f(f(f(6))) = f(f(g(9))) = f(f(6)) = f(g(7)) = f(9).
This is from March 2023 and it was genuinely very surprising at the time that these pattern matching machines trained on next token prediction could do this. Something like a LSTM can't do anything like this at all btw, no where close.
To me its very surprising that the C compiler works. It takes a ton of effort to build such a thing. I can imagine the flaws actually do get better over the next year as we push the goalposts out.
First 10 commits, "git log --all --pretty=format:%s --reverse | head",
Initial commit: empty repo structure
Lock: initial compiler scaffold task
Initial compiler scaffold: full pipeline for x86-64, AArch64, RISC-V
Lock: implement array subscript and lvalue assignments
Implement array subscript, lvalue assignments, and short-circuit evaluation
Add idea: type-aware codegen for correct sized operations
Lock: type-aware codegen for correct sized operations
Implement type-aware codegen for correct sized operations
Lock: implement global variable support
Implement global variable support across all three backendsA whole lot of UB in the actual SIMD impls (who'd have expected), but that can actually be fine here if the compiler is made to not take advantage of the UB. And then there's the super-weird mix of manual loops vs inline assembly vs builtins.
IMO a simpler novel product that humans enjoy is 10x more impressive than rehashing a solved problem, regardless of difficulty.
How much would it cost to pay someone to make a C compiler in rust? A lot more than $20k
* massive meaning "total context needed" >> model context window
If you don't care about code quality, maintainability, readability, conformance to the specification, and performance of the compiler and of the compiled code, please, give me your $20,000, I'll give you your C compiler written from scratch :)
i don't know if you could. Let's say you get a check for $20k, how long will it take you to make an equivalent performing and compliant compiler? Are you going to put your life on pause until it's done for $20k? Who's going to pay your bills when the $20k is gone after 3 months?