Posted by modeless 7 hours ago
This LLM did it in (checks notes):
> Over nearly 2,000 Claude Code sessions and $20,000 in API costs
It may build, but does it boot (was also a significant and distinct next milestone)? (Also, will it blend?). Looks like yes!
> The 100,000-line compiler can build a bootable Linux 6.9 on x86, ARM, and RISC-V.
The next milestone is:
Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.
> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
Still a really cool project!
There's some incredible source available code out there. Statistically, I think there's a LOT more not so great source available code out there, because the majority of output of seasoned/high skill developers is proprietary.
To me, a surprising portion of Claude 4.5 output definitely looks like student homework answers, because I think that's closer to the mean of the code population.
Even worse in many cases because they are so over engineered nobody understands how they work.
i'm guessing most of the gains we've seen recently are post training rather than pretraining.
But, I naively assume most orgs would opt out. I know some orgs have a proxy in place that will prevent certain proprietary code from passing through!
This makes me curious if, in the allow case, Anthropic is recording generated output, to maybe down-weight it if it's seen in the training data (or something similar)?
But I wonder how it would fare given a language specification for a non-existent non-trivial language and build a compiler for that instead?
It is standing on the shoulders of giants (all of the compilers of the past, built into it's training data... and the recent learnings about getting these agents to break up tasks) to get itself going. Still fairly impressive.
On a side-quest, I wonder where Anthropic is getting there power from. The whole energy debacle in the US at the moment probably means it made some CO2 in the process. Would be hard to avoid?
Does it really boot...?
They don't need 16b x86 support for the RISCV or ARM ports, so yes, but depends on what 'it' we're talking about here.
Also, FWIW, GCC doesn't directly assemble to machine code either; it shells out to GAS (GNU Assembler). This blog post calls it "GCC assembler and linker" but to be more precise the author should edit this to "GNU binutils assembler and linker." Even then GNU binutils contains two linkers (BFD and GOLD), or did they excise GOLD already (IIRC, there was some discussion a few years ago about it)?
I don't agree that all the claims are backed up by their own comments, which means that there's probably other places where it falls down.
Its... Misrepresentation.
Like Chicken is a Scheme compiler. But they're very up front that it depends on a C compiler.
Here, they wrote a C compiler that is at least sometimes reliant on having a different C compiler around. So is the project at 50%? 75%?
Even if its 99%, thats not the same story as they tried to write. And if they wrote that tale instead, it would be more impressive, rather than "There's some holes. How many?"
The compiler itself is entirely functional; it just can't generate code optimal enough to fit within the constraints for that very specific (tiny!) part of the system, so another compiler is required to do that step.
They can also be made to be deterministic. Some extra care is required to avoid computation paths that lead to numerical differences on different machines, but this can be accomplished reliably with small models that use integer math and use kernels that follow a specific order of operations. You get a lot more freedom to do these things on the small, application-specific models than you do when you're trying to run a big LLM across different GPU implementations in floating point.
EDIT (since HN is preventing me from responding):
> Some people care more about compiler speed than the correctness?
Yeah, I think plenty of people writing code in languages that have concepts like Undefined Behavior technically don't really care as much about correctness as they may claim otherwise, as it's pretty hard to write large volumes of code without indirectly relying on UB somewhere. What is correct in such case was left up to interpretation of the implementer by ISO WG14.
Anyway, please define: "correctness".
Is this true? It’s not an everyday thing, but when using less common flags, or code structures, or targets… every few years I run into a codegen issue. It’s hard to imagine going through a career without a handful…
They found a bimodal distribution in failures over the lifetime of chips. Infant mortality was well understood. Silicon aging over time was much less well understood, and I still find surprising.
Yeah. This test sorta definitely proves that AI is legit. Despite the millions of people still insisting it's a hoax.
The fact that the optimizations aren't as good as the 40 year gcc project? Eh - I think people who focus on that are probably still in some serious denial.
It cost $20,000 and it worked, but it's also totally possible to spend $20,000 and have Claude shit out a pile of nonsense. You won't know until you've finished spending the money whether it will fail or not. Anthropic doesn't sell a contract that says "We'll only bill you if it works" like you can get from a bunch of humans.
Do catastrophic bugs exist in that code? Who knows, it's 100,000 lines, it'll take a while to review.
On top of that, Anthropic is losing money on it.
All of those things combined, viability remains a serious question.
I'm curious - do you have ANY idea what it costs to have humans write 100,000 lines of code???
You should look it up. :)
It seems they are *not* losing money on inference: https://bsky.app/profile/steveklabnik.com/post/3mdirf7tj5s2e
This has got to be my favorite one of them all that keeps coming up in too many comments… You know who also was losing money in the beginning?! every successful company that ever existed! some like Uber were losing billions for a decade. and when was the last time you rode in a taxi? (I still do, my kid never will). not sure how old you are and if you remember “facebook will never be able to monetize on mobile…” - they all lose money, until they do not
with all optimizations disabled:
> Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
make O=/tmp/linux/x86 ARCH=x86_64 CC=/tmp/p/claudes-c-compiler/target/release/ccc -j30 defconfig all
``` /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:44:184: error: expected ';' after expression before 'pto_tmp__' do { u32 pto_val__ = ((u32)(((unsigned long) ~0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (~0x80000000); (void)pto_tmp__; } asm ("and" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0); ^~~~~~~~~ fix-it hint: insert ';' /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:49:183: error: expected ';' after expression before 'pto_tmp__' do { u32 pto_val__ = ((u32)(((unsigned long) 0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (0x80000000); (void)pto_tmp__; } asm ("or" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0); ^~~~~~~~~ fix-it hint: insert ';' /home/ray/Dev/linux/arch/x86/include/asm/preempt.h:61:212: error: expected ';' after expression before 'pao_tmp__' ```
Fixing some UB in the kernel sources, lots of plumbing to the build system (particularly making it more hermetic).
Getting the rest of the LLVM binutils substitutes to work in place of GNU binutils was also challenging. Rewriting a fair amount of 32b ARM assembler to be "unified syntax" in the kernel. Linker bugs are hard to debug. Kernel boot failures are hard to debug (thank god for QEMU+gdb protocol). Lots of people worked on many different parts here, not just me.
Evangelism and convincing upstream kernel developers why clang support was worth anyones while.
https://github.com/ClangBuiltLinux/linux/issues for a good historical perspective. https://github.com/ClangBuiltLinux/linux/wiki/Talks,-Present... for talks on the subject. Keynoting LLVM conf was a personal highlight (https://www.youtube.com/watch?v=6l4DtR5exwo).
I had hoped one day to re-implement parts of LLVM itself in Rust; in particular, I've been curious if we can concurrently compile C (and parse C in parallel, or lazily) that haven't been explored in LLVM, and I think might be safer to do in Rust. I don't know enough about grammers to know if it's technically impossible, but a healthy dose of ignorance can sometimes lead to breakthroughs.
LLVM is pretty well designed for test. I was able to implement a lexer for C in Rust that could lex the Linux kernel, and use clang to cross check my implementation (I would compare my interpretation of the token stream against clang's). Just having a standard module system makes having reusable pieces seems like perhaps a better way to compose a toolchain, but maybe folks with more experience with rustc have scars to disagree?
Heh, earlier this day, I was just thinking how crazy a proposal would it actually be to have a Rust dependency (specifically, the egg crate, since one of the things I'm banging my head against right now might be better solved with egraphs).
Granted, marketing sucks up far too much money for any startup, and again, we don't know the actual numbers in play, however, this is something to keep in mind. (The very same marketing that likely also wrote the blog post, FWIW).
but regardless, hiring is difficult and high-end talent is limited. If the costs were anywhere close to equivalent, the agents are a no-brainer
Or do you mean that if an external user replicated this experience they might get billed less than $20k due to CC being sold at lower rates than per-API-call metered billing?
> This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQlite, postgres, redis
> I started by drafting what I wanted: a from-scratch optimizing compiler with no dependencies, GCC-compatible, able to compile the Linux kernel, and designed to support multiple backends. While I specified some aspects of the design (e.g., that it should have an SSA IR to enable multiple optimization passes) I did not go into any detail on how to do so.
> Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects.
And the very open points about limitations (and hacks, as cc loves hacks):
> It lacks the 16-bit x86 compiler that is necessary to boot [...] Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase
> It does not have its own assembler and linker;
> Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
Ending with a very down to earth take:
> The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.
All in all, I'd say it's a cool little experiment, impressive even with the limitations, and a good test-case as the author says "The resulting compiler has nearly reached the limits of Opus’s abilities". Yeah, that's fair, but still highly imrpessive IMO.
This is really pushing it, considering it’s trained on… internet, with all available c compilers. The work is already impressive enough, no need for such misleading statements.
It's not a clean-room implementation because of this:
> The fix was to use GCC as an online known-good compiler oracle to compare against
I agree that having a reference compiler available is a huge caveat though. Even if we completely put training data leakage aside, they're developing against a programmatic checker for a spec that's already had millions of man hours put into it. This is an optimal scenario for agentic coding, but the vast majority of problems that people will want to tackle with agentic coding are not going to look like that.
I'd argue that no one would really care given it's GCC.
But if you worked for GiantSodaCo on their secret recipe under NDA, then create a new soda company 15 years later that tastes suspiciously similar to GiantSodaCo, you'd probably have legal issues. It would be hard to argue that you weren't using proprietary knowledge in that case.
Check out the paper above on Absolute Zero. Language models don’t just repeat code they’ve seen. They can learn to code give the right training environment.
It's all but a clean-room design. A clean-room design is a very well defined term: "Clean-room design (also known as the Chinese wall technique) is the method of copying a design by reverse engineering and then recreating it without infringing any of the copyrights associated with the original design."
https://en.wikipedia.org/wiki/Clean-room_design
The "without infringing any of the copyrights" contains "any".
We know for a fact that models are extremely good at storing information with the highest compression rate ever achieved. It's not because it's typically decompressing that information in a lossy way that it didn't use that information in the first place.
Note that I'm not saying all AIs do is simply compress/decompress information. I'm saying that, as commenters noted in this thread, when a model was caught spotting out Harry Potter verbatim, there is information being stored.
It's not a clean-room design, plain and simple.
It is a research topic for heaven's sake:
https://arxiv.org/pdf/2601.02671
> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).
They used a lot of different techniques to prompt with actual text from the book, then asked the LLM to continue the sentences. I only skimmed the paper but it looks like there was a lot of iteration and repetitive trials. If the LLM successfully guessed words that followed their seed, they counted that as "extraction". They had to put in a lot of the actual text to get any words back out, though. The LLM was following the style and clues in the text.
You can't literally get an LLM to give you books verbatim. These techniques always involve a lot of prompting and continuation games.
The lesson here is that the Internet compresses pretty well.
A frontier model (e.g. latest Gemini, Gpt) is likely several-to-many times larger than 500GB. Even Deepseek v3 was around 700GB.
But your overall point still stands, regardless.
The distinction may not have mattered for copyright laws if things had gone down differently, but the gap between "blurry JPEG of the internet" and "learned stuff" is more obviously important when it comes to e.g. "can it make a working compiler?"
It is enough to have read even parts of a work for something to be considered a derivative.
I would also argue that language models who need gargantuan amounts of training material in order to work by definition can only output derivative works.
It does not help that certain people in this thread (not you) edit their comments to backpedal and make the followup comments look illogical, but that is in line with their sleazy post-LLM behavior.
For IP rights, I'll buy that. Not as important when the question is capabilities.
> I would also argue that language models who need gargantuan amounts of training material in order to work by definition can only output derivative works.
For similar reasons, I'm not going to argue against anyone saying that all machine learning today, doesn't count as "intelligent":
It is perfectly reasonable to define "intelligence" to be the inverse of how many examples are needed.
ML partially makes up for being (by this definition) thick as an algal bloom, by being stupid so fast it actually can read the whole internet.
https://arxiv.org/pdf/2601.02671
> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).
Edit: actually, no, I take that back, that's just very similar to some other research I was familiar with.
Because it _has_ been enough, that if you can recall things, that your implementation ends up not being "clean room", and trashed by the lawyers who get involved.
I mean... It's in the name.
> The term implies that the design team works in an environment that is "clean" or demonstrably uncontaminated by any knowledge of the proprietary techniques used by the competitor.
If it can recall... Then it is not a clean room implementation. Fin.
Citing a random arXiv paper from 2025 doesn't mean "they" used this technique. It was someone's paper that they uploaded to arXiv, which anyone can do.
It's kinda the whole point - you haven't read it so there's no doubt about copying in a clean-room experiment.
A "human style" clean-room copy here would have to be using a model trained on, say, all source code except GCC. Which would still probably work pretty well, IMO, since that's a pretty big universe still.
And at least with Moore's law, we had some understanding of the physical realities as transistors would get smaller and smaller, and reasonably predict when we'd start to hit limitations. With LLMs, we just have no idea. And that could be go either way.
Not from me you haven't!
> "they've hit a wall, no more data, running out of data, plateau this, saturated that"
Everyone thought Moore's Law was infallible too, right until they hit that bend. What hubris to think these AI models are different!
But you've probably been hearing that for 3 years too (though not from me).
> Models keep on getting better, at more broad tasks, and more useful by the month.
If you say so, I'll take your word for it.
Since then, all improvements came at a tradeoff, and there was a definite flattening of progress.
Idk, that sounds remarkably similar to these AI models to me.
I dunno. To me it doesn’t even look exponential any more. We are at most on the straight part of the incline.
SWE's may be seeing benefit. But in other areas? Doesnt seem to be the case. Consumers may use it as a more preferred interface for search - but this is a different discussion.
So where is that?
Prove this statement wrong.
Your post is phrased like it's a two sentence slam-dunk refutation of Anthropic's claims. I don't think it is, and I'm not even clear on what you're claiming precisely except that LLMs use knowledge acquired during training, which we all agree on here.
The fact that the implementation language for the compiler is rust doesn't factor into this. ML based natural language translation has proven that model training produces an abstract space of concepts internally that maps from and to different languages on the input and output side. All this points to is that there are different implicitly formed decoders for the same compressed data embedded in the LLM and the keyword rust in the input activates one specific to that programming language.
If all it takes is "trained on the Internet" and "decompress stored knowledge", then surely gpt3, 3.5, 4, 4.1, 4o, o1, o3, o4, 5, 5.1, 5.x should have been able to do it, right? Claude 2, 3, 4, 4.1, 4.5? Surely.
But reimplementing that isn't impressive, because its not a clean room implementation if you trained on that data, to make the model that regurgitates the effort.
Are you sure about that? Do you have some examples? The older Claude models can’t do it according to TFA.
Take the C4 training dataset for example. The uncompressed, uncleaned, size of the dataset is ~6TB, and contains an exhaustive English language scrape of the public internet from 2019. The cleaned (still uncompressed) dataset is significantly less than 1TB.
I could go on, but, I think it's already pretty obvious that 1TB is more than enough storage to represent a significant portion of the internet.
That seems implausible.
Why, exactly?
Refuting facts with "I doubt it, bro" isn't exactly a productive contribution to the conversation..
If you're testing a model I think it's reasonable that "clean room" have an exception for the model itself. They kept it offline and gave it a sandbox to avoid letting it find the answers for itself.
Yes the compression and storage happened during the training. Before it still didn't work; now it does much better.
If it could translate the C++ standard INTO an extensive test suite that actually captures most corner cases, and doesn't generate false positives - again, without internet access and without using gcc as an oracle, etc?
Except its trained on all source out there, so I assume on GCC and clang. I wonder how similar the code is to either.
The C specification and Linux kernel source code are undoubtedly in its training data, as are texts about compilers from a theoretical/educational perspective.
Meanwhile, I'm certain most people will never need it to perform this task. I would be more interested in seeing if it could add support for a new instruction set to LLVM, for example. Or perhaps write a complier for a new language that someone just invented, after writing a first draft of a spec for it.
Hello, this is what I did over my Christmas break. I've been taking some time to do other things, but plan on returning to it. But this absolutely works. Claude has written far more programs in my language than I have.
https://rue-lang.dev/ if you want to check it out. Spec and code are both linked there.
I ask because, as someone who uses these things every day, the idea that this kind of thing only works because of similar projects in the training data doesn't fit my mental model of how they work at all.
I'm wondering if the "it's in the training data" theorists are coding agent practitioners, or if they're mainly people who don't use the tools.
1. data analysis / visualization / …
2. “is this possible? can this even be done?”
for #1 - I don’t do much anymore, for #2 I mostly do it still all “by hand” not for the lack of serious trying. so “it can do #1 1000x better than me cause it is generally solved problem(s) it is trained on while it can’t effectively do #2 otherwise” fits perfectly
Why is this even desirable? I want my LLM to take into account everything there is out there and give me the best possible output.
My second reaction: still incredible, but noting that a C compiler is one of the most rigorously specified pieces of software out there. The spec is precise, the expected behavior is well-defined, and test cases are unambiguous.
I'm curious how well this translates to the kind of work most of us do day-to-day where requirements are fuzzy, many edge cases are discovered on the go, and what we want to build is a moving target.
/me Laughs in "unspecified behavior."
Unspecified is whatever you want it to mean. I am also laughing, having never heard "unspecified" before.
But a lot of programming we discover correctness as we go, one reason humans don’t completely exit the loop. We need to see and build tests as we go, giving them particular care and attention to ensure they test what matters.
All said and done, that its even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!
There's been more than enough drive-by comments from new accounts/green names even in this HN submission alone.
Maybe the general population will be willing to have a more constructive discussions about this tech once the trillion dollar companies stop pillaging everything they see in front of them and cease acting like sociopaths whose only objectives seem to be concentrating power, generating dissidence and harvesting wealth.
“Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”
In fact the idea of a "clean room" implementation is that all you have to go on is the interface spec of what you are trying to build a clean (non-copyright violating) version of - e.g. IBM PC BIOS API interface.
You can't have previously read the IBM PC BIOS source code, then claim to have created a "clean room" clone!
Otherwise it's not clean-room, it's plagiarism.
I have read nowhere near as much code (or anything) as what Claude has to read to get to where it is.
And I can write an optimizing compiler that isn't slower than GCC -O0
(prompt: what does a clean room implementation mean?)
From ChatGPT without login BTW!
> A clean room implementation is a way of building something (usually software) without copying or being influenced by the original implementation, so you avoid copyright or IP issues.
> The core idea is separation.
> Here’s how it usually works:
> The basic setup
> Two teams (or two roles):
> Specification team (the “dirty room”)
> Looks at the original product, code, or behavior
> Documents what it does, not how it does it
> Produces specs, interfaces, test cases, and behavior descriptions
> Implementation team (the “clean room”)
> Never sees the original code
> Only reads the specs
> Writes a brand-new implementation from scratch
> Because the clean team never touches the original code, their work is considered independently created, even if the behavior matches.
> Why people do this
> Reverse-engineering legally
> Avoid copyright infringement
> Reimplement proprietary systems
> Create open-source replacements
> Build compatible software (file formats, APIs, protocols)
I really am starting to think we have achieved AGI. > Average (G)Human Intelligence
LMAO
If you try to reimplement something in a clean room, its a step by step process, using your own accumulated knowledge as the basis. That knowledge that you hold in your brain, all too often is code that may have copyrights on it, from the companies you worked on.
Is it any different for a LLM?
The fact that the LLM is trained on more data, does not change that when you work for a company, leave it, take that accumulated knowledge to a different company, you are by definition taking that knowledge (that may be copyrighted) and implementing it somewhere else. It only a issue if you copy the code directly, or do the implementation as a 1:1 copy. LLMs do not make 1:1 copies of the original.
At what point is trained on copyrighted data, any different then a human trained on copyrighted data, that get reimplemented in a transformative way. The big difference is that the LLM can hold more data over more fields, vs a human, true... But if we look at specializations, this can come back to the same, no?
This is 100% unambiguously not clean-room unless they can somehow prove it was never trained on any C compiler code (which they can't, because it most certainly was).
[1] https://gitlab.winehq.org/wine/wine/-/wikis/Developer-FAQ#wh...
[2] https://gitlab.winehq.org/wine/wine/-/wikis/Clean-Room-Guide...
They weren't trillion dollar AI companies to bankroll the defense sure. But thinking about clean room and using copyrighted stuff is not even an argument that's just nonsense to try to prove something when no one asked.
This is incredible!
But it also speaks to the limitations of these systems: while these agentic systems can do amazing things when automatically-evaluable, robust test suites exist... you hit diminishing returns when you, as a human orchestrator of agentic systems, are making business decisions as fast as the AI can bring them to your attention. And that assumes the AI isn't just making business assumptions with the same lack of context, compounded with motivation to seem self-reliant, that a non-goal-aligned human contractor would have.
As a particular near-term step, I imagine that it won't be long before we see a SaaS company using an AI product manager, which can spawn agents to directly interview users as they utilize the app, independently propose and (after getting approval) run small product experiments, and come up with validated recommendations for changing the product roadmap. I still remember Tay, and wouldn't give something like that the keys to the kingdom any time soon, but as long as there's a human decision maker at the end, I think that the tech is already here.
I know this is an impressive accomplishment and is meant to show us the future potential, but it achieves big results by throwing an insane amount of compute at the problem, brute forcing its way to functionality. $20,000 set on fire, at Claude's discounted Max pricing no less.
Linear results from exponential compute is not nothing, but this certain feels like a dead end approach. The frontier should be more complexity for less compute, not more complexity from an insane amount more compute.
I would interpret this as being at API pricing. At subscription pricing, it's probably at most 5 or 6 Max subscriptions worth.
To be fair, that's two weeks of the employer cost of a FAANG engineer's labor. And no human hacks a working compiler in two weeks.
It's a lot of AI compute for a demo, sure. But $20k stunts are hardly unique. Clearly there's value being demonstrated here.
It's very telling how all these examples are all "look, we made it recreate a shitter version of a thing that already exists in the training set".
Without enough examples to copy from (despite CPU manuals being available in the training set) the approach failed. I wonder how well it'll do when you throw it a new/imaginary instruction set/CPU architecture; I bet it'll fail in similar ways.
And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless, all the rewards come from getting to 32kB.
If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.
https://github.com/jyn514/saltwater
> https://github.com/jyn514/saltwater
This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...
> https://github.com/ClementTsang/rustcc
This will compile basically no real-world code. The only supported data type is "int".
> https://github.com/maekawatoshiki/rucc
This is just a frontend. It uses LLVM as the backend.
I completely agree that "reweite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.
But when an AI does it, now it counts? Opus is trained on the source code of Clang, GCC, TCC, etc. So this is not "clean-room".
$ ./target/release/ccc-arm -I /usr/include/ -I /usr/local/include/ -I /usr/lib/gcc/aarch64-redhat-linux/15/include/ -o hello hello.c
$ ./hello
Hello from CCC!But yeah, either way it just needs to know where to find the stdlib.