Posted by modeless 13 hours ago
It's funny because, by (most) definitions, it is not an artifact:
> a usually simple object (such as a tool or ornament) showing human workmanship or modification as distinguished from a natural object
I need to rethink what my vision of the future looks like.
It's very telling how all these examples are "look, we made it recreate a shittier version of a thing that already exists in the training set".
Without enough examples to copy from (despite CPU manuals being available in the training set), the approach failed. I wonder how well it'll do when you throw a new or imaginary instruction set / CPU architecture at it; I bet it'll fail in similar ways.
And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless; all the reward comes from getting to 32kB.
If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.
https://github.com/jyn514/saltwater
> https://github.com/jyn514/saltwater
This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...
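For context, here's the kind of perfectly ordinary C that a compiler without bitfield or variadic support would reject (my own illustration, not taken from the saltwater repo):

    #include <stdarg.h>
    #include <stdio.h>

    /* Bitfields: common in hardware register and protocol definitions. */
    struct flags {
        unsigned ready : 1;
        unsigned error : 1;
        unsigned count : 6;
    };

    /* Variadic function: printf-style logging is everywhere in real C code. */
    static void log_all(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vprintf(fmt, ap);
        va_end(ap);
    }

    int main(void)
    {
        struct flags f = {1, 0, 42};
        log_all("ready=%u count=%u\n", (unsigned)f.ready, (unsigned)f.count);
        return 0;
    }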
> https://github.com/ClementTsang/rustcc
This will compile basically no real-world code. The only supported data type is "int".
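To be concrete (again my own example, not from that repo), even something this small goes beyond an int-only compiler:

    #include <stdio.h>

    /* Anything beyond plain int: doubles, pointers, structs. */
    struct point { double x, y; };

    int main(void)
    {
        struct point p = {1.5, 2.5};
        const char *label = "sum";
        printf("%s = %f\n", label, p.x + p.y);
        return 0;
    }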
> https://github.com/maekawatoshiki/rucc
This is just a frontend. It uses LLVM as the backend.
I completely agree that "rewrite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.
Maybe read the article before being so dismissive.
You can force the agent not to use unsafe; this is why it burned $20,000: thousands of attempts against good tests, with good boundaries set.
If you trained on a neutral representation like an AST or IR, then the source language shouldn't matter. *
* I'm not familiar with how Anthropic builds their models, but training this way should nullify PL differences.
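Rough illustration of the idea (hand-written and simplified, not how Anthropic necessarily trains anything): different surface languages lower to essentially the same IR, so a model trained at that level wouldn't care which one the programmer wrote.

    /* C source: */
    int add(int a, int b) { return a + b; }

    /*
     * Roughly the same thing in (simplified) LLVM-style IR, which is also
     * about what the equivalent Rust `fn add(a: i32, b: i32) -> i32 { a + b }`
     * lowers to:
     *
     *   define i32 @add(i32 %a, i32 %b) {
     *     %sum = add i32 %a, %b
     *     ret i32 %sum
     *   }
     */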
> Over nearly 2,000 Claude Code sessions and $20,000 in API cost
I guess it makes sense, since agents can generate tests. If you're taking this route, I'd like to see agents that act as users: ones that can only access docs, textbooks, user forums, and builds.
Yes, this is cool. I've actually worked on a similar project with a slightly worse test oracle and would gladly never do that sort of work again; just tedious, unfulfilling work. Though we did catch issues with both the specifications and the test oracle while doing it. Also, many of the team members learned a lot and are now SMEs for related systems.
Is this evidence that knowledge work is dead or AGI is coming? Absolutely not. I think you’d be pretty ignorant with respect to the field to suggest such a thing.