Posted by fesens 18 hours ago
(1) Let the LLM randomly perturb the system.
(2) Measure the system's performance.
(3a) If the perturbation improved performance, keep the change.
(3b) Otherwise, don't.
(4) Repeat
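In code, that's just stochastic hill climbing with the LLM as the mutation operator. A minimal sketch of the loop above; the `perturb` and `measure` callables stand in for whatever model call and benchmark you actually use:

```python
import shutil
import tempfile
from typing import Callable

def hill_climb(system_dir: str,
               perturb: Callable[[str], None],   # LLM edits the tree in place
               measure: Callable[[str], float],  # benchmark, higher is better
               steps: int = 100) -> float:
    best = measure(system_dir)                   # baseline score
    for _ in range(steps):                       # (4) repeat
        trial = tempfile.mkdtemp()
        shutil.copytree(system_dir, trial, dirs_exist_ok=True)
        perturb(trial)                           # (1) random LLM perturbation
        score = measure(trial)                   # (2) measure
        if score > best:                         # (3a) improvement: keep it
            best = score
            shutil.copytree(trial, system_dir, dirs_exist_ok=True)
        shutil.rmtree(trial)                     # (3b) otherwise just discard
    return best
```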
[1] https://github.com/karpathy/autoresearch

AlphaEvolve from Google is an evolutionary algorithm that uses LLMs for idea generation, following a very similar loop:
- https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...
- Open source implementation of the algorithm: https://github.com/algorithmicsuperintelligence/openevolve
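The population-based shape is only a small step up from the greedy loop above. A rough sketch of that shape, not AlphaEvolve's actual implementation:

```python
import random
from typing import Callable

def evolve(seed: str,
           mutate: Callable[[str], str],    # LLM: parent program -> child program
           fitness: Callable[[str], float],
           pop_size: int = 20,
           generations: int = 200) -> str:
    population = [(fitness(seed), seed)]
    for _ in range(generations):
        parent = random.choice(population)[1]       # sample a parent
        child = mutate(parent)                      # LLM-generated variation
        population.append((fitness(child), child))
        population.sort(key=lambda p: p[0], reverse=True)
        population = population[:pop_size]          # keep only the fittest
    return population[0][1]
```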
* Gödel Machine (2006-2007) [1]
* Optimal Ordered Problem Solver (2002) [2]
* Meta-Learning and Artificial Curiosity (1990s onward) [3]
[1] https://arxiv.org/html/2505.22954v3
[2] https://arxiv.org/abs/cs/0207097
[3] https://evolution.ml/pdf/schmidhuber.pdf
I don't see both ingredients in Karpathy's proposed scheme.
What's next, “Karpathy Investing”, where an AI in a loop builds a portfolio?
> (1) Let the LLM randomly perturb the system.

Instead of this, I ask the LLM what's least likely to improve performance, and then measure that.

Sometimes big gains come from the places you thought were least likely.
Why should throwing ideas at the wall be any different when it comes to optimizing code, as long as you can measure and verify the result, are okay with the added complexity, and can keep the code itself from being crap by the end of it?
If an approach is found that improves how well something works, you can even treat the AI slop as a draft and iterate upon it yourself further.
At the time I dismissed it as potentially being incredibly expensive for the improvement you actually get, and as running into the typical pitfalls of evolutionary algorithms: in the same way evolution doesn't let an organism grow a wheel, your LLM evolution algorithm will never come up with anything that requires a far bigger leap than what you allow the LLM to perturb in a single step. The genetic algorithm will also probably produce a vibecoded mess of short-sighted decisions, just as evolution produces a spaghetti genome in real life.
I'll definitely need to look into how people have improved the idea and whether it is practical now.
> The same observation had previously also been made by many others.
I think hyperparameter tuning may actually be a kind of genetic algorithm.
Hyperparameter tuning is usually done with Bayesian optimization, though.
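For comparison, a minimal sketch with Optuna, whose default sampler (TPE) is a Bayesian-style method; `train_and_validate` is a hypothetical stand-in for your training run:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # TPE proposes new points from a probabilistic model of past trials,
    # rather than by mutating or crossing over a population.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    depth = trial.suggest_int("depth", 2, 12)
    return train_and_validate(lr, depth)  # hypothetical trainer, returns val score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```

That said, evolutionary samplers like CMA-ES are also used for hyperparameter search, so the parent's intuition isn't far off.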
https://publicityreform.github.io/findbyimage/readings/lem.p...
Nice detail on the encountered failures. Very similar experiences with my own loops against test suites.
Great post. A snapshot in time.
> The agent did not know that would also halve the LUT count. It found out by doing it and watching the synthesizer.
So I guess this is an example of an LLM anthropomorphizing and making wild conjectures about the internal workings of a different LLM.
Pretty much what I did to let Codex with gpt5.4xhigh improve my fairly complex CUDA kernel, which resulted in a 20x throughput improvement.
Big difference between a working model that needs to be optimized, vs nothing working at all.
OP's post basically points out what many others have surely discovered independently: your agent-based dev operation is only as good as the test rituals and guardrails you give the agents.
However, there isn't really a "correct" answer that's easy to define in code (I could manually label a training set, but wanted to avoid that), so I had the LLM just analyse the results itself and decide whether they are better or not. It wrote deterministic rules for a few things, but overall it just reviewed the results of each round and decided whether they were better or not.
Reviewing the before and after results, I would say yes, it's a big improvement in quality. It also optimised the prompt size to reduce input tokens by 25% and switched to a smaller/cheaper model.
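A sketch of what that judging step can look like, assuming an OpenAI-style client; the model name and prompt are illustrative, not from the parent comment:

```python
from openai import OpenAI

client = OpenAI()

def judge(before: str, after: str) -> bool:
    """Ask an LLM whether the new output beats the old one."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap judge model works here
        messages=[{
            "role": "user",
            "content": (
                "You are grading two versions of the same output.\n"
                f"BEFORE:\n{before}\n\nAFTER:\n{after}\n\n"
                "Reply with exactly one word: BETTER or WORSE."
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("BETTER")
```

In practice you'd also want to swap the order of the two candidates across calls, since LLM judges have a measurable position bias.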
I have a recursive agent that finds trading strategies after recreating academic research and probing the model using its training on everything. It works really well, but I have to force it to write out every line and write a proof that no data from after the wall-clock time entered the system. Even then, some stupid thing like not converting a timezone with daylight savings will let it peek one hour into the future. These types of bugs are almost impossible to find. Now there needs to be another agent whose only purpose is to write out every line, explaining that the timezone for that line of code was correct.
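One way to make that proof mechanical instead of prose: refuse naive timestamps outright, since a naive timestamp forces a timezone guess, and that guess is exactly where the DST one-hour leak hides. A minimal pandas sketch (function name is mine):

```python
import pandas as pd

def assert_no_lookahead(df: pd.DataFrame, ts_col: str, as_of: pd.Timestamp) -> None:
    """Raise if any row is stamped after the as-of wall clock."""
    ts = df[ts_col]
    if ts.dt.tz is None:  # naive timestamps are where DST bugs sneak in
        raise ValueError(f"{ts_col} must be timezone-aware")
    if as_of.tz is None:
        raise ValueError("as_of must be timezone-aware")
    future = ts.dt.tz_convert("UTC") > as_of.tz_convert("UTC")
    if future.any():
        raise ValueError(f"{int(future.sum())} rows in {ts_col} lie after {as_of}")
```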
a fantastic opportunity to become the next next big thing and write a verifier verifier.
At the hypothesized inflection point where AI instantly performs exactly as commanded, what happens to heavily regulated industries like medicine? Do we get huge leaps and bounds everywhere EXCEPT where it matters, or is regulation going to be handed over to a verifier verifier?
The devil is in the details. There are an amazing number of details in a good [thing]. Someone somewhere has to say exactly what this [thing] being built actually is.
Read almost any story about wishes from a genie. Simple statements don't work.