Scaling long-running autonomous coding

Posted by samwillis 1/14/2026

Scaling long-running autonomous coding (cursor.com)

290 points | 197 comments | page 4

throwaway63467 1/15/2026|

I’m running Opus 4.5, which is arguably their best model. While it’s really good for a lot of work, it always introduces subtle errors or inconsistencies when left unsupervised, because prompts are never good enough to remove all ambiguity for complex asks. So I can’t imagine what it will do to a codebase when left alone with it for days or weeks.

thesurlydev 1/15/2026||

Pretty cool and related to another path of work I'm following from Steve Yegge: https://medium.com/@steve-yegge/welcome-to-gas-town-4f25ee16...

laszlojamf 1/15/2026||

They mention billions of tokens, but I'm left wondering how much this experiment actually cost them...

mccoyb 1/14/2026||

Supposing agents and their organization improve, it seems like we’re approaching a point where the cost of a piece of software will be driven down to the cost of running the hardware, and the cost of the tokens required to replicate it.

The tokens were “expensive” from the minds of humans …

Daishiman 1/14/2026|

It will be driven down to the cost of having a good project and product manager effectively understanding what the customer wants, which has been the main barrier to excellent software for a good long time.

galaxyLogic 1/14/2026||

And not only understanding what the customer wants, but communicating that unambiguously to the AI. And who is the "customer" here? Is it the end users, or is it a client company that contracts the project manager for this task? Then the issue is still there: who in the client company decides exactly what is needed and what the (potential) users want?

I think this situation emphasizes the importance of (something like) Agile. Producing something useful can only happen via experimentation, getting feedback from actual users, and iterating relentlessly.

reactordev 1/15/2026||

The planner-worker architecture works well for me. About 3 layers is the sweet spot: prompt -> plan -> task division -> workers.

Sometimes workers will task other workers and act as a planner if the task is more complex.

It’s a good setup but it’s nothing like Claude Code.
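For readers who want to picture the layering described above, here is a minimal sketch. It is not the commenter's setup or Cursor's: the names `plan`, `divide`, `work`, and `run`, the `complexity` score, and the string-splitting stubs are all hypothetical stand-ins for what would be model calls in a real system. It only illustrates the shape: a prompt becomes plan steps, steps become tasks, and a worker that receives a task above some complexity threshold acts as a planner and delegates to other workers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    description: str
    complexity: int  # hypothetical score: here simply the word count


def plan(prompt: str) -> List[str]:
    """Layer 1 (stub): split a prompt into plan steps on ';'."""
    return [step.strip() for step in prompt.split(";") if step.strip()]


def divide(step: str) -> List[Task]:
    """Layer 2 (stub): split a plan step into tasks on ' and '."""
    parts = [p.strip() for p in step.split(" and ") if p.strip()]
    return [Task(p, complexity=len(p.split())) for p in parts]


def work(task: Task, depth: int = 0, max_depth: int = 2, threshold: int = 4) -> List[str]:
    """Layer 3: do the task, or act as a planner and delegate if it is complex."""
    if task.complexity > threshold and depth < max_depth:
        words = task.description.split()
        mid = len(words) // 2
        subtasks = [
            Task(" ".join(words[:mid]), mid),
            Task(" ".join(words[mid:]), len(words) - mid),
        ]
        results: List[str] = []
        for sub in subtasks:
            # The depth limit keeps delegation from recursing forever.
            results.extend(work(sub, depth + 1, max_depth, threshold))
        return results
    return [f"done: {task.description}"]


def run(prompt: str) -> List[str]:
    """Drive the full pipeline: prompt -> plan -> task division -> workers."""
    results: List[str] = []
    for step in plan(prompt):
        for task in divide(step):
            results.extend(work(task))
    return results
```

Calling `run("parse html and lay out boxes; paint pixels")` walks all three layers. The depth limit is one possible guard against workers delegating indefinitely; the thread does not say how the commenter handles that.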

kilroy123 1/15/2026||

My test for whether we've created an AGI-like AI? Build a Linux kernel from scratch that can actually run a full OS on your computer.

But, if I'm being fair, a full working browser from scratch is just as good.

foota 1/15/2026||

I've always liked the idea of intelligence in the autonomous ships of the Revelation Space universe: little agents reporting to progressively more intelligent, higher-level ones.

satvikpendem 1/15/2026|

That's essentially all life from the sub-cellular level on up

Havoc 1/15/2026||

> long running

I really dislike this as a measure. An LLM on CPU is also long-running because it’s slow.

I get what it’s meant to convey, but time is a terrible measure of anything if tokens/s isn’t constant.
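The arithmetic behind this objection can be sketched quickly. The token count and throughput figures below are made up for illustration and do not come from the post; `wall_clock_hours` is a hypothetical helper.

```python
def wall_clock_hours(total_tokens: int, tokens_per_second: float) -> float:
    """Hours needed to generate total_tokens at a fixed throughput."""
    return total_tokens / tokens_per_second / 3600


# Same amount of generated output, very different "running time".
slow = wall_clock_hours(3_600_000, 10)    # e.g. a slow CPU setup (assumed rate)
fast = wall_clock_hours(3_600_000, 1000)  # e.g. a fast hosted setup (assumed rate)
print(slow, fast)
```

Under these assumed rates, the same work takes 100 hours in one case and 1 hour in the other, which is why a token or task count says more than elapsed time alone.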

sashank_1509 1/14/2026||

Can a browser expert please skim the code the agent wrote and let us know how it is? Is it comparable to Ladybird or Servo? Could it reach that capability soon?

krackers 1/15/2026||

I'm interested in this too. I was expecting just a Chromium reskin, but it does seem to be at least something more than that. https://news.ycombinator.com/item?id=46625189 claims it uses Taffy for CSS layout, but the docs also claim "Taffy for flex/grid, native for tables/block/inline".

polyglotfacto 1/21/2026|||

I've done this in the parallel post, see https://news.ycombinator.com/item?id=46705625 (and a couple of other replies in that thread)

TL;DR: the code is not a valid POC. It is throwaway-quality code that could never support a functioning web engine. It's very clearly hallucinated AI BS, which is what you get when you don't have a human expert in the loop.

I actually like using AI, but only to save me the typing.

missingdays 1/15/2026||

You can start by trying to compile the project (spoiler: you can't)

dist-epoch 1/14/2026|

So, who is going to compile the browser and post the binaries so we can check it out? (in a sandbox/VM obviously)

missingdays 1/15/2026|

It doesn't compile
