Posted by pavel_lishin 1/23/2026

Gas Town's agent patterns, design bottlenecks, and vibecoding at scale (maggieappleton.com)
403 points | 434 comments
divbzero 1/23/2026|
My instinct is that effective AI agent orchestration will resemble human agile software development more than Steve Yegge’s formulation:

> “It will be like kubernetes, but for agents,” I said.

> “It will have to have multiple levels of agents supervising other agents,” I said.

> “It will have a Merge Queue,” I said.

> “It will orchestrate workflows,” I said.

> “It will have plugins and quality gates,” I said.

More “agile for agents” than “Kubernetes for agents”.

1970-01-01 1/23/2026||
If it's stupid, but it works, it isn't stupid. Gas Town transcends stupid. It is an abstract garbage generator. Call it art, call it an experiment, but you cannot call it a solution to a problem by any definition of the word.
kibwen 1/23/2026||
"If it's stupid, but it works, it isn't stupid" is a maxim that only applies to luxury use cases where the results fundamentally don't matter.

As soon as the results actually matter, the maxim becomes "if it works, but it's stupid, it doesn't work".

shermantanktop 1/23/2026||
I just got some medication yesterday where the leaflet included the following phrase: "the exact mechanism of efficacy is unknown."

So apparently the medical field is not above this logic.

3eb7988a1663 1/24/2026||
That is ignorance, not stupidity. If you take compound X and see improvement in Y, that is worthwhile, even if the mechanism is a black box.
aaa_aaa 1/23/2026||
It is simply because Mr. Yegge is seeking attention. As he always has.
wordswords2 1/24/2026||
There is nothing professional, analytical or scientific about Gas Town at all.

He is just making up a fantasy world where his elves run in specific patterns to please him.

There are no metrics or statistics on code quality, bugs produced, feature requirements met... or anything.

Just a gigantic wank session really.

edg5000 1/24/2026|
Are you being sarcastic or serious? Meeting requirements is implicitly part of any task. Quality/quantification will be embedded in the tasks (e.g. X must be Y <unit>); code style and quality guidelines are probably there somewhere in his task templates. The explicit portions of tasks will implicitly be covered by testing.

I do think it's overly complex, but it's a novel concept.

63stack 1/24/2026|||
Everything you said is also done for regular non-AI development. OP is saying there is no way to compare the two (or even compare version X of Gas Town to version Y of Gas Town) because there are zero statistics or metrics on what Gas Town produces.
walthamstow 1/24/2026||
It's 3 weeks old. If you're so desperate for numbers, give it a go?
pydry 1/24/2026|||
>Are you being sarcastic or serious?

I think if you'd read the article through, you'd know they were serious, because Yegge all but admits this himself.

ozozozd 1/24/2026||
Very interesting to read people’s belief in English as an unambiguous and testable language.

One comment claims it's not necessary to read code when there is documentation (generated by an LLM).

Language varies with geography and with time. British, Americans, and Canadians speak “similar” English, but not identical.

And read a book from 70-80 years ago to see that many words appear to be used for their “secondary meaning.” Of course, what we consider their secondary meaning today was the primary meaning back then.

walthamstow 1/24/2026|
As a coworker said this week, there are 10 meanings of the word 'fashion' in English
phaedrus 1/24/2026||
If we had super-smart AI with low latency and fast enough speed, would the perceived need for / usefulness of running multiple agents evaporate? Sure, you might want to start working on the prompt or user story for something else while the agent is working on the first thing, but in my thought experiment there wouldn't be a "while", because the first thing would already be done by the time your hand left the enter key.
fulafel 1/24/2026||
If they are interacting with the world and tools like web research, compiles, deploys, end-to-end test runs, etc., then no.

(Maybe you can argue that you could then do everything with an event-driven single agent, like async for LLMs, if you don't mind having a single very ADHD context.)

Descon 1/24/2026||
But maybe this is how a super-smart AI works (or at least a prototype of one).
phren0logy 1/23/2026||
Gas Town has a very clear "mad scientist/performance art" sort of thing going on, and I love that. It's taking a premise way past its logical conclusion, and I think that's fun to watch.

I haven't seen anything to suggest that Yegge is proposing it as a serious tool for serious work, so why all the hate?

skywhopper 1/23/2026||
It’s doesn’t matter what Yegge means by it. Other folks are taking it seriously.
muixoozie 1/24/2026||
First time hearing about this tool and person. I just looked for a YouTube video about it; he was recently interviewed and sounds very serious / bullish on this agentic stuff. I mean, he's saying stuff like: if you're still using IDEs you're a bad engineer, you're basically 10x slower than people good at agentic coding, and HR is going to be looking for reasons to fire these dinosaurs. I'm paraphrasing, but not exaggerating. I mean, it's shilling FOMO and his book. Whatever. I don't really care. I'm more concerned about where things are headed.
bob1029 1/24/2026||
I'm beginning to question the notion that multi-agent patterns don't work. I think there is something extra you get with a proposer-verifier style loop, even if both sides are using the same base model.

I've had very good success with a recursive sub-agent scheme where a separate prompt (agent) is used to gate the recursive call. It compares the caller's prompt with the proposed callee's prompt to determine if we are making a reasonable effort to reduce the problem into workable base cases. If the two prompts are identical, we deny the request with an explanation. In practice, this works so well I can allow for unlimited depth and have zero fear of blowing the stack. Even if the verifier gets it wrong a few times, it only has to get it right once to reverse an infinite descent.
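
Roughly, in sketch form (llm() and extract_subtask() are hypothetical stand-ins, not any particular framework's API):

    def gate_allows(caller_prompt, callee_prompt):
        # Identical prompts mean no reduction was attempted; this is
        # the degenerate case behind infinite descent, so deny outright.
        if callee_prompt.strip() == caller_prompt.strip():
            return False
        # Otherwise a separate verifier prompt judges whether the
        # callee prompt genuinely reduces the caller's problem.
        verdict = llm(
            "Caller task:\n" + caller_prompt
            + "\n\nProposed subtask:\n" + callee_prompt
            + "\n\nIs the subtask a real reduction of the caller task"
            + " toward a workable base case? Answer ALLOW or DENY."
        )
        return "ALLOW" in verdict.upper()

    def solve(prompt):
        result = llm(prompt)               # proposer
        subtask = extract_subtask(result)  # hypothetical parser
        if subtask and gate_allows(prompt, subtask):
            return solve(subtask)          # recurse to unlimited depth
        return result                      # a single DENY ends descent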

krackers 1/24/2026|
>I think there is something extra you get with a proposer-verifier style loop, even if both sides are using the same base model.

DeepSeekMath-V2 seems to show this: increasing the number of prover/verifier iterations increases accuracy. And this is with a model that has already undergone RL under a prover/verifier selection process.

However, this type of subagent communication maintains full context, and is different from the "breaking into tasks" style of sharding amongst subagents. I'm less convinced of the latter, because oftentimes a problem is more complex than the sum of its parts, i.e. it's the interdependencies that make it complex, and you need to consider each part in relation to the other parts, not in isolation.

bob1029 1/24/2026||
The specific way in which we invoke the subagents is critical to the performance of the system. If we use a true external call stack and force proper depth-first recursion, the effective context can be maintained to whatever depth is desired.

Parallelism and BFS-style approaches do not exhibit this property. Anything that happens within the context or token stream is a much weaker solution. Most agent frameworks are interested in the appearance of speed, so they miss out on the nuance of this execution model.
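
As a sketch of the shape I mean (run_agent and subtasks_of are hypothetical helpers, not any real framework): depth-first recursion threads the full ancestor chain through every call, and each child's result is folded back into the frame before the next sibling runs.

    def run_dfs(prompt, ancestors=()):
        # Every node sees the entire chain of prompts above it.
        context = (*ancestors, prompt)
        result = run_agent(prompt, context)
        for sub in subtasks_of(result):       # hypothetical parser
            child_result = run_dfs(sub, context)
            # Fold the child's result in before the next sibling, so
            # later siblings see resolved work rather than guesses.
            context = (*context, child_result)
        # Finalize the parent only after all descendants are done.
        return run_agent(prompt, context)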

martin-t 1/23/2026||
Anybody here read Coding Machines?

There's this implied trust we all have in the AI companies that the models are either not sufficiently powerful to form a working takeover plan, or sufficiently aligned not to try. And maybe the companies genuinely try, but my experience is that in the real world, nothing is certain. If it's not impossible, it will happen given enough time.

If the safety margin for preventing takeover is "we're 99.99999999 percent sure per 1M tokens", how long before it happens? I made up these numbers, but any guess what they really are?
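
To make that concrete with the made-up numbers (the token count is an assumed scale, nothing more):

    # If each 1M-token window is "safe" with probability p, the chance
    # of at least one bad window across n windows is 1 - p**n.
    p = 1 - 1e-10    # "99.99999999 percent sure per 1M tokens"
    n = 10**9        # assumed 10^15 tokens, i.e. a billion windows
    print(1 - p**n)  # ~0.095, roughly a one-in-ten chance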

Because we're giving the models so much unsupervised compute...

rexpop 1/23/2026|
> If it's not impossible, it will happen given enough time.

I hope you might be somewhat relieved to consider that this is not so in an absolute sense. There are plenty of technological might-have-beens that didn't happen, still haven't, and probably never will, due to various economic and social dynamics.

The counterfactual, that everything possible happens, is almost tautological.

We should try and look at these mechanisms from an economic standpoint, and ask "do they really have the information-processing density to take significant long-term independent action?"

Of course, "significant" is my weasel word.

> we're giving the models so much unsupervised compute...

Didn't you read the article? It's wasted! It's kipple!

ramoz 1/23/2026|
I ran a similar operation over the summer where I treated vibecoding like a war. I was the general. I had recon (planning) and frontmen/infantry making the changes. Bugs and poor design were the enemy. Planning docs were OPORDs, and we had sitreps and after-action reports - a complete e2e workflow. Even had hooks for sounds and sprites. It was fun for a bit but regressed to conceptually simpler, more boring workflows.

Anyways, we'll likely always settle on simpler/boring - but the game analogies are fun in the meantime. A lot of opportunity to enhance UX around design, planning, and review.
