Posted by Stwerner 4/13/2025

Wasting Inferences with Aider (worksonmymachine.substack.com)
139 points | 105 comments
canterburry 4/13/2025|
I wouldn't be surprised if someone tries to leverage this with their customer feature request tool.

Imagine having your customers write feature requests for your SaaS that immediately trigger code generation and a PR. A virtual environment with that PR is spun up and served to that customer for feedback and refinement. Loop until the customer has implemented the feature they'd like to see in your product.

Enterprise plan only, obviously.
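
Roughly the loop I'm imagining, sketched in Python. Every helper here is a hypothetical stub (the real thing would call your codegen agent, CI, and preview infrastructure), so treat it as a shape, not an implementation:

    def generate_pr(repo: str, request: str) -> str:
        # e.g. run an agent like aider against the repo with the request
        # as the prompt, push a branch, open a PR; return the PR URL
        return "https://example.com/pr/1"  # placeholder stub

    def spin_up_preview(pr_url: str) -> str:
        # build an ephemeral environment from the PR branch
        return "https://preview.example.com"  # placeholder stub

    def customer_feedback(env_url: str) -> tuple[str, bool]:
        # customer tries the preview; returns (comments, accepted)
        return "ship it", True  # placeholder stub

    def feature_request_loop(repo: str, request: str) -> None:
        feedback, accepted = request, False
        while not accepted:
            pr_url = generate_pr(repo, feedback)
            env_url = spin_up_preview(pr_url)
            feedback, accepted = customer_feedback(env_url)
        # merge the PR and tear down the preview here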

aqme28 4/13/2025||
It's cute, but I don't see the benefit. In my experience, if one LLM fails to solve a problem, the others won't do much better.

If you picked a problem where LLMs are good, now you have to review 3 PRs instead of just 1. If you picked a problem where they're bad, now you have 3 failures.

I think there are not many cases where throwing more attempts at the problem is useful.

emorning3 4/13/2025||
I see 'Wasting Inferences' (WI) as a form of abductive reasoning.

I see LLMs as a form of inductive reasoning, and so I can see how WI could extend LLMs.

Also, I have no doubt that there are problems that can't be solved with just an LLM but would need abductive extensions.

Same comments apply to deductive (logical) extensions to LLMs.

namaria 4/14/2025|
> Also, I have no doubt that there are problems that can't be solved with just an LLM but would need abductive extensions.

And we're back to expert systems.

phamilton 4/13/2025||
Sincere question: Has anyone figured out how we're going to code review the output of an agent fleet?
jsheard 4/13/2025||
Insincere answer that will probably be attempted sincerely nonetheless: throw even more agents at the problem by having them do code review as well. The solution to problems caused by AI is always more AI.
regularfry 4/13/2025|||
Technically that's known as "LLM-as-judge", and it's all over the literature. The intuition is that the capability to choose between two candidates doesn't exactly overlap with the ability to generate either one from scratch. It's a bit like the discriminator half of a generative adversarial network.
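
A minimal LLM-as-judge sketch, assuming the openai Python package with an API key in the environment; the model name and prompt wording are illustrative, not a recommendation:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def judge(task: str, diff_a: str, diff_b: str) -> str:
        """Ask a model to pick the better of two candidate patches."""
        prompt = (
            f"Task: {task}\n\n"
            f"Candidate A:\n{diff_a}\n\n"
            f"Candidate B:\n{diff_b}\n\n"
            "Which candidate better solves the task? "
            "Answer 'A' or 'B' with a one-sentence justification."
        )
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content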
brookst 4/13/2025|||
s/AI/tech/
sensanaty 4/13/2025|||
Most of the people pushing this want to just sell an MVP and get a big exit before everything collapses, so code review is irrelevant.
lsllc 4/13/2025|||
Simple, just ask an(other) AI! But seriously, different models are better/worse at different tasks, so if you can figure out which model is best at evaluating changes, use that for the review.
phamilton 4/14/2025||
I suspect this will indeed be part of it, but it won't work with today's AIs on today's codebases.

Models will improve, but also I predict code style and architecture will evolve towards something easier for machine review.

nchmy 4/13/2025|||
sincere question: why would you not be able to code review it in the same way you would for humans?
phamilton 4/14/2025||
Agents could generate more PRs in a weekend than my team could code review in a month.

Initially we can absolutely just review them like any other PR, but at some point code review will be the bottleneck.

nchmy 4/18/2025||
Surely humans are the ones initiating the agent though, no? Just do that at a measured pace. And set up comprehensive prompts/mechanisms to make sure the agent satisfies your criteria for tests, style, etc.; there are a lot of prompts and tools in the Cline/Roo community for doing stuff like that.
fxtentacle 4/13/2025||
You just don't. Choose randomly and then try to quickly sell the company. /s
precompute 4/13/2025||
Feels like a way to live with a bad decision rather than getting rid of it.
lherron 4/13/2025||
I love this! I have a similar automation for moving a feature through ideation/requirements/technical design, but I usually dump the result into Cursor for the last mile and to save on inference. Seeing the cost analysis is eye-opening.

There's probably also some upside to running the same model multiple times. I find Sonnet will sometimes fail; I'll roll back and try again with the same prompt but a clean context, and it will succeed.
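
If you wanted to automate that roll-back-and-retry loop, here's a sketch of the pattern. It assumes git, aider, and pytest on the PATH; the branch naming and test command are my choices, nothing standard:

    import subprocess

    def attempt(prompt: str, n: int) -> bool:
        """One clean-context attempt on a throwaway branch."""
        subprocess.run(["git", "checkout", "-B", f"attempt-{n}"], check=True)
        # fresh aider run: same prompt, no accumulated chat history
        subprocess.run(["aider", "--yes", "--message", prompt], check=True)
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def retry_same_prompt(prompt: str, max_attempts: int = 3) -> bool:
        for n in range(1, max_attempts + 1):
            if attempt(prompt, n):
                return True  # keep this branch for review
            subprocess.run(["git", "checkout", "main"], check=True)  # roll back
        return False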

ghuntley 4/13/2025|
re: cost analysis

There's something cooked about Windsurf's and Cursor's go-to-market pricing - there's no way they're turning a profit at $50/month. $50/month gets you a happy meal experience. If you want more power, you gotta ditch snacking at McDonald's.

In the future, companies should budget $100 USD to $500 USD per day, per dev, on tokens as the new normal for business, which is circa $25k USD (low end) to $50k USD (likely) to $127k USD (highest) per year.

Above from https://ghuntley.com/redlining/

This napkin math is based upon my current spend bringing a self-compiled compiler to life.
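
The annualization, worked out (my assumption: roughly 250 working days a year; the $127k high end implies closer to 254, and the $50k "likely" figure corresponds to about $200/day):

    WORK_DAYS = 250  # assumed working days per year

    for per_day in (100, 200, 500):
        print(f"${per_day}/day -> ${per_day * WORK_DAYS:,}/year")
    # $100/day -> $25,000/year
    # $200/day -> $50,000/year
    # $500/day -> $125,000/year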

KTibow 4/13/2025||
I wonder if using thinking models would work better here. They generally have less variance and consider more options, which could achieve the same goal.
billmalarky 4/13/2025||
I was lucky enough to have a few conversations with Scott a month or so ago, and he is doing some really compelling work around the AI SDLC and creating a factory-line approach to building software. Seriously folks, I recommend following this guy closely.

There's another guy in this space I know who's doing similarly incredible things, but he doesn't really speak about it publicly so I don't want to discuss it without his permission. I'm happy to make an introduction for those interested; just hmu (check my profile for how).

Really excited to see you on the FP of HN Scott!

evertedsphere 4/13/2025||
love to see "Why It Matters" turn into the heading equivalent of "delve" in body text (although different in that the latter is a legitimate word while the former is a "we need to talk about…"-level turn of phrase)
dimal 4/14/2025|
Makes me think of The Sorcerer's Apprentice.