When I reject AI code even if it works

Posted by vnbrs 10 hours ago

When I reject AI code even if it works (vinibrasil.com)

180 points | 99 comments | page 2

datadrivenangel 9 hours ago|

"The reality is that code that runs and makes the CI green can still be a bad solution, and engineering has always been about implementing adequate, scalable, and extensible solutions."

Adequate often means done and cheap

josephg 9 hours ago||

> Adequate often means done and cheap

It really, REALLY depends what you're working on. If you're throwing together an internal tool or simple dashboard, it doesn't really matter what the code looks like. But if you're writing software that other programs will depend on, bad design choices ripple out and affect another generation of software. Imagine slop in the Linux kernel, in Google Chrome, or in your compiler or runtime. It's not acceptable.

I know a lot of people spend their careers writing end user software and web UIs. AI is increasingly a good choice for this sort of code. But that's not all of us. And it's not all of the software being written.

DrewADesign 9 hours ago|||

As long as safe and stable are assumed to be base-level requirements… maybe?

solid_fuel 9 hours ago|||

Disagree, adequate means adequate. Done and cheap is what you call it when a solution is adequate. If the solution isn't adequate, it doesn't matter if it's cheap, because it isn't done.

skydhash 8 hours ago||

I was just watching a video about systems engineering and the following stuck with me:

Stakeholder needs: What people want to get done with the product

Management needs: How to manage the spending of resources (time, money,…) to create the product

Engineering needs: What is the product

You have to balance the three. Sometimes it’s simple and easy to get right. Sometimes it’s complex enough that you’re never truly sure until the product is out in the wild.

Software is malleable and we can easily iterate on it, which is not possible with hardware. But today, we have a skew towards engineering, where the whole focus is to create a solution, whatever that is. No understanding of the problem, no proper allocation of resources, just do something. Even if it is plastering over the crack for the eleventh time.

littlecosmic 6 hours ago||

The stakeholders just want to send emails and Excel files around, someone in management has a budget for a productivity-enhancing tool to replace that, and the engineers have a half-baked solution that some sales guys are saying is the second coming.

moezd 4 hours ago||

If it's code that you can tolerate being somewhat messy and suboptimal, you can run agents e2e. If it's a critical piece of code that has become part of your identity, better do the PR work and scrutinize it well. LLMs are still next token predictors, no matter how many harnesses, hooks, skills and tools are attached to them. LLMs will only know that these are callable; interpreting the state and mitigating problems are still best effort.

philbo 6 hours ago||

Yesterday I started working on an agent harness that tries to address some of the issues here.

What I'm hoping to build ultimately is something that works more like a pair-programming partner than existing harnesses do. I want the user to be an engaged part of the development process all the way through; I don't want the agent disappearing to work on its own. I even want to make it possible for users to swap into the driver role and have the LLM automatically assume the role of navigator when that happens.

There's more info in the readme (actually the readme is all that exists so far, I wanted to get the idea straight in my head first):

https://gitlab.com/philbooth/opair

Even if nobody else uses it, I hope it will be a useful tool for myself and help me find a way to work with LLMs that doesn't harm my mental models, which is what I feel current harnesses do.

julianlam 7 hours ago||

I think a particular failing with developers embracing AI is fighting the sunk cost fallacy. While you might not have spent as much time putting together a non-working solution, you still did spend time working with the agent to slap together a non-working solution.

Being able to step back and say "this was a failure and we need to discard the day's work and start over" is still hard with LLMs.

mkozlows 7 hours ago|

Completely disagree. I think this is one of the big wins of agentic engineering. When you look back at your own completed change and realize that you made it too complicated because your initial abstraction was wrong, you have to debate long and hard about whether it's worth going back and redoing the work -- is the abstraction actually that bad? Would you really get a huge win by changing it, enough to justify spending another day on the task?

But with the agent, you know that the change will be relatively quick and easy, so the bar to tell it to shift approaches is much, much lower.

piterrro 6 hours ago||

I feel the same way. Reading the entire output of an AI-built feature leaves me cognitively overloaded as well, and I can only do so many in a day.

What I found myself doing is operating in two modes:

1. For projects that require my attention, I plan and instruct the LLM. When needed, I draft some code and ask the agent to improve it or finish the mundane parts (I write code and leave gaps with comments asking the agent to finish).

2. Full auto mode, where I use spec-driven development and TDD. I only ask for changes based on an existing PRD, which the agent also has to update. Here I do not look at the code at all.

Seems to be working just fine.

AmareshHebbar 8 hours ago||

If I can't explain the code without rereading the diff, I probably shouldn't merge it.

danfritz 3 hours ago||

This resonates a lot with me. I often use AI for the plan and let it propose multiple possible implementations, but I often have to point out the glaringly easier, more logical solution.

When implementing, it's often a lot of misses with a few golden hits. The other day it used flex for a table layout while our app uses tables everywhere, sigh.

Another typical one is that it tends to prefer frontend aggregation and looping of data instead of letting the database and backend deal with it.

Using a mix of Claude, Cursor Composer and Codex.
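To make the aggregation point above concrete, here is a minimal sketch, not taken from the comment itself. It assumes a hypothetical `orders` table in an in-memory SQLite database; the table, the column names and the function names are invented for illustration. The first function pulls every row and loops in application code, which is the pattern being complained about. The second lets the database do the grouping.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1), ("carol", 3)],
    )
    return conn


def totals_in_app(conn: sqlite3.Connection) -> dict:
    """Fetch every row and aggregate in a loop on the application side."""
    totals = defaultdict(int)
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] += amount
    return dict(totals)


def totals_in_db(conn: sqlite3.Connection) -> dict:
    """Let the database aggregate; only one row per customer comes back."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows.fetchall())


conn = make_db()
# Both approaches agree on the result; they differ in how many rows are transferred.
print(totals_in_app(conn) == totals_in_db(conn))
```

Both versions return the same totals. The difference is that the loop transfers all five rows to the caller while the `GROUP BY` query returns three, a gap that grows with table size.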

eranation 8 hours ago||

LLMs diverge, not converge. They slightly increase entropy if not controlled. You can have DRY skills and use AI to organize AI (in loops(tm) like Boris does), but eventually, if you don’t understand the code, you are taking yourself out of the loop. And it’s not just job security that’s on the line; it’s the increasing cost of having AI babysit AI. If you or your “loops” (or paperclip, Hermes, gastown, or the next-in-class agents of agents that run your entire company) let slop-debt gradually sneak in, the cost to fix it later will become prohibitive. (You can always just rewrite it, but as the race for “feature complete” and “zero backlog” continues, rewriting an ever-growing set of new daily table stakes will become an economic moat.)

TLDR: Keeping your codebase human-readable and reason-about-able does not just help humans stay relevant. It will also lower the cost of having LLMs maintain it.

rvz 8 hours ago|

> Before coding agents, when given a task, I would explore the codebase, think of different solutions, experiment, and only then implement. That could take days of consolidating all that context. When I finally submitted that PR, confidence was higher, and explaining each of my changes to my coworkers was easier.

Now we are getting to the point where we are speed-running the deskilling of engineers into comprehension debt, and they themselves are rapidly losing confidence in reviewing code they did not write.

I think this blog post [0] is the best example of what can go wrong, and it gets even worse when you do not know the technology.

If you cannot explain a change even when "the CI is green" or "all tests passing", I will immediately reject it.

Maybe great for vibe coding prototypes, but it all changes when that code is deployed onto mission critical systems. Just ask Amazon with Kiro. [1]

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

[1] https://www.reuters.com/business/retail-consumer/amazons-clo...
