When I reject AI code even if it works

Posted by vnbrs 13 hours ago

When I reject AI code even if it works (vinibrasil.com)

199 points | 130 comments | page 4

panchtatvam 9 hours ago

You must accept AI code only if you deem yourself dumber than AI.

cadamsdotcom 10 hours ago

If you reject AI code that works then your mindset is still too hands-on. Put another way - you still have some loops to work on taking yourself out of. The agent should’ve delivered code that was acceptable as a first pass.

Agents respond really well to feedback! They have no ego and they’ll happily improve code if told where and how. But you need to set up tools that deliver that feedback without your involvement - otherwise you can’t scale.

All the linting and autoformatting you can put in is a good start. Next, create custom scripts that check for every single dumb AI-ism you can think of, tell the agent about them, tell it to use them to check its work, and put them in hooks so the harness refuses to let the agent stop until all your linters show no errors.

Then, keep iterating basically forever. Any dumb AI-ism you see, make a linter for it, give it to the agent, and enforce it using the harness.
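
A minimal sketch of what one such custom check could look like, assuming the AI-isms in question can be caught by line-level patterns. The rule names and regexes below are hypothetical, chosen only for illustration; they are not taken from any linked repository.

```python
import re

# Hypothetical rules: each maps a rule name to a pattern for one "AI-ism".
AI_ISM_PATTERNS = {
    # Comments that narrate the code instead of explaining it.
    "narrating-comment": re.compile(
        r"#\s*(Now|First|Next|Finally),? (we|let's)\b", re.IGNORECASE
    ),
    # Placeholders left where real code should be.
    "placeholder": re.compile(
        r"#\s*(TODO: implement|\.\.\. rest of|your code here)", re.IGNORECASE
    ),
}


def find_ai_isms(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in AI_ISM_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Each new annoyance becomes one more entry in the dictionary, which is what makes the "keep iterating" part cheap.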

I’ve spent months doing this. When I review a PR - which was built by the agent with TDD so it definitely works - I’m no longer asking whether it did dumb stuff, broke the architecture, duplicated code, or missed opportunities for reuse. That’s all linted for. I don’t worry about duplication or outdated docstrings/comments because the self-review caught all that. I mostly read it to look for opportunities to make the feature even better & more useful.

If this makes no sense or you disagree that it’s possible, my contact details are on my profile and I’ll be happy to give a demo.

royal__ 10 hours ago

The problem I have with this kind of approach is 1) it emphasizes scaling up as much as possible, which I don't believe is necessarily the most valuable thing, and 2) I really don't want my job to be band-aiding agent problems, because it's like herding cats and there will never be an end to it. I'd rather just...get hands-on and be involved in the code I am working to create.

unknownfuture 9 hours ago

Kinda fascinating watching a fairly reasonable response get downvoted. The AI psychosis really is catching...

Incidentally I also don't understand the drive to scale up. Show me a successful tech company and I'll show you a company that won, not by delivering code the fastest, but by delivering the right product with the right features at the right time.

Hell, Anthropic itself is the perfect example: they're doing well because unlike their competitors they realized the real revenues come from enterprise not consumer. They're winning by identifying the right market and giving them the right product.

equinumerous 10 hours ago

I am very curious what some of your lint rules look like in practice. In my mind a lot of the AI-isms in my code that I hate are stylistic or a matter of taste, not necessarily something I could write a deterministic rule to check. But I want to hear more. Like, what kind of linters did you create and which were highest impact?

cadamsdotcom 10 hours ago

Start at https://github.com/cadamsdotcom/CodeLeash/blob/main/.pre-com... where you’ll see the custom lints.

Then have a look at https://github.com/cadamsdotcom/CodeLeash/blob/main/scripts/... (which was test-driven alongside https://github.com/cadamsdotcom/CodeLeash/blob/main/tests/un...)

The script can exit 2 to block the agent, and whatever it prints to stderr is shown to the agent. That’s a pretty darn flexible way to enforce whatever you like.
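
A rough sketch of that contract, assuming only what the sentence above says: exit code 2 blocks the agent, and stderr is what the agent gets to read. The check functions and the message text are hypothetical placeholders.

```python
import sys

BLOCK_EXIT_CODE = 2  # per the description above: exiting with 2 blocks the agent


def run_checks(checks, stderr=sys.stderr) -> int:
    """Run every check, print each problem to stderr, return the exit code."""
    problems = []
    for check in checks:
        problems.extend(check())
    for problem in problems:
        # Whatever lands on stderr is the feedback the agent sees.
        print(problem, file=stderr)
    return BLOCK_EXIT_CODE if problems else 0


# Hypothetical checks: each returns a list of human-readable problems.
def no_problems():
    return []


def stale_docstring():
    return ["utils.py:12: docstring mentions a parameter that no longer exists"]


# A real hook script would end with something like:
#     sys.exit(run_checks([no_problems, stale_docstring]))
```

Keeping the exit code as a return value makes the script itself easy to test-drive; only the final line of the hook needs to call sys.exit.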

Despite this being in the codebase I still have no idea what Python’s ast stuff is or does - I just let the agent rip, ensured it did TDD and reviewed it all to make sure the tests & code looked reasonable. I didn’t write this code and don’t want to. But I’ve watched it catch hundreds of dumb AI-isms, and watched the agent go “okay” and fix them ;) It’s been paying for itself over and over for months :)
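
For readers wondering what an ast-based lint does: Python's standard ast module parses source into a syntax tree, so a rule can match on structure rather than text. A minimal sketch with one hypothetical rule (flagging exception handlers that silently swallow errors); this is an illustration, not the code from the linked repository.

```python
import ast


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of `except` handlers whose whole body is `pass`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # ast.ExceptHandler is the node type for an `except ...:` clause.
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                hits.append(node.lineno)
    return sorted(hits)
```

Because the rule works on the tree, it is not fooled by formatting, and it answers part of the "matter of taste" objection: anything expressible as a structural pattern can be checked deterministically.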

unknownfuture 10 hours ago

Frankly, if that's truly your flow, then you cannot possibly know if the code really does what you expect it to do.

"TDD" isn't some magic trick. The tests codify the expected behavior. But if you don't review them for correctness, if you let the LLM build them blindly, then you have no idea what those tests assert and can make no claims about whether the code then does what you expect.

That's fine. That's your choice.

But you have to acknowledge you've chosen to accept that you personally cannot vouch for the quality or correctness of that code.

I fully expect this to be the direction the industry goes, where increasingly complex systems exist that no human actually understands or can reason about.

I think it's bad for the industry. Very bad.

But I'm not making those decisions, so... it is what it is, I guess.

cadamsdotcom 10 hours ago

Huh?

I design everything with plan mode and review every line. Nothing happens to my codebase that I don’t decide should happen. With my way of working, tech debt doesn’t exist because I never have to create it.

You’ve made a bunch of assumptions you’re not conscious of. And now you’re blaming me for that.

Open your mind, you never know what you might (un)learn.

unknownfuture 9 hours ago

So then your response has nothing to do with the post.

The thesis of the post is (paraphrasing): "if an AI wrote it, and I don't immediately grok it or if the code quality is low, I throw it away, even if on the surface it seems to work, because simply 'working' isn't enough to say a piece of code is acceptable."

I'd add as a corollary "and therefore I would never want to be accountable for that code."

If you're reviewing every line then it sounds like you have no argument with the writer and I don't understand what your point is.

Your very first paragraph says:

> If you reject AI code that works then your mindset is still too hands-on. Put another way - you still have some loops to work on taking yourself out of.

But if you do indeed "review every line" then you seem pretty damn in the loop yourself and I don't understand what you think taking oneself out of the loop is.

cadamsdotcom 6 hours ago

I’m in the loop after the work is done to a minimum standard.

My comment was a response to the complaint about first-draft code from an agent: that code can be brought up in quality significantly with a little bit of engineering.