Posted by nreece 1 day ago
I hear this so much. It's almost like people think code quality is unrelated to how well the product works. As though you can have one without the other.
If your code quality is bad, your product will be bad. It may be good enough for a demo right now, but that doesn't mean it really "works".
With that you could specify exactly what "good code" looks like and prevent the LLM from even committing stuff that doesn't match the rules.
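For example, in a TypeScript project you could encode some of those rules as lint configuration and refuse any commit that violates them. A rough sketch using ESLint's flat config (the specific rules and thresholds here are just illustrative):

    // eslint.config.mjs: illustrative rules and thresholds, tune to taste
    export default [
      {
        files: ["src/**/*.{ts,tsx}"],
        rules: {
          complexity: ["error", 10],               // reject deeply branched functions
          "max-lines-per-function": ["error", 80], // keep units small
          "no-duplicate-imports": "error",
          "no-console": "error",                   // no stray debug logging
        },
      },
    ];

Run that from a pre-commit hook or CI and the agent simply can't land code that breaks the rules, no matter what it generates in between.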
Why? Modern hardware allows for extremely inefficient code, so even if some code runs a thousand times slower because it's badly written, it will still be so fast that it seems instant.
For the rest, it has no relevance to the user of the software what the code is doing inside the chip, as long as the inputs and outputs behave as they should. The user wants to give input and receive output; nothing else matters to her at all.
But that's just a small piece of the puzzle. I agree that the user only cares about what the product does and not how the product works, but the what is always related to the how, even if that relationship is imperceptible to the user. A product with terrible code quality will have more frequent and longer outages (because debugging is harder), and it will take longer for new features to be added (because adding things is harder). The user will care about these things.
It can recognize patterns in the codebase it is looking at and extrapolate from that.
Which is why generated code is filled with the kind of comments you most often see in either tutorial-level code or JavaScript (explaining the types of values).
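Roughly this flavor, to be concrete (a made-up example, not real output):

    declare function getUserId(): string;

    // userId is a string containing the user's ID
    const userId: string = getUserId();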
Beyond that, performance drops rapidly and hallucinations rise correspondingly.
I've found it takes significant time to find the right "mode" of working with AI. It's a constant balancing act between maintaining a high-level overview (the 'engineering' part) and still getting that velocity boost from the AI (the 'coding' part).
The real trap I've seen (and fallen into) is letting the AI just generate code at me. The "engineering" skill now seems to be more about ruthless pruning and knowing exactly what to ask, rather than just knowing how to write the boilerplate.
The difference is what we used to call the "ilities": Reliability, inhabitability, understandability, maintainability, securability, scalability, etc.
None of these things are about the primary function of the code, i.e. "it seems to work." In coding, "it seems to work" is good enough. In software engineering, it isn't.
I’m sure it’ll improve over time, but it won’t be nearly as easy as making AI good at coding.
A while ago I discovered that Claude, left to its own devices, had been doing the LLM equivalent of Ctrl-C/Ctrl-V for almost every component it created in an ever-growing .NET/React/TypeScript side project, for months on end.
It was legitimately baffling to see the degree to which it had avoided reusing literally any shared code in favor of updating the exact same thing in 19 places every time a color needed to be tweaked or something. The craziest example was a pretty central dashboard view with navigation tabs in a sidebar, where it had been maintaining two almost identical implementations just to display a slightly different tab structure for logged-in vs. logged-out users.
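The consolidation it should have reached for is nothing exotic; it looks roughly like this (a sketch with made-up names, not the actual code):

    import * as React from "react";

    type Tab = { label: string; href: string };

    const COMMON_TABS: Tab[] = [{ label: "Dashboard", href: "/" }];
    const AUTH_ONLY_TABS: Tab[] = [{ label: "Settings", href: "/settings" }];

    // One Sidebar parameterized by auth state, instead of two near-identical copies.
    export function Sidebar({ isLoggedIn }: { isLoggedIn: boolean }) {
      const tabs = isLoggedIn ? [...COMMON_TABS, ...AUTH_ONLY_TABS] : COMMON_TABS;
      return (
        <nav>
          {tabs.map((tab) => (
            <a key={tab.href} href={tab.href}>
              {tab.label}
            </a>
          ))}
        </nav>
      );
    }

Then a color tweak or a new tab happens in one place instead of 19.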
I've now been directing it to de-spaghetti things when I spot good opportunities and have added more best practices to CLAUDE.md (with mixed results), so things are gradually getting more manageable, but it really shook my confidence in its ability to architect, well, anything on its own without micromanagement.
My experience is that the tools are like a smart intern. They are great at undergraduate level college skills but they don't really understand how things should work in the real world. Human oversight and guidance by a skilled and experienced person is required to avoid the kinds of problems that you experienced. But holy cow this intern can write code fast!
Having extensive planning and conversation sessions with the tool before letting it actually write or change any code is key to getting good results out of it. It also helps clarify my own understanding of things. Sometimes the result of the planning and conversing is that I manually make a small change and realize that the problem wasn't what I originally thought.
In some ways, this seems backwards. Once you have a demo that does the right thing, you have a spec, of sorts, for what's supposed to happen. Automated tooling that takes you from demo to production ready ought to be possible. That's a well-understood task. In restricted domains, such as CRUD apps, it might be automated without "AI".
Vibe-coded apps eventually fall over as they are overwhelmed by 101 bad architectural decisions stacked on top of one another. You need someone technical to make those decisions to avoid this fate.
"But AI can build this in 30min"
Also use MCPs like Context7 and agentic LLMs for more interactivity instead of just relying on a raw model.
For example, you can pull the library code into your working environment and install the coding agent there as well. Then you can ask it to read specific files, or even all the files in the library. I believe (based on my personal experience) this significantly decreases the chance of hallucination.
Yes, they're bad now, but they'll get better in a year.
If the generative ability is good enough for small snippets of code, it's good enough for larger software that's better organized. Maybe the models don't have enough of the right kind of training data, or the agents don't have the right reasoning algorithms. But the underlying capability is there.
If we’re simply measuring model benchmarks, I don’t know if they’re much better than a few years ago… but if we’re looking at how applicable the tools are, I would say we’re leaps and bounds beyond where we were.