Posted by nreece 1 day ago
I hear this so much. It's almost like people think code quality is unrelated to how well the product works. As though you can have one without the other.
If your code quality is bad, your product will be bad. It may be good enough for a demo right now, but that doesn't mean it really "works".
With that you could specify exactly what "good code" looks like and prevent the LLM from even committing stuff that doesn't match the rules.
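For example, in a TypeScript project you could encode some of those rules as lint configuration and refuse any commit that violates them. A rough sketch using ESLint's flat config (the specific rules and thresholds here are just illustrative):

    // eslint.config.mjs: illustrative rules and thresholds, tune to taste
    export default [
      {
        files: ["src/**/*.{ts,tsx}"],
        rules: {
          complexity: ["error", 10],               // reject deeply branched functions
          "max-lines-per-function": ["error", 80], // keep units small
          "no-duplicate-imports": "error",
          "no-console": "error",                   // no stray debug logging
        },
      },
    ];

Run that from a pre-commit hook or CI and the agent simply can't land code that breaks the rules, no matter what it generates in between.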
Why? Modern hardware allows for extremely inefficient code, so even if some code runs a thousand times slower because it's badly written, it will still be so fast that it seems instant.
For the rest, it has no relevance to the user of the software what the code is doing inside the chip, as long as the inputs and outputs behave as they should. The user wants to give input and receive output; nothing else matters to her at all.
But that's just a small piece of the puzzle. I agree that the user only cares about what the product does and not how the product works, but the what is always related to the how, even if that relationship is imperceptible to the user. A product with terrible code quality will have more frequent and longer outages (because debugging is harder), and it will take longer for new features to be added (because adding things is harder). The user will care about these things.
It can recognize patterns in the codebase it is looking at and extrapolate from that.
Which is why generated code is filled with the kind of comments you most often see in either tutorial-level code or JavaScript (explaining the types of values).
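Roughly this flavor, to be concrete (a made-up example, not real output):

    declare function getUserId(): string;

    // userId is a string containing the user's ID
    const userId: string = getUserId();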
Beyond that, performance drops rapidly and hallucinations rise correspondingly.
I've found it takes significant time to find the right "mode" of working with AI. It's a constant balancing act between maintaining a high-level overview (the 'engineering' part) and still getting that velocity boost from the AI (the 'coding' part).
The real trap I've seen (and fallen into) is letting the AI just generate code at me. The "engineering" skill now seems to be more about ruthless pruning and knowing exactly what to ask, rather than just knowing how to write the boilerplate.
The difference is what we used to call the "ilities": Reliability, inhabitability, understandability, maintainability, securability, scalability, etc.
None of these things are about the primary function of the code, i.e. "it seems to work." In coding, "it seems to work" is good enough. In software engineering, it isn't.
I’m sure it’ll improve over time, but it won’t be nearly as easy as making AI good at coding.
A while ago I discovered that Claude, left to its own devices, had been doing the LLM equivalent of Ctrl-C/Ctrl-V for almost every component it created in an ever-growing .NET/React/TypeScript side project, for months on end.
It was legitimately baffling to see the degree to which it had avoided reusing literally any shared code in favor of updating the exact same thing in 19 places every time a color needed to be tweaked or something. The craziest example was a pretty central dashboard view with navigation tabs in a sidebar, where it had been maintaining two almost identical implementations just to display a slightly different tab structure for logged-in vs. logged-out users.
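The consolidation it should have reached for is nothing exotic; it looks roughly like this (a sketch with made-up names, not the actual code):

    import * as React from "react";

    type Tab = { label: string; href: string };

    const COMMON_TABS: Tab[] = [{ label: "Dashboard", href: "/" }];
    const AUTH_ONLY_TABS: Tab[] = [{ label: "Settings", href: "/settings" }];

    // One Sidebar parameterized by auth state, instead of two near-identical copies.
    export function Sidebar({ isLoggedIn }: { isLoggedIn: boolean }) {
      const tabs = isLoggedIn ? [...COMMON_TABS, ...AUTH_ONLY_TABS] : COMMON_TABS;
      return (
        <nav>
          {tabs.map((tab) => (
            <a key={tab.href} href={tab.href}>
              {tab.label}
            </a>
          ))}
        </nav>
      );
    }

Then a color tweak or a new tab happens in one place instead of 19.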
I've now been directing it to de-spaghetti things when I spot good opportunities and have added more best practices to CLAUDE.md (with mixed results), so things are gradually getting more manageable, but it really shook my confidence in its ability to architect, well, anything on its own without micromanagement.
My experience is that the tools are like a smart intern. They are great at undergraduate level college skills but they don't really understand how things should work in the real world. Human oversight and guidance by a skilled and experienced person is required to avoid the kinds of problems that you experienced. But holy cow this intern can write code fast!
Having extensive planning and conversation sessions with the tool before letting it actually write or change any code is key to getting good results out of it. It also helps clarify my own understanding of things. Sometimes the result of the planning and conversing is that I manually make a small change and realize that the problem wasn't what I originally thought.
In some ways, this seems backwards. Once you have a demo that does the right thing, you have a spec, of sorts, for what's supposed to happen. Automated tooling that takes you from demo to production ready ought to be possible. That's a well-understood task. In restricted domains, such as CRUD apps, it might be automated without "AI".
Vibe-coded apps eventually fall over as they are overwhelmed by 101 bad architectural decisions stacked on top of one another. You need someone technical to make those decisions to avoid this fate.
"But AI can build this in 30min"
Also use MCPs like Context7 and agentic LLMs for more interactivity instead of just relying on a raw model.
For example, you can pull the library code into your working environment and install the coding agent there as well. Then you can ask it to read specific files, or even all the files in the library. I believe (based on my personal experience) this significantly decreases the chance of hallucination.
Yes, they're bad now, but they'll get better in a year.
If the generative ability is good enough for small snippets of code, it's good enough for larger software that's better organized. Maybe the models don't have enough of the right kind of training data, or the agents don't have the right reasoning algorithms. But the underlying capability is there.
If we’re simply measuring model benchmarks, I don’t know if they’re much better than a few years ago… but if we’re looking at how applicable the tools are, I would say we’re leaps and bounds beyond where we were.