Top
Best
New

Posted by samrolken 11/1/2025

Show HN: Why write code if the LLM can just do the thing? (web app experiment)(github.com)
I spent a few hours last weekend testing whether AI can replace code by executing directly. Built a contact manager where every HTTP request goes to an LLM with three tools: database (SQLite), webResponse (HTML/JSON/JS), and updateMemory (feedback). No routes, no controllers, no business logic. The AI designs schemas on first request, generates UIs from paths alone, and evolves based on natural language feedback. It works—forms submit, data persists, APIs return JSON—but it's catastrophically slow (30-60s per request), absurdly expensive ($0.05/request), and has zero UI consistency between requests. The capability exists; performance is the problem. When inference gets 10x faster, maybe the question shifts from "how do we generate better code?" to "why generate code at all?"
436 points | 324 commentspage 3
yanis_t 11/1/2025|
Robert Martin teaches us that codebase is behaviour and structure. While behaviour is something we want the software to do. The structure can be even more important because it defines how easy if possible to evolve the behaviour.

I'm not entirely sure why I had an urge to write this.

hyko 11/1/2025||
The fatal problem with LLM-as-runtime-club isn’t performance. It’s ops (especially security).

When the god rectangle fails, there is literally nobody on earth who can even diagnose the problem, let alone fix it. Reasoning about the system is effectively impossible. And the vulnerability of the system is almost limitless, since it’s possible to coax LLMs into approximations of anything you like: from an admin dashboard to a sentient potato.

“zero UI consistency” is probably the least of your worries, but object permanence is kind of fundamental to how humans perceive the world. Being able to maintain that illusion is table stakes.

Despite all that, it’s a fun experiment.

cheema33 11/1/2025||
> The fatal problem with LLM-as-runtime-club isn’t performance. It’s ops (especially security).

For me it is predictability. I am a big proponent of AI tools. But even the biggest proponents admit that LLMs are non-deterministic. When you ask a question, you are not entirely sure what kind of answers you will get.

This behavior is acceptable as a developer assistance tool, when a human is in the loop to review and the end goal is to write deterministic code.

hyko 11/1/2025||
Non-deterministic behaviour doesn’t help when trying to reason about the system. But you could in theory eliminate the non-determinism for a given input, and yet still be stuck with something unpredictable, in the sense that you can’t predict what new input will cause.

Whereas that sort of evaluation is trivial with code (even if at times program execution is non-deterministic), because its mechanics are explainable. Things like only testing boundary conditions hinge on this property, but completely fall apart if it’s all probabilistic.

Maybe explainable AI can help here, but to be honest I have no idea what the state of the art is for that.

finnborge 11/1/2025|||
At this extreme, I think we'd end up relying on backup snapshots. Faulty outcomes are not debugged. They, and the ecosystem that produced them, are just erased. The ecosystem is then returned to its previous state.

Kind of like saving a game before taking on a boss. If things go haywire, just reload. Or maybe like cooking? If something went catastrophically wrong, just throw it out and start from the beginning (with the same tools!)

And I think the only way to even halfway mitigate the vulnerability concern is to identify that this hypothetical system can only serve a single user. Exactly 1 intent. Totally partitioned/sharded/isolated.

hyko 11/1/2025||
Backup snapshots of what though? The defects aren’t being introduced through code changes, they are inherent in the model and its tooling. If you’re using general models, there’s very little you can do beyond prompt engineering (which won’t be able to fix all the bugs).

If you were using your own model you could maybe try to retrain/finetune the issues away given a new dataset and different techniques? But at that point you’re just transmuting a difficult problem into a damn near impossible one?

LLMs can be miraculous and inappropriate at the same time. They are not the terminal technology for all computation.

indigodaddy 11/1/2025||
What if they are extremely narrow and targeted LLMs running locally on the endpoint system itself (llamafile or whatever)? Would that make this concern at least a little better?
indigodaddy 11/1/2025||
Downvoted! What a dumb comment right?
qsort 11/1/2025||
If you're working like that then the prompt is the code and the LLM is the interpreter, and it's not obvious to me that it would be "better" than just running it normally, especially since an LLM with that level of capability could definitely help you with coding, no?

I think part of the issue is that most frameworks really suck. Web programming isn't that complicated at its core, the overengineering is mind boggling at times.

Thinking in the limit, if you have to define some type of logic unambiguously, would you want to do it in English?

Anyway, I'm just thinking out loud, it's pretty cool that this works at all, interesting project!

SamInTheShell 11/1/2025||
Currently today, I would say these models can be used by someone with minimal knowledge to churn out SPAs with React. They can probably get pretty far into making logins, message systems, and so on because there is lots of training data for those things. They can struggle through building desktop apps as well with relative ease compared to how I had to learn in years long past.

What these LLMs continue to prove those is they are no substitute for real domain knowledge. To date, I've yet to have a model implement RAFT consensus correctly in testing to see if they can build a database.

The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.

The bias in these LLMs really shines through an proves their autocomplete properties when they have a strong bias towards changing the one snippet of code I wrote because it doesn't fit in how it's training data would suggest the shape of it's code should be. Most models will course correct with instructions that they are wrong and I am right though.

One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a python project and it will continue to try using pip or python commands. You can start an electron app and it will continuously botch if it's using commonjs or some other standard. It persistently wants to download go modules before coding instead of just writing the code and doing `go mod tidy` after (it literally doesn't need the module in advance, it doesn't even have tools to probe the module before writing the code anyway).

RAFT consensus is my go-to test because there is no 1 size fits all way for you to implement it. It might get an in-memory key store system right, but what if you want it to organize etcd/raft/v3 in a way that you can do multi-group RAFT? What if you need RAFT to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.

This is across all the models available from OpenAI, Claude, and Google.

attogram 11/1/2025||
"It works. That's annoying." Indeed!

Would be cooler if support for local llms was added. Currently only has support for anthropic and openai. https://github.com/samrolken/nokode/blob/main/src/config/ind...

mikebelanger 11/2/2025|
Yeah that'd be really something. If you could just pay the cost up-front, rather than worry about how much every newer request cost, that really changes the game. There's still many other issues to worry about, like security. But as the author points out, we might be much closer than we think.
koliber 11/2/2025||
Both the speed and cost problems can be solved by caching.

Each person gets their own cache. The format of the cache is a git repo tied to their sessionid. Each time a request is made it writes the code, html, CSS, and database to git and commits it. Over time you build more and more artifacts and fewer things need to get generated JIT. Should also help with stability.

indigoabstract 11/2/2025||
Interesting idea, it never crossed my mind, but maybe we can take it further?

Let's say, in the future, when AI learns how to build houses, every time I want to sleep, I'll just ask the AI to build a new house for me, so I can sleep. I guess it will have to repurpose the old one, but that isn't my concern, it's just some implementation detail.

Wouldn't that be nice?

Every night, new house?

indigodaddy 11/1/2025||
This is absolutely awesome. I had some ideas in my head that were very muddy and fuzzy re how to implement, eg like have the LLM just on demand/dynamically create/serve some 90s retro style html/website from a single entry field/form (to describe the website), etc, but I just couldn't begin to figure out how to go about it or where to start. But I love your idea about just putting the description in the route-- makes a lot of sense (I think I saw something else in the last few months on HN front page that was similar with putting whatever in a URI/domain route, but I think it was more of "redirect to whatever external website/page is most appropriate/relevant to the described route"- so a little similar but you've taken this to the next level).

I guess there are many of us out there with these same thoughts/ideas and you've done an awesome job articulating and implementing it, congrats!

brokensegue 11/1/2025|
Generating code will always be more performant and reliable than this. Just consider the security implications of this design...
samrolken 11/1/2025|
Exactly. It even includes built-in prompt injection as a "feedback form".
More comments...