Posted by kiwieater 11 hours ago
So many people are just shouting ‘I wanna go fast’ and completely forgetting the lessons learned over the past few decades. Something is going to crash and burn, eventually.
I say this as a daily LLM user, albeit a user with a very skeptical view of anything the LLM puts in front of me.
I shipped a React Native app recently and probably 30% of the total dev time was wrapping every async call in try/catch with timeouts, handling permission denials gracefully, making sure corrupted AsyncStorage doesn't brick the app, and testing edge cases on old devices. None of that is the fun part. None of it shows up in a demo. But it's the difference between "works on my machine" and "works in production."
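For anyone wondering what that unglamorous 30% looks like, here is a minimal sketch, not the actual app code. `withTimeout`, `loadJson`, and the `KeyValueStore` interface are names made up for illustration; the interface only mirrors the `getItem`/`removeItem` calls AsyncStorage exposes so the sketch can run without React Native.

```typescript
// Minimal shape of the storage calls used below. AsyncStorage's getItem and
// removeItem match it; the interface only exists so the sketch runs outside RN.
interface KeyValueStore {
  getItem(key: string): Promise<string | null>;
  removeItem(key: string): Promise<void>;
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Hypothetical helper: read a JSON value, falling back to a default when the
// read hangs, the read fails, or the stored string is not valid JSON.
async function loadJson<T>(store: KeyValueStore, key: string, fallback: T, ms = 2000): Promise<T> {
  let raw: string | null;
  try {
    raw = await withTimeout(store.getItem(key), ms);
  } catch {
    return fallback; // timeout or storage error: leave the stored data alone
  }
  if (raw === null) return fallback; // nothing stored yet
  try {
    return JSON.parse(raw) as T;
  } catch {
    // Corrupted value: drop it (best effort) so the next launch starts clean.
    await withTimeout(store.removeItem(key), ms).catch(() => undefined);
    return fallback;
  }
}
```

In an app you would pass AsyncStorage itself as `store`. Note that `JSON.parse(raw) as T` only guards against unparseable strings; validating the shape of the parsed value is a separate step this sketch leaves out.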
Vibecoding gets you to the demo. The gap is everything after that.
This is exactly the kind of task that LLMs excel at.
Edit: It's interesting how I am getting downvoted here when pangram confirms my suspicions that this is 100% AI generated.
Used Codex for the whole project. At first I used Claude for the backend architecture, since that's where I usually work and have experience. The code runner and API endpoints were easy to create for the first prototype. But then it got to the UI, and here's where sh1t got real. The first UI was in React even though I had specifically told it to use Vue. The code editor and output window were a mess in terms of height, there was too much space between the editor and the output window, and no matter how much time I spent prompting and explaining, it just never got it right. Got tired, opened Figma, and used it to refine the design to what I wanted. Pushed the code it generated to GitHub, cloned it locally, then told Codex to copy the design, and finally it got it right.
Then came the hosting. I wanted the code runner endpoint to be in a Docker container for security purposes, since someone could execute malicious code that takes over the server if I just hosted it without some protection. Here it kept selecting out-of-date Docker images, and I had to manually guide it again on what I needed. Finally deployed and got it working, domain name included. Shared it with a few friends and they suggested some UI fixes, which took some time.
For the runner security hardening I used DeepSeek and Claude to generate a list of code snippets that I could run to expose potential issues, and despite Codex showing all was fine, I was able to uncover a number of issues. Then here is where it got weird: it started arguing with me despite me showing it all the issues present. So I compiled all the issues in one document and shared the Dockerfile, the Linux seccomp config file, and the issues document with Claude. It gave me a list of fixes for the Dockerfile to help with security hardening, which I shared back with Codex, and that's when it fixed them.
Currently most of the issues are resolved, but the whole process took me a whole week of working most evenings, and I am still not done. So I agree that you cannot create a usable product used by lots of users in 30 minutes, unless it's some static website. It takes too much constant testing and iteration.
It has basically eliminated surprises like that.
Something much closer to production SDLC patterns than a Figma mockup.
As we move from tailors to big box stores I think we have to get used to getting what we get, rather than feeling we can nitpick every single detail.
I'd also be more interested in how his 3rd, 4th or 5th vibe coded app goes.
The old rules mostly still apply.
The details and pitfalls that are unique to your specific scenario, that you only discover by running into them.
And yet this less obvious, more uncommon stuff is also what AI will be weakest at.
I’m working on a project, a password manager, where I have full end-to-end test harnesses: the CLI client makes changes, syncs them to the server, and then I observe the data in the iOS app running in the emulator. More than once I noticed Codex just hard-coded expected values from the test harnesses directly into the UI layout in the iOS app to make the test pass…
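One way to make that shortcut fail is to generate the test data fresh on every run, so there is nothing stable to copy into the layout. A rough sketch of the idea follows; every name in it is hypothetical, and the in-memory `store` only stands in for the real CLI write, server sync, and app read.

```typescript
// Hypothetical: what the app ends up displaying for a given entry id.
type Render = (store: Record<string, string>, id: string) => string;

// The shortcut: a value copied out of an old fixture and pasted into the UI.
const hardCoded: Render = () => "hunter2";

// What the UI should do: show whatever was actually synced.
const readsStore: Render = (store, id) => (id in store ? store[id] : "");

// A new value per run, so a pasted constant can never match it.
function freshSecret(): string {
  return "s-" + Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}

// Stand-in for the end-to-end check: write a fresh value, then compare it
// with what the UI renders.
function endToEndPasses(render: Render): boolean {
  const secret = freshSecret();
  const store: Record<string, string> = { "entry-1": secret };
  return render(store, "entry-1") === secret;
}
```

With fixed fixtures both renderers pass; with per-run values only the one that really reads the synced data does.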
Similar issues in the crypto layer: tests were written first, then the code was written. During the review I noticed that the code was made to just pass the test: the logic checked whether a signature value exists instead of checking whether the crypto signature is valid.
An LLM can help with code reviews as well, but it has to be guided specifically on what to look for. This is with the codex 5.4 model.
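A test suite with only the happy path cannot tell those two implementations apart; negative cases can. A minimal sketch, assuming a hypothetical `Verify` function shape. `exactMatch` is a stand-in for a real verifier and does no cryptography; it only exists so the check has something correct to compare against.

```typescript
// Hypothetical verifier shape: true only when the signature is valid.
type Verify = (message: string, signature: string) => boolean;

// The shortcut described above: anything non-empty counts as "signed".
const presenceOnly: Verify = (_message, signature) => signature.length > 0;

// Stand-in for a real verifier (NOT cryptography): accepts one known pair.
const KNOWN = { message: "transfer:42", signature: "sig-abc123" };
const exactMatch: Verify = (message, signature) =>
  message === KNOWN.message && signature === KNOWN.signature;

// Run the happy path plus the negative cases and report what went wrong.
function verifierFailures(verify: Verify, message: string, signature: string): string[] {
  const failures: string[] = [];
  if (!verify(message, signature)) failures.push("rejected the valid pair");
  if (verify(message + "x", signature)) failures.push("accepted a tampered message");
  if (verify(message, signature + "x")) failures.push("accepted a tampered signature");
  if (verify(message, "")) failures.push("accepted an empty signature");
  return failures;
}
```

Here `verifierFailures(presenceOnly, KNOWN.message, KNOWN.signature)` reports the tampered message and tampered signature cases, while `exactMatch` comes back with an empty list. In the real project the same checks would run against the actual signing and verification code.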
I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.