The 100 hour gap between a vibecoded prototype and a working product

Posted by kiwieater 8 hours ago

The 100 hour gap between a vibecoded prototype and a working product (kanfa.macbudkowski.com)

193 points | 253 comments

alexpotato 6 hours ago|

I work as a DevOps/SRE and have been doing it in FinTech (bank, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.

My thoughts on vibe coding vs production code:

- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs

- This is partly b/c it is good at things I'm not good at (e.g. front end design)

- But then I need to go in and double check performance, correctness, information flow, security etc

- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get set up correctly etc)

- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs

- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
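
A minimal sketch of the kind of deterministic check mentioned in the list above, assuming the pipeline output is a set of rows and there is a known-good ("golden") copy to compare against. The function name `diff_rows`, the `id` key, and the sample rows are hypothetical, not taken from any particular pipeline.

```python
from typing import Iterable


def diff_rows(expected: Iterable[dict], actual: Iterable[dict], key: str = "id") -> list[str]:
    """Return human-readable differences between two sets of rows.

    An empty list means the output matches the known-good copy exactly.
    """
    exp = {row[key]: row for row in expected}
    act = {row[key]: row for row in actual}
    problems = []
    for k in sorted(exp.keys() - act.keys()):
        problems.append(f"missing row {k}")
    for k in sorted(act.keys() - exp.keys()):
        problems.append(f"unexpected row {k}")
    for k in sorted(exp.keys() & act.keys()):
        if exp[k] != act[k]:
            problems.append(f"row {k} differs: expected {exp[k]}, got {act[k]}")
    return problems


# Hypothetical sample data: one changed quantity and one extra row.
golden = [{"id": 1, "qty": 100}, {"id": 2, "qty": 250}]
output = [{"id": 1, "qty": 100}, {"id": 2, "qty": 205}, {"id": 3, "qty": 7}]
report = diff_rows(golden, output)
```

Because the result is the same on every run, the LLM (or a human) can be handed the report directly instead of eyeballing the output.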

So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and an LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.

matt_heimer 5 hours ago||

I'm building a Java HFT engine and the amount of things AI gets wrong is eye-opening. If I didn't benchmark everything I'd end up with a much less optimized solution.

Examples: AI really wants to use Project Panama (FFM) and while that can be significantly faster than traditional OO approaches it is almost never the best. And I'm not talking about using deprecated Unsafe calls, I'm talking about primitive arrays being better for Vector/SIMD operations on large sets of data, and NIO being better than FFM + mmap for file reading.

You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.

jacquesm 4 hours ago|||

AI is extremely good at the things that it has many examples for. If what you are doing is novel then it is much less of a help, and it is far more likely to start hallucinating because 'I don't know' is not in the vocabulary of any AI.

Filligree 3 hours ago||

> because 'I don't know' is not in the vocabulary of any AI.

That is clearly false. I’m only familiar with Opus, but it quite regularly tells me that, and/or decides it needs to do research before answering.

If I instruct it to answer regardless, it generally turns out that it indeed didn’t know.

jacquesm 3 hours ago||

I haven't had that at all, not even a single time. What I have had is endless round trips with me saying 'no, that can't work' and the bot then turning around and explaining to me why it is obvious that it can't work... that's quite annoying.

dwaltrip 43 minutes ago||

Try something like:

> Please carefully review (whatever it is) and list out the parts that have the most risk and uncertainty. Also, for each major claim or assumption can you list a few questions that come to mind? Rank those questions and ambiguities as: minor, moderate, or critical.

> Afterwards, review the (plan / design / document / implementation) again thoroughly under this new light and present your analysis as well as your confidence about each aspect.

There's a million variations on patterns like this. It can work surprisingly well.

You can also inject 1-2 key insights to guide the process. E.g. "I don't think X is completely correct because of A and B. We need to look into that and also see how it affects the rest of (whatever you are working on)."

jacquesm 34 minutes ago||

Ok! I will try that, thank you very much.

dwaltrip 14 minutes ago||

Of course! I get pretty lazy so my follow-up is usually something like:

"Ok let's look at these issues 1 at a time. Can you walk me through each one and help me think through how to address it"

And then it will usually give a few options for what to do for each one as well as a recommendation. The recommendation is often fairly decent, in which case I can just say "sounds good". Or maybe provide a small bit of color like: "sounds good but make sure to consider X".

Often we will have a side discussion about that particular issue until I'm satisfied. This happens more when I'm doing design / architectural / planning sessions with the AI. It can be as short or as long as it needs. And then we move on to the next one.

My main goal with these strategies is to help the AI get the relevant knowledge and expertise from my brain with as little effort as possible on my part. :D

A few other tactics:

- You can address multiple at once: "Item 3, 4, and 7 sound good, but let's work through the others together."

- Defer a discussion or issue until later: "Let's come back to item 2 or possibly save that for a later session".

- Save the review notes / analysis / design sketch to a markdown doc to use in a future session. Or just as a reference to remember why something was done a certain way when I'm coming back to it. Can be useful to give to the AI for future related work as well.

- Send the content to a sub-agent for a detailed review and then discuss with the main agent.

mewpmewp2 14 minutes ago||||

I would say that if AI has to make decisions about picking between frameworks or constructs irrelevant to the domain at hand, it feels to me like you are not using the AI correctly.

mtrovo 5 hours ago||||

I think the main issue is treating the LLM as an unrestrained black box; there's a reason nobody outside tech trusts LLMs so blindly.

The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what goal you're optimizing for.

See karpathy's work on the autoresearch agent and how it carries out experiments, it might be useful for what you're doing.

riffraff 4 hours ago|||

> there's a reason nobody outside tech trusts LLMs so blindly.

Man, I wish this was true. I know a bunch of non tech people who just trust random shit that chatgpt made up.

I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)

We had politicians share LLM crap, researchers doing papers with hallucinated citations...

It's not just tech people.

withinboredom 2 hours ago|||

We were working on translations for Arabic and in the spec it said to use "Arabic numerals" for numbers. Our PM said that "according to ChatGPT that means we need to use Arabic script numbers, not Arabic numerals".

It took a lot of back-and-forths with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.

We're cooked.

tstenner 52 minutes ago|||

The architect should have required Hindu numbers. Same result, but even more confusion.

dvfjsdhgfv 1 hour ago|||

Man this is maddening.

roncesvalles 1 hour ago|||

And the worst part is, these people don't even use the flagship thinking models, they use the default fast ones.

closewith 1 hour ago|||

In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.

smokel 3 hours ago||||

> AI really wants to use Project Panama

It would help if you briefly specified the AI you are using here. There are wildly different results between using, say, an 8B open-weights LLM and Claude Opus 4.6.

matt_heimer 1 hour ago||

I've been using several. LM Studio and any of the open weight models that can fit my GPU's RAM (24GB) are not great in this area. The Claude models are slightly better but not worth the extra cost most of the time since I typically have to spend almost the same amount of time reworking and re-prompting, plus it's very easy to exhaust credits/tokens. I mostly bounce back and forth between the codex and Gemini models right now and this includes using pro models with high reasoning.

grim_io 5 hours ago||||

Wouldn't Java always lose in terms of latency against a similarly optimized native code in, let's say, C(++)?

matt_heimer 1 hour ago|||

You can achieve optimized C/C++ speeds, you just can't program the same way you always have. Step 1, switch your data layout from Array of Structures to Structure of Arrays. Step 2, after initial startup switch to (near) zero object creation. It's a very different way to program Java.

You have to optimize your memory usage patterns to fit in CPU cache as much as possible which is something typical Java developers don't consider. I have a background in assembly and C.

I'd say it's slightly harder since there is a little bit of abstraction but most of the time the JIT will produce code as good as C compilers. It's also a niche that often considers any application running on a general purpose CPU to be slow. If you want industry leading speed you start building custom FPGAs.
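
The Array of Structures to Structure of Arrays switch in Step 1 is a data-layout idea rather than a Java feature, so here is a rough sketch of it in Python for brevity. The `Tick` fields and the `TickColumns` class are made up for illustration; in Java the columns would be primitive arrays such as `double[]` and `long[]`.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Tick:
    """Array of Structures: one separate object per tick."""
    price: float
    size: int


class TickColumns:
    """Structure of Arrays: one contiguous typed array per field."""

    def __init__(self) -> None:
        self.price = array("d")  # packed 64-bit floats
        self.size = array("q")   # packed 64-bit signed integers

    def append(self, price: float, size: int) -> None:
        self.price.append(price)
        self.size.append(size)

    def notional(self) -> float:
        # Walks two flat arrays in order; no per-tick object is touched.
        return sum(p * s for p, s in zip(self.price, self.size))


# Hypothetical sample data, stored both ways.
aos = [Tick(10.0, 3), Tick(10.5, 2)]
soa = TickColumns()
for t in aos:
    soa.append(t.price, t.size)

aos_total = sum(t.price * t.size for t in aos)
soa_total = soa.notional()
```

Python will not show the cache benefit itself; the point is only the shape of the data, with each field stored contiguously instead of scattered across objects.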

jacquesm 4 hours ago||||

Not necessarily. Java can be insanely performant, far more than I ever gave it credit for in the first decade of its existence. There has been a ton of optimization and you can now saturate your links even if you do fairly heavy processing. I'm still not a fan of the language but performance issues seem to be 'mostly solved'.

nly 4 hours ago||

"Saturating your links" is rarely the goal in HFT.

You want low deterministic latency with sharp tails.

If all you care about is throughput then deep pipelines + lots of threads will get you there at the cost of latency.

roncesvalles 1 hour ago||||

There are actually cases when Java (the HotSpot JVM) runs faster than the same logic written in C/C++ because the JVM is doing dynamic analysis and selective JIT compilation to machine code.

jodleif 4 hours ago||||

As long as you tune the JVM right it can be faster. But it's a big if with the tuning, and you need to write performant code

andriy_koval 1 hour ago||

Java has significant overhead in that most/every object is allocated on the heap, synchronized, and carries extra memory and performance overhead from being GC controlled. It's very hard/not possible to tune this part.

matt_heimer 1 hour ago||

You program differently for this niche in any language. The hot path (number crunching) thread doesn't share objects with gateway (IO) threads. Passing data between them is off heap, you avoid object creation after warm up. There is no synchronization, even volatile is something you avoid.

andriy_koval 1 hour ago||

> Passing data between them is off heap

How exactly are you passing data? You can pass some primitives without allocating them on heap. You can use some tiny subset of Java+standard library to write high performance code, but why would you do this instead of using Rust or C++?

tyingq 4 hours ago||||

Depends. Many reasons, but one is that Java has a much richer set of 3rd party libraries to do things versus rolling your own. And often (not always) third party libraries that have been extensively optimized, real world proven, etc.

Then things like the jit, by default, doing run time profiling and adaptation.

andriy_koval 1 hour ago||

Java has a huge ecosystem in enterprise dev, but it is very unlikely to have an ecosystem edge in high performance/real time compute.

not_kurt_godel 1 hour ago|||

I personally know of an HFT firm that used Java approximately a decade ago. My guess would be they're still using it today given Java performance has only improved since then.

andriy_koval 1 hour ago||

It doesn't mean Java is an optimal or close to optimal choice. The amount of extra effort they put in to achieve their goals could be significant.

FpUser 5 hours ago||||

I am curious about what causes some to choose Java for HFT. From what I remember the amount of virgin sacrifices and dances with the wolves one must do to approach native speed in this particular area is just way too much of development time overhead.

matt_heimer 1 hour ago|||

Probably the same thing that makes most developers choose a language for a project, it's the language they know best.

It wasn't a matter of choosing Java for HFT, it was a matter of selecting a project that was a good fit for Java and my personal knowledge. I was a Java instructor for Sun for over a decade, I authored a chunk of their Java curriculum. I wrote many of the concurrency questions in the certification exams. It's in my wheelhouse :)

My C and assembly are rusty at this point so I believe I can hit my performance goals with Java sooner than if I developed in more bare metal languages.

nly 4 hours ago||||

"HFT" means different things to different people.

I've worked at places where ~5us was considered the fast path and tails were acceptable.

In my current role it's less than a microsecond packet in, packet out (excluding time to cross the bus to the NIC).

But arguably it's not true HFT today unless you're using FPGA or ASIC somewhere in your stack.

matt_heimer 55 minutes ago|||

Software HFT? I see people call Python code HFT sometimes so I understand what you mean. It's more in-line with low latency trading than today's true HFT.

I don't work for a firm so don't get to play with FPGAs. I'm also not co-located in an exchange and using microwave towers for networking. I might never even have access to kernel networking bypass hardware (still hopeful about this one). Hardware optimization in my case will likely top out at CPU isolation for the hot path thread and a hosting provider in close proximity to the exchanges.

The real goal is a combination of eliminating as much slippage as possible, making some lower timeframe strategies possible and also having best-in-class backtesting performance for parameter grid searching and strategy discovery. I expect to sit between industry leading firms and typical retail systematic traders.

atomicnumber3 4 hours ago|||

The one person who understands HFT yeah. "True" HFT is FPGA now and also those trades are basically dead because nobody has such stupid order execution anymore, either via getting better themselves or by using former HFTs' (Virtu) new order execution services.

So yeah there's really no HFT anymore, it's just order execution, and some algo trades want more or less latency which merits varying levels of technical squeezing latency out of systems.

colechristensen 3 hours ago||||

Then you list all of the things you want it not to do and construct a prompt to audit the codebase for the presence of those things. LLMs are much better at reviewing code than writing it so getting what you want requires focusing more on feedback than creation instructions.

LtWorf 4 hours ago|||

I've seen SQL injection and leaked API tokens to all visitors of a website :)

Aurornis 4 hours ago|||

There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.

Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.

The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.

I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.

ryandrake 4 hours ago|||

> I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company).

This always seems to be the pattern. "I vibe coded my product and shipped it in 96 hours!" OK, what's the product? Why haven't I heard of it? Why can't it replace the current software I'm using? So, you're looking for work? Why is nobody buying it?

Where is the Quicken replacement that was vibecoded and shipping today? Where are the vibecoded AAA games that are going to kill Fortnite? Where is the vibecoded Photoshop alternative? Heck, where is the vibecoded replacement for exim3 that I can deploy on my self hosted E-mail server? Where are all of the actual shipping vibecoded products that millions of users are using?

piersj225 2 hours ago|||

I found one example of this going very wrong on reddit the other day -

https://www.reddit.com/r/selfhosted/comments/1rckopd/huntarr...

One redditor security reviews a vibe coded project

ryandrake 31 minutes ago||

Wow, great example, and great example of what these fakers do when called out. Summary:

The maintainer, instead of listening to the security researcher and accepting feedback about his development process:

1. Denied the problem

2. Censored discussion of the problem

3. Banned the people calling out the problem

...and then when the security issues were posted more publicly and got traction...

4. Made the subreddit private

5. Wiped and deleted his account

6. Wiped and deleted the GitHub repo

7. Took the project's web site off the web

Absolutely wild and unhinged behavior.

wierdbytes 4 hours ago||||

> Where are all of the actual shipping vibecoded products that millions of users are using?

Claude Code and OpenClaw - they are vibecoded. And I believe more are coming.

Jensson 2 hours ago|||

Claude Code is not vibecoded, it is made using Claude Code but it is not vibecoded using Claude Code.

snovv_crash 3 hours ago|||

But it's like crypto then, good for buying other crypto, or illegal stuff.

Also people are using CC for the cheap access to the model, otherwise they'd be using opencode.

pjc50 2 hours ago||||

I regret only having one upvote for this.

I note that games are mostly art assets and things like level design, and players are already happy to instantly consign such products to the slop bin.

The whole thing is "market for lemons": app stores filling with dozens of indistinguishable clones of each product category will simply scare users off all of them.

youknownothing 4 hours ago||||

Yeah, I really wonder if someone would trust a vibe-coded version of Turbotax to do their taxes...

mordechai9000 3 hours ago||

Do you really need Turbotax? Just feed it the tax code, your financial data, and the relevant forms and it should be good to go. Now we have freed up the labor of accountants so they can go be productive in another segment of society. /s

atomicnumber3 2 hours ago|||

"I come from a state that raises corn and cotton and cockleburs and Democrats, and frothy eloquence neither convinces nor satisfies me. I am from Missouri. You have got to show me."

roncesvalles 1 hour ago||||

>Many of these posts come from people who were all in on crypto companies a few years ago.

This is ditto my observation. There seems to be a certain "type" of people like this. And it's not just people looking for work.

My guess is either they have super low critical thinking, a very cynical view of the world where lies and exaggeration are the only way to make it, or something more pathological (narcissism etc).

ge96 4 hours ago||||

Day 7 of using Claude Code here are my takes...

sebastiennight 4 minutes ago||

"Day 7" would be amazing - all that I see YouTube recommending is "I tried it for 24 hours"

I was listening to an "expert" on a podcast earlier today up until the point where the interviewer asked how long his amazing new vibe-coded tooling has been in production, and the self-proclaimed expert replied "actually we have an all-hands meeting later today so I can brief the team and we will then start using the output..."

paganel 4 hours ago|||

The “store on the chain” thing turned out to be a fad in terms of technology, even though it made a lot of money (in the billions and more) for some people via the crypto thing. That was less than 10 years ago, so many of us do remember the similarities between the discourse back then and what’s happening now.

With all that said, today’s LLMs do seem to provide a little bit more value compared to the bit chain thing, for example OCR/.pdf parsing is I’d say a solved thing right now thanks to LLMs, which is nice.

bittermandel 1 hour ago|||

This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, it's often a distraction and a negative multiplier.

I can't say how many times the LLM-proposed solution to a jittery behavior is adding retries. At this point we have to be even more careful with controlling the implementation of things in the hot path.

I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!

bee326 1 hour ago||

My colleague recently shipped a "bug fix" that addresses a race condition by adding a 200ms delay somewhere, almost completely coded by an LLM. The LLM even suggests that "if this is not good enough, increase it to 300ms".

That says something about how much some people care about this.
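
For contrast with the fixed-delay "fix", a small sketch of waiting on an explicit signal instead of sleeping. It uses Python's standard `threading.Event`; the `loader` and `result` names are hypothetical stand-ins, since the comment does not say what the two racing pieces of code actually were.

```python
import threading

result = {}
ready = threading.Event()


def loader() -> None:
    # Stand-in for the slow side of the race (e.g. loading some state).
    result["value"] = 42
    ready.set()  # signal completion explicitly


worker = threading.Thread(target=loader)
worker.start()

# Instead of time.sleep(0.2) and hoping the loader is done, block until
# it says so. The timeout only bounds the wait if the signal never comes.
finished = ready.wait(timeout=5.0)
worker.join()
value = result["value"] if finished else None
```

The wait returns as soon as the signal arrives, so there is no delay to tune up to 300ms later.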

nomorewords 20 minutes ago||

Doubly so, because that's how most people have solved similar problems, which is why the LLM suggests it

the__alchemist 1 hour ago|||

I concur on the DevSecOps aspect for a more specific reason: If you're failing a pipeline because ThirdPartyTool69 doesn't like your code style or W/E, you can have the LLM fix it. Or get you to 100% test coverage etc. Or have it update your Cypress/Jest/SonarQube configs until the pipeline passes without losing brain cells doing it by hand. Or find you a set of dependency versions that passes.

Aperocky 5 hours ago|||

The magic is testing. Having locally available, high-throughput testing with a large number of test cases now unlocks more speed.

The test cases themselves become the focus - the LLM usually can't get them right.

robhlt 3 hours ago|||

How does that test suite get built and validated? A comprehensive and high quality test suite is usually much larger than the codebase it tests. For example, the sqlite test suite is 590x [1] the size of the library itself

1. https://sqlite.org/testing.html

nswango 2 hours ago||

sqlite is an extreme outlier not a typical example, with regard to test suite size and coverage.

worik 57 minutes ago||||

> The magic is testing.

No it is not.

There is no amount of testing that can fix a flawed design

neonbrain 4 hours ago|||

The word "Testing" is a very loaded term. Few non-professionals, and not even many professionals, fully understand what is meant by it.

Consider the following: Unit, Integration, System, UAT, Smoke, Sanity, Regression, API Testing, Performance, Load, Stress, Soak, Scalability, Reliability, Recovery, Volume Testing, White Box Testing, Mutation Testing, SAST, Code Coverage, Control Flow, Penetration Testing, Vulnerability Scanning, DAST, Compliance (GDPR/HIPAA), Usability, Accessibility (a11y), Localization (L10n), Internationalization (i18n), A/B Testing, Chaos Engineering, Fault Injection, Disaster Recovery, Negative Testing, Fuzzing, Monkey Testing, Ad-hoc, Guerilla Testing, Error Guessing, Snapshot Testing, Pixel-Perfect Testing, Compatibility Testing, Canary Testing, Installation Testing, Alpha/Beta Testing...

...and I'm certain I've missed dozens of other test approaches.

worik 55 minutes ago|||

There is no science to testing, no provable best way, despite many people's vehement opinions

megous 3 hours ago|||

You forgot hope-driven development and release, other optimism-based methods ("I'm sure it's fine"), and faith-based approaches to testing (ship and pray, ...). Customer-driven involuntary beta testing also comes to mind, as does "let's see what happens" 0-day testing before deployment. We also do user-driven error discovery, frequently.

yojo 4 hours ago|||

> - This is partly b/c it is good at things I'm not good at (e.g. front end design)

Everyone thinks LLMs are good at the things they are bad at. In many cases they are still just giving “plausible” code that you don’t have the experience to accurately judge.

I have a lot of frontend app dev experience. Even modern tools (Claude w/Opus 4.6 and a decent Claude.md) will slip in unmaintainable slop in frontend changes. I catch cases multiple times a day in code review.

Not contradicting your broader point. Indeed, I think if you’ve spent years working on any topic, you quickly realize Claude needs human guidance for production quality code in that domain.

steveBK123 2 hours ago|||

Yes I’ve seen this at work where people are promoting the usage of LLMs for.. stuff other people do.

There’s also a big disconnect in terms of SDLC/workflow in some places. If we take at face value that writing code is now 10x faster, what about the other parts of the SDLC? Is your testing/PR process ready for 10x the velocity or is it going to fall apart?

What % of your SDLC was actually writing code? Maybe time to market is now ~18% faster because coding was previously 20% of the duration.
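
The arithmetic behind that ~18% figure, as a small sketch. It assumes the comment's illustrative numbers (coding is 20% of the cycle and gets 10x faster, everything else unchanged); `remaining_time` is a hypothetical helper name.

```python
def remaining_time(coding_share: float, coding_speedup: float) -> float:
    """Fraction of the original cycle time left after speeding up only coding."""
    return (1.0 - coding_share) + coding_share / coding_speedup


after = remaining_time(coding_share=0.20, coding_speedup=10.0)
saved_pct = round((1.0 - after) * 100)   # percent of calendar time saved
overall_speedup = round(1.0 / after, 2)  # end-to-end speedup factor
```

Under these assumptions a 10x coding speedup shortens the whole cycle by about 18%, or roughly 1.22x end to end, which is the same shape as Amdahl's law.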

onionisafruit 3 hours ago|||

It’s the Gell-Mann amnesia effect applied to LLMs instead of media

bojangleslover 4 hours ago|||

What I do now is I make an MVP with the AI, get it working. And then tear it all down and start over again, but go a little slower. Maybe tear down again and then go even more slowly. Until I get to the point where I'm looking at everything the AI does and every line of code goes through me.

bauerd 5 hours ago|||

>Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Absolutely. Tight feedback loops are essential to coding agents and you can’t run pipelines locally.

amelius 3 hours ago|||

Also, now you're reading someone else's code and not everybody likes that. In fact, most self-proclaimed 10x coders I know hate it.

So instead of the 10x coder doing it, the 1x coder does it, but then that factor of 3x becomes 0.3x.

steveBK123 2 hours ago||

Absolutely. In my experience there are more “good coders” than people who are good at code review/PR/iterative feedback with another dev.

A lot of people are OCD pedants about stuff that can be solved with a linter (but can’t be bothered to implement one) or just “LGTM” everything. Neither provides value or feedback to help develop other devs.

baxtr 4 hours ago|||

Isn’t that the reason why people advocate for spec-driven development instead of vibe coding?

netbioserror 2 hours ago|||

More generally: LLM effectiveness is inversely proportional to domain specificity. They are very good at producing the average, but completely stumble at the tails. Highly particular brownfield optimization falls into the tails.

quater321 4 hours ago||

At this point, every programmer who claims that vibecoding doesn't make you at least 10 times more productive is simply lying or, worse, doesn't know how to vibe code. So, you want to tell me that you don't review the code you write? Or that others don't review it? You bring up ONE example with a bottleneck that has nothing to do with programming. Again, if you claim it doesn't make you 10x more productive, you don't know how to use AI, it is that simple. I spin up 10 agents: while 5 are working on apps, 5 do reviews and testing. I am at the end of that workflow and review the code WHILE the 10 agents keep working.

For me it is far more than 10x, but I'm being considerate of noobs by saying 10x instead of 20x or more.

atomicnumber3 2 hours ago|||

Can you link to one launched product with users for us?

sieste 2 hours ago||||

Just goes to show that most programmers have no idea what most programmers are mostly programming. Great that it works for you, but don't assume that this applies to everyone else.

geetee 4 hours ago||||

I can't tell if this is real or a joke.

hsuduebc2 3 hours ago|||

What exactly are you producing? LinkedIn posts?

mrothroc 2 hours ago||

Everyone keeps saying 80/20 but that undersells what's going on. The last 20% isn't just hard. It's hard because of what happened during the first 80%.

When an agent takes a shortcut early on, the next step doesn't know it was a shortcut. It just builds on whatever it was handed. And then the step after that does the same thing. So by hour 80 you're sitting there trying to fix what looks like a UI bug and you realize the actual problem is three layers back. You're not doing the "hard 20%." You're paying interest on shortcuts you didn't even know were taken. (As I type this I'm having flashbacks to helping my kid build lego sets.)

The author figured this out by accident. He stopped prompting and opened Figma to design what he actually wanted. That's the move. He broke the chain before the next stage could build on it. The 100 hours is what it costs when you don't do that.

redgridtactical 2 hours ago||

The 100 hours number feels about right for a solo project. What people underestimate is that the last 20% isn't just polish — it's the boring defensive stuff that makes an app not crash on someone else's phone.

I shipped a React Native app recently and probably 30% of the total dev time was wrapping every async call in try/catch with timeouts, handling permission denials gracefully, making sure corrupted AsyncStorage doesn't brick the app, and testing edge cases on old devices. None of that is the fun part. None of it shows up in a demo. But it's the difference between "works on my machine" and "works in production."

Vibecoding gets you to the demo. The gap is everything after that.
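
The defensive wrapping described above might look roughly like this. The comment is about React Native; this sketch uses Python's `asyncio` instead to illustrate the same pattern, and `safe_call`, `slow_storage_read`, `corrupted_storage_read`, and the fallback value are hypothetical names.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def safe_call(aw: Awaitable[T], timeout_s: float, fallback: T) -> T:
    """Await `aw`, returning `fallback` on timeout or any ordinary error."""
    try:
        return await asyncio.wait_for(aw, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback
    except Exception:
        # e.g. corrupted storage or a denied permission; real code would log this
        return fallback


async def slow_storage_read() -> str:
    await asyncio.sleep(1.0)  # simulates a call that hangs
    return "stored settings"


async def corrupted_storage_read() -> str:
    raise ValueError("corrupted data")


async def main() -> tuple[str, str]:
    a = await safe_call(slow_storage_read(), timeout_s=0.05, fallback="defaults")
    b = await safe_call(corrupted_storage_read(), timeout_s=0.05, fallback="defaults")
    return a, b


results = asyncio.run(main())
```

Both calls fall back to defaults instead of hanging or crashing, which is the behavior that never shows up in a demo.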

shepherdjerred 2 hours ago||

> probably 30% of the total dev time was wrapping every async call in try/catch with timeouts, handling permission denials gracefully, making sure corrupted AsyncStorage doesn't brick the app

This is the exact kind of task that LLMs excel at

croisillon 2 hours ago|||

c'm'on, drop that

johnfn 26 minutes ago||

This comment is written by an LLM, right?

Edit: It's interesting how I am getting downvoted here when pangram confirms my suspicions that this is 100% AI generated.

phillipclapham 4 hours ago||

The gap is definitely real. But I think most of this thread is misdiagnosing why it exists. It's not that AI cannot produce production quality code, it's that the very mental model most people have of AI is leading them to use the wrong interaction model for closing that last 20% of complexity in production code bases.

The author accidentally proved it: the moment they stopped prompting and opened Figma to actually design what they wanted, Claude nailed the implementation. The bottleneck was NEVER the code generation, it was the thinking that had to happen BEFORE ever generating that code. It sounds like most of you offload the thinking to AFTER the complexity has arisen when the real pattern is frontloading the architectural thinking BEFORE a single line of code is generated.

Most of the 100-hour gap is architecture and design work that was always going to take time. AI is never going to eliminate that work if you want production grade software. But when harnessed correctly it can make you dramatically faster at the thinking itself, you just have to actually use it as a thinking partner and not just a code monkey.

AstroBen 4 hours ago||

I don't know how other people work, but writing the code for me has been essential in even understanding the problem space. The architecture and design work in a lot of cases is harder without going through that process.

suzzer99 54 minutes ago|||

I recently had to build a widget that lets the user pick from a list of canned reports and then preview them in an overlay before sending to the printer (or save to PDF). All I knew was that I wanted each individual report's logic and display to be in its own file, so if the system needed to grow to 100 reports, it wouldn't get any more complicated than with 6 reports.

The final solution ended up being something like: 1. Page includes new React report widget. 2. Widget imports generic overlay component and all canned reports, and lets user pick a report. 3. User picks report, widget sets that specific report component as a child of the overlay component, launches overlay. 4. Report component makes call to database with filters and business logic, passes generic set of inputs (report title, other specifics, report data) to a shared report display template.

My original plan was for the report display template to also be unique to each report file. But when the dust settled, they were so similar that it made sense to use a shared component. If a future report diverges significantly, we can just skip the shared component and create a one-off in the file.

I could have designed all this ahead of time, as I would need to do with an LLM. But it was 10x easier to just start coding it while keeping my ultimate scalability goals in mind.
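
A rough sketch of the structure described above, written as plain TypeScript without React so it runs on its own. All names (`CannedReport`, `REPORTS`, `renderReport`, the two sample reports) are hypothetical, the database call is replaced by hard-coded rows, and the overlay is reduced to a function that returns text.

```typescript
// Hypothetical names throughout; an illustration of the shape, not the real widget.

// The generic set of inputs every report hands to the shared template.
interface ReportData { title: string; columns: string[]; rows: string[][] }
interface ReportFilters { year: number }

// One canned report. In a real codebase each would live in its own file.
interface CannedReport {
  id: string;
  label: string;
  load(filters: ReportFilters): Promise<ReportData>;
}

const salesByRegion: CannedReport = {
  id: "sales-by-region",
  label: "Sales by region",
  async load(filters) {
    // Stand-in for the database call plus report-specific business logic.
    return {
      title: `Sales by region, ${filters.year}`,
      columns: ["Region", "Total"],
      rows: [["North", "120"], ["South", "95"]],
    };
  },
};

const openInvoices: CannedReport = {
  id: "open-invoices",
  label: "Open invoices",
  async load(filters) {
    return {
      title: `Open invoices, ${filters.year}`,
      columns: ["Customer", "Amount"],
      rows: [["Acme", "400"]],
    };
  },
};

// The widget only needs this list. Adding report number 100 is one more entry.
const REPORTS: CannedReport[] = [salesByRegion, openInvoices];

// Shared display template: knows nothing about any individual report.
function renderReport(data: ReportData): string {
  const lines = [
    data.title,
    data.columns.join(" | "),
    ...data.rows.map((row) => row.join(" | ")),
  ];
  return lines.join("\n");
}

// What the overlay would do once the user picks a report.
async function previewReport(id: string, filters: ReportFilters): Promise<string> {
  const report = REPORTS.find((r) => r.id === id);
  if (!report) throw new Error(`unknown report: ${id}`);
  return renderReport(await report.load(filters));
}
```

A report that diverges significantly would skip `renderReport` and bring its own display code, which matches the one-off escape hatch mentioned above.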

apitman 3 hours ago||||

See "Programming as Theory Building": https://pages.cs.wisc.edu/~remzi/Naur.pdf

phillipclapham 3 hours ago||||

That's a good point and honestly I occasionally do the same thing. Sometimes you have to build something wrong to understand what right looks like. I think the distinction is between exploratory prototyping (building to learn/think) and expecting the prototype to BE the product. The first is thinking, the second is where the 100-hour gap bites you in the ass.

dijksterhuis 19 minutes ago||||

- version 1 -- we build what we think is needed

- version 2 -- we realise we're solving a completely different problem to what is needed

- version 3 -- we build what is actually needed

seanmcdirmid 3 hours ago|||

This. It’s also much easier to tell someone what you don’t like if what you don’t like is right in front of you than to tell them what you want without a point of reference.

cyk21 47 minutes ago|||

This.

Additionally, the author seems to build an app just for the sake of building an app / learning, not to solve any real serious business problem. Another "big" claim on LLM capabilities based on a solo toy project.

Gud 3 hours ago|||

Absolutely. You need to treat it like a real program from the very beginning.

jopsen 4 hours ago|||

Yeah, communicating what you want can be hard.

I'm building a simple single-line text editor and designing some frame options, which have start and end markers.

This was really hard to get the LLM to do right, until I just took a pen and paper, drew what I wanted, took a photo and gave it to the LLM.

tqwhite 3 hours ago||

YES YES YES!! I so wish that we could go back in time and never, ever have even suggested anything other than what you say here. AI doesn't do it for you. It does it with you.

You have to figure out what you want before the AI codes. The thinking BEFORE is the entire game.

Though I will also say that I use Claude for working out designs a lot. Literally hours sometimes with long periods of me thinking it through.

And I still get a ton more done and often use tech that I would never have approached before these glory days.

phillipclapham 3 hours ago||

The hours of design thinking with Claude is exactly it. That's the part nobody talks about because it isn't 'sexy' and doesn't make for a good demo or tweet. But it's the secret sauce IMO.

raincole 5 hours ago||

They're... launching an NFT product in 2026...

I know it's not the point of this article, but really?

itomato 1 hour ago||

And the viewpoint is from the development of such "product" with "manufactured virality".

It's bunk.

s1mon 5 hours ago|||

Yep. As much as the rest of it resonated with LLM coding experiences I'm having, the NFT thing is unfortunate.

suzzer99 44 minutes ago|||

I'd pay a few bucks for some cool avatar or w/e this is. It seems like a good use of NFTs.

serial_dev 5 hours ago|||

The way I see it, the NFT part is actually just for convenience to distribute AI generated images.

It could have been a web app, but with NFTs and Farcaster miniapps, you market to people who are willing and able to spend using their wallet instead of asking “normies” for credit card information for a 2 dollar custom image (that you could also prompt out of a free Gemini session).

With Farcaster, you also already have the profile picture of the user, one less hurdle again.

ryandrake 4 hours ago||

I think there's simply a huge overlap between the Crypto Bros, the NFT Bros, and now the AI Bros. The same sorts of people are pumping each one. I knew a guy who was into LeadGen and Drop Shipping in the 2000s, then got into online poker, then of course, got into Crypto, then inevitably NFTs. I haven't kept up with him, but I'm almost 100% sure he's pumping some AI related scheme now. These guys get into this pipeline and at each stage they are convinced that they're going to get rich off it.

colechristensen 3 hours ago||

Crypto has very narrow usage unless you're a criminal or a bro, NFT has essentially 0 non-bro activity, surely AI attracts bros, but also some of the smartest people I've known have been working on it a long time to build truly useful things.

AI can be really attractive to bros but also be incredibly useful.

In other words, AI isn't a trend that's going to pass, it's permanently going to reshape the tech scene and economy in a way that cryptocoins and NFTs absolutely did not.

ryandrake 3 hours ago||

> AI isn't a trend that's going to pass, it's permanently going to reshape the tech scene and economy in a way that cryptocoins and NFTs absolutely did not.

This exact wording was used for crypto. "It isn't a trend that's going to pass" and "It's going to reshape everything." Why are we sure of it now for AI (and that we're going to be right), when they were also sure of it before for crypto (and they ended up wrong)?

The AI people have the exact same feelings of absolute certainty as the crypto people had.

cheschire 2 hours ago||

People's grandmothers know what AI is and many have used it, even outside the west.

Probably zero grandmothers outside the west, and very few grandmothers within the west, know what NFT even stands for.

chamomeal 2 hours ago||

I have friends (well, friends of friends) who still play the NFT lottery. People love gambling lol

daveguy 1 hour ago||

I thought everyone realized by now that a digital image made available via blockchain or any other mechanism can be duplicated indefinitely. The only thing you get is a copyright on some generated image or set of bits. And what are the chances any random digital image is going to be appreciated as art? You can't hang it in a living room or sit it on a coffee table. It's beanie babies, but without even a hill of beans.

Are people just expecting there's going to be enough digital fools to make a market?

sowbug 4 minutes ago|||

Isn't the same true of any intellectual property?

A movie can be duplicated indefinitely. There's no guarantee your song will be appreciated as art. I'm not sure why you say you can't print out an image and hang it in your living room; we do that all the time at home.

I've personally never dabbled in NFTs, but I don't think it's fair to ascribe the inherent conflict between information and scarcity uniquely to them.

roncesvalles 1 hour ago|||

You don't have to believe in it. You just have to believe someone else will believe in it and be willing to pay a higher price.

marginalia_nu 4 hours ago||

The more I evaluate Claude Code, the more it feels like the world's most inconsistent golfer. It can often get within a few paces of the hole in a single stroke, and then it'll spend hours, days, weeks trying to nail the putt.

There's some 80-20:ness to all programming, but with current state of the art coding models, the distribution is the most extreme it's ever been.

ChrisMarshallNY 4 hours ago||

"working" != "shipping."

When we start selling the software, and asking people to pay for/depend upon our product, the rules change substantially.

Whenever we take a class or see a demo, they always use carefully curated examples, to make whatever they are teaching, seem absurdly simple. That's what you are seeing, when folks demonstrate how "easy" some new tech is.

A couple of days ago, I visited a friend's office. He runs an Internet Tech company, that builds sites, does SEO, does hosting, provides miscellaneous tech services, etc.

He was going absolutely nuts with OpenClaw. He was demonstrating basically rewiring his entire company, with it. He was really excited.

On my way out, I quietly dropped by the desk of his #2; a competent, sober young lady that I respect a lot, and whispered "Make sure you back things up."

youknownothing 4 hours ago||

I'm having somewhat good experiences with AI but I think that's because I'm only half-adopting it: instead of the full agentic / Ralphing / the-AI-can-do-anything way, I still do work in very small increments and review each commit. I'm not as fast as others, but I can catch issues earlier. I also can see when code is becoming a mess and stop to fix things. I mean, I don't fix them manually, I point Claude at the messy code and ask it to refactor it appropriately, but I do keep an eye to make sure Claude doesn't stray off course.

Honestly, seeing all the dumb code that it produces, calling this thing "intelligent" is rather generous...

tqwhite 3 hours ago|

I would love it if someone explained what their ten agents Ralphing away were actually told to do.

I suppose it could work if you are doing something that can truly be decided by a test, but I just don't see it, at least for anything I do.

apitman 3 hours ago||

I think ralphing is for purely vibe coded stuff, where you're literally never looking at the code and only asking for changes to the final output.

If I'm reviewing all the code, so far I'm still the bottleneck even with a single agent and I don't see an easy way to change that.

niemandhier 6 hours ago||

With sufficiently advanced vibe coding the need for a certain type of product just vanishes.

I needed it, I quickly build it myself for myself, and for myself only.

sieste 5 hours ago||

Related anecdote: My 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupted him with ads. Instead of googling a better alternative we sat down with Claude Code and put together a version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with fewer than 10 prompts; I only helped a bit with putting it online with GitHub Pages so he can use it from anywhere.

WarmWash 5 hours ago|||

I don't think people are grasping yet that this is the future of software, if by no metric other than "most software used is created by the user".

AstroBen 4 hours ago|||

The average user doesn't even know what a file is

sieste 3 hours ago||

Turns out that knowing what a plain text file is will be the criterion that distinguishes users who are digitally free from those locked into proprietary platforms.

nly 4 hours ago||||

Won't happen.

The average user just has no interest in building things.

sieste 2 hours ago|||

Many parents are extremely interested in quickly building digital tools for their kids (education and entertainment) that they know are free from advertising, social media integration, user monitoring etc.

GeoAtreides 2 hours ago||

I'm saying this with all my love and respect: you are living in a very small bubble

sieste 2 hours ago||

That may be true. But you also have to give the average parent more credit by assuming they don't want tech companies spying on their children and forcing their toxic platforms on them.

There are well attended parent evenings in our school on that topic.

Thinking about it, we should turn these into vibe coding hackathons where we replace all the ad-ridden little games, learning tools, messengers we don't like with healthy alternatives.

WarmWash 4 hours ago|||

Which is why they will use AI to do the building...

marcosdumay 4 hours ago||||

So... The future is like the past?

That would be good news, but I doubt most people will do things like that.

qsera 4 hours ago||||

>most software used is created by the user

You really believe that?

WarmWash 4 hours ago|||

Yes, because the current software paradigm (a shed/barn/warehouse full of tools to suit every possible user's every possible need) doesn't make sense when LLMs can turn plain English into a software tool in a matter of minutes.

qsera 2 hours ago||

>LLMs can turn plain English into a software tool in a matter of minutes.

Unless LLMs can read minds, no one will bother to specify what they want, even in plain English, with the required level of detail. And that is assuming the user has the details in mind, which is also pretty improbable...

zahlman 4 hours ago|||

That wasn't being claimed, just proposed as the direction we're headed.

qsera 4 hours ago||

Another user had already written what I had in mind when I responded to your comment..

https://news.ycombinator.com/item?id=47387570

SlinkyOnStairs 2 hours ago|||

> I don't think people are grasping yet that this is the future of software

What about this is new?

Sitting down with a child to teach them the very basics of javascript in an hour? Trivial.

Needing Claude to do it is kind of embarrassing, if anything.

kaffekaka 1 hour ago||||

Out of curiosity, did you also implement scramble support? Or just the timing stuff?

sieste 26 minutes ago||

Yes. Claude added a suggested random scramble (if that's what you mean?), plus running averages of 5/12/100 and local storage of past times, all on the first iteration. My son then told it to add a button for +2s penalties and touch screen support.
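
For readers who have not used a cubing timer, here is a minimal TypeScript sketch of the running average and the +2s penalty. It is not the code Claude generated; the names `Solve`, `effectiveMs`, and `rollingAverage` are hypothetical. It assumes the usual speedcubing convention for an average of 5 (drop the best and the worst time, average the rest) and applies the same single-best, single-worst trim to any window size. Some timers trim more for an average of 100, and DNFs are not modeled.

```typescript
// Hypothetical types and names, for illustration only.
interface Solve {
  ms: number;       // raw solve time in milliseconds
  plusTwo: boolean; // true if a +2 second penalty was applied
}

// Solve time including the penalty, if any.
function effectiveMs(solve: Solve): number {
  return solve.ms + (solve.plusTwo ? 2000 : 0);
}

// Trimmed average over the most recent `n` solves: drop the single best and
// single worst time, then take the mean of what is left.
// Returns null until at least `n` solves exist.
function rollingAverage(solves: Solve[], n: number): number | null {
  if (n < 3 || solves.length < n) return null;
  const times = solves
    .slice(-n)
    .map(effectiveMs)
    .sort((a, b) => a - b);
  const kept = times.slice(1, -1);
  return kept.reduce((sum, t) => sum + t, 0) / kept.length;
}
```

Calling `rollingAverage(history, 5)` and `rollingAverage(history, 12)` after each solve gives the two most common running figures.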

zahlman 4 hours ago|||

... So at no point in this did anyone even question why it should be a website?

AstroBen 4 hours ago|||

Because now that website is fully cross-platform and sandboxed with no practical downside

sieste 3 hours ago|||

"use it from anywhere" was important, and I don't think there's an easier way than a freely hosted static website.

lacedeconstruct 6 hours ago|||

I don't want that though, I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and not worry about.

jsdalton 5 hours ago|||

The metaphor that’s popped into my head recently is baking bread.

You can learn to bake good bread. It’s not _that_ hard. And it’ll probably taste better than store bought bread.

But it almost certainly won’t be cheaper. And it’ll take a lot more time and effort.

Still, sometimes you might bake your own bread for kicks. But most of the time, you’ll just buy the bread someone else has already perfected.

nly 4 hours ago||

Baking bread also takes hours of waiting.

I can have fresh bread anytime I want from a handful of nearby stores.

kami23 5 hours ago||||

And some people do, both things can be true. I'd rather make a tool just for me that breaks when I introduce a new requirement and I just add into it and keep going.

kjksf 4 hours ago||

The statement wasn't: "no one ever vibe codes an alternative to product X"

It was: "With sufficiently advanced vibe coding the need for a certain type of product just vanishes."

If a product has 100 thousand users and 1% of them vibe code an alternative for themselves, the product / business doesn't vanish. They still have 99 thousand users.

That was the rebuttal, even if not presented as persuasively and intelligently as I just did.

So no, it's not the case of "both things being true". It's a case of: he was wrong.

GorbachevyChase 1 hour ago||

At some point there will be market consequences for that kind of behavior. So where market dynamics are not dominated by bullshit (politics, friendships forged on Little St James, state intervention, cartel behavior, etc.) if my company provides the same service as another, but I replaced all of the low quality software as a service products my competitor uses with low quality vibe coded products, my overhead cost will be lower and that will give me an advantage.

hmmmmmmmmmmmmmm 6 hours ago|||

If we could return to one-off payments without dark patterns I would agree. Hopefully at least the software that relies on grift will start to vanish.

keyle 6 hours ago|||

I built a jira with attachments and all sorts of bells and whistles. Purrs like a kitten. Saas are going extinct. At least the jobs that charged $1000 a day to write jira plugins.

ivan_gammel 6 hours ago|||

Some minor UX enhancement SaaS of the most recent VC-funded wave will do. Maybe those who forgot how to invest in R&D and spent the last 20 years just fixing bugs. There’s plenty of SaaS on the market that offers added value beyond the code. Data brokers. Domain experts, etc. Even if a homemade solution is sometimes possible, initial development costs are going to be just one of several important factors in choosing whether to build or to buy.

101008 5 hours ago||||

SaaS are not going extinct. This reminds me of the LinkedIn posts saying they cloned Slack in two hours, copying the UI, etc. Yeah, if you think Slack is private chat rooms then you should use IRC for your company.

One of the most valuable things about Slack is the ecosystem: apps, API support, etc. If you need to receive notifications from external apps (like PagerDuty or Incident.io or something like that), good luck expecting them to have a setup for your own version of the app. Yeah, some of them provide webhooks (not all of them), but in the end you have to maintain that too...

pydry 5 hours ago|||

jira is a perfect example of an abysmal product that was marketed well.

xp84 5 hours ago||

Yes, it seems like it got to some tipping point around 2013 where so many product and management people were familiar with it, and from there it became this “industry standard” that management always wanted everyone to use.

Also though, I feel like being attached to Confluence helped it because there is a lot less competition in the world of documentation wikis than there is in task management.

matwood 2 hours ago|||

Products where the only value was the code are definitely under pressure. But, how many products are really like that? I suggest everyone look up HALO that’s so popular in investing right now, and start looking at companies with the assumption that the value of the code is zero so what other value is there. There’s often a lot more there than people realize.

jcgrillo 5 hours ago|||

How many products are actually like that? If I could easily replace github, datadog/sentry/whatever, cloudflare, aws, tailscale that would be great. In my view building and owning is better than buying or renting. Especially when it comes to data--it would be much better for me to own my telemetry data for example than to ship it off to another company. But I don't think you (or anyone) will be vibecoding replacements for these services anytime soon. They solve big, hard, difficult problems.

CuriouslyC 5 hours ago|||

Github is on the chopping block as a tool (it's sticky as a social network). The other stuff not so much.

The things that are going away are tools that provide convenience on top of a workflow that's commoditized. Anything where the commercial offering provides convenience rather than capabilities over the open source offerings is gonna get toasted.

jcgrillo 5 hours ago||

Even at recent levels of uptime I think it would be very difficult to build a competing product that could function at the scale of even a small company (10 engineers). How would you implement Actions? Code review comments/history? Pull requests? Issues? Permalinks? All of these things have serious operational requirements. If you just want some place to store a git repository any filesystem you like will do it but when you start talking about replacing github that's a different story altogether and TBH I don't think building something that appears to function the same is even the hard part, it's the scaling challenges you run into very quickly.

WarmWash 4 hours ago||

The future is narrow bespoke apps custom tailored for exactly that one single user's use case.

An example would be if the user only ever works with .jpg files, then you don't need to support any of the dozens of other formats an image program would support.

I cannot stress enough how many software users out there are only using 1-10% of a program's capability, yet they have to pay for a team of devs who maintain 100% of it.

jcgrillo 4 hours ago||

"The future" is fiction. It's a blank canvas where you can make a fingerpainting of any fantasy you like. Whenever people tell me about "the future" I know they're talking absolute rubbish. And I also like your fantasy! But it probably won't happen.

ryandrake 4 hours ago||

I call it "Psychics for Programmers." People will scoff at psychics and fortune telling and palm reading, but then the same people will listen to Elon or some founder or VC and be utterly convinced that that person is a visionary and can describe the future.

WarmWash 4 hours ago||

It's just reading the room. People hate having to use their computers through the lens of quasi-robot humans (saying that as one of those robots). They hate having to pay monthly just so dumb features and UI overhauls can be pushed on them.

They just want the software to do the few things they need it to do. AI labs are falling over themselves to remove the gate keeping regular people from using their computing device the way they want to use it. And the progress there in the last few years is nothing short of absolutely astounding.

jcgrillo 4 hours ago||

> the progress there in the last few years is nothing short of absolutely astounding

Yet, all the astounding progress notwithstanding, I don't have a suite of bespoke tools replacing the ones I depend on. I cannot say "hey claude, make me a suite of bespoke software infrastructure monitoring and operational tooling tailored to my specific needs" and expect anything more than a giant headache and wasted time. So maybe we just need to wait? Or maybe it's just not actually real. My view is unless you show me a working demo it's vaporware. Show me that the problem is solved, don't tell me that it might be solved later sometime.

user34283 2 hours ago||

And what exactly is preventing you from building bespoke software for "infrastructure monitoring and operational tooling tailored to your specific needs"?

I could certainly imagine building myself some sort of dashboard. It would seem like a prime use case.

You want to hear about a problem solved? Recently I extended a tool that snaps high resolution images to a Pixel art grid, adding a GUI. I added features to remove the background, to slice individual assets out of it automatically, and to tile them in 9-slice mode.

Could I have realistically implemented the same bespoke tool before AI? No.

jcgrillo 1 hour ago||

> And what exactly is preventing you from building bespoke software for "infrastructure monitoring and operational tooling tailored to your specific needs"?

Let's say I emit roughly 1TB of telemetry data per day: logs, metrics, etc. That's roughly what you might expect from a medium-sized tech company or a specific department (say, security) at a large company. There is going to be a significant infrastructure investment to replicate Datadog's function in my organization, even if I only use a small subset of their product. It's not just "building a dashboard"; it's building all the infrastructure to collect, normalize, store, and retrieve the data to even be able to draw that dashboard.

The dashboard is the trivial part. The hard part is building, operating, and maintaining all the infrastructure. Claude doesn't do a very good job helping with this, and in some sense it actually hinders.

EDIT: I'm not saying you shouldn't take ownership of your telemetry data. I think that's a strategically (and potentially from a user's perspective) better end result. But it is a mistake to trivialize the effort of that undertaking. Claude is not going to vibeslop it for you.
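
As a back-of-envelope check on the 1TB-per-day figure, the TypeScript sketch below converts it into a sustained ingest rate and a storage footprint. The 30-day retention and 3 replicas in the usage note are assumptions picked for illustration, not numbers from the comment, and compression and indexing overhead are ignored. The function names are hypothetical.

```typescript
// Back-of-envelope sizing. Retention and replication are assumed inputs.
const BYTES_PER_TB = 1e12;     // decimal terabyte
const SECONDS_PER_DAY = 86400;

// Average ingest rate in MB/s needed to absorb `tbPerDay` of telemetry.
function sustainedMBps(tbPerDay: number): number {
  return (tbPerDay * BYTES_PER_TB) / SECONDS_PER_DAY / 1e6;
}

// Raw storage in TB for a given retention window and replica count.
function storedTB(tbPerDay: number, retentionDays: number, replicas: number): number {
  return tbPerDay * retentionDays * replicas;
}
```

`sustainedMBps(1)` comes to about 11.6 MB/s as an all-day average, before any peaks, and `storedTB(1, 30, 3)` is 90 TB of raw disk. That pipeline has to exist before a single dashboard panel can be drawn.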

IAmGraydon 5 hours ago|||

This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.

niemandhier 5 hours ago|||

Perplexity just launched a tool that builds and hosts small bespoke tools.

I tried it; it works well. I can do the same thing on my Linux machine, but even my 12-year-old can now get Perplexity to build him a tool to compare RAM prices at different Chinese vendors.

qsera 4 hours ago||

Yes, LLMs can be a better search tool.

user34283 5 hours ago|||

It makes sense if you want bespoke software to do a specific job in a way best suited to your workflow.

Could you do the same in e.g. Photoshop? Maybe, but even if so, you would need to learn how.

program_whiz 4 hours ago||

Photoshop is a good example. Not that I agree with everything in the app, but just designing all the interactions properly in Photoshop would take hundreds of hours (not to mention testing and figuring out the edges). If your goal is a 1-to-1 clone, why not use Krita or Photoshop? With an LLM you'll get "mostly there" with many, many hours of work, and lots of sharp edges. If all you need is a paint bucket, a basic brush / pencil, and save/load, OK, maybe you can one-shot it in a few hours... or just use Paint / Aseprite...

zer00eyz 3 hours ago||

https://xkcd.com/1205/ (is it worth the time matrix)

LLMs change the calculus of the above chart dramatically.

hebrides 5 hours ago|

I’ve had a similar experience. I’ve been vibecoding a personal kanban app for myself. Claude practically one-shotted 90% of the core functionality (create boards, lanes, cards, etc.) in a single session. But after that I’ve now spent close to 30 hours planning and iterating on the remaining features and UI/UX tweaks to make the app actually work for me, and still, it doesn’t feel "ready" yet. That’s not to say it hasn’t sped up the process considerably; it would’ve taken me hours to achieve what Claude did in the first 10 minutes.

lelanthran 4 hours ago||

I've got a few projects I've generated, along with a wholly handwritten project started in Dec.

The difference I've noticed is that the act of actually typing out code made me backtrack a few times refining the possible solutions before even starting the integration tests, sometimes before even doing a compile.

When generating, the LLM never backtracked, even in the face of broken tests. It would proceed to continue band-aiding until everything passed. It would add special exceptions to general code instead of determining that the general rule should be refined or changed.

The reason that some devs are reporting 10x productivity is that a bunch of duct-taped, band-aided, instant-legacy code is acceptable to them. Others who don't see that level of productivity increase are spending time fixing the code to be something they can read.

Not sure yet if accepting the spaghetti is the right course. If future LLMs can understand this spaghetti then there's no point in good code. If we still need human coders, then the productivity increase is very small.

qsera 4 hours ago||

> It would add special exceptions to general code instead of determining that the general rule should be refined or changed.

That is pretty bad..
