
Posted by dbalatero 9/3/2025

Where's the shovelware? Why AI coding claims don't add up (mikelovesrobots.substack.com)
765 points | 482 comments
vFunct 9/3/2025|
From the post, if AI was supposed to make everyone 25% more productive, then a 4 month project becomes a 3 month project. It doesn't become a 1 day project.

Was the author making games and other apps in 30 hours? Because those seem like 4-month projects.

sarchertech 9/4/2025|
The author mentioned polls showing that a substantial number of developers believe that AI makes them 10x more productive.
groby_b 9/3/2025||
I think the author misses a few points

* METR was at best a flawed study. Repo-familiarity and tool-unfamiliarity being the biggest points of critique, but far from the only one

* they assume that all code gets shipped as a product. Meanwhile, AI code has (at least in my field of view) led to a proliferation of useful-but-never-shipped one-off tools. Random dashboards to visualize complex queries, scripts to drive refactors, or just sheer joy like "I want to generate an SVG of my vacation trip and consume 15 data sources and give it a certain look".

* Their own self-experiment is not exactly statistically sound :)

That does leave the fact that we aren't seeing AI shovelware. I'm still convinced that's because commercially viable software is beyond the AI complexity horizon, not because AI isn't an extremely useful tool

ModernMech 9/4/2025|
> * METR was at best a flawed study.

They didn't claim it was flawless; they brought it up because it caused them to question the internal narrative of their own productivity.

> * Their own self-experiment is not exactly statistically sound :)

They didn't claim it was.

> * they assume that all code gets shipped as a product.

The author did not assume this. They assumed that if AI is making developers more productive, that should apply to shovelware developers too. That we don't see an increase in shovelware post-AI makes it very unlikely that AI brings a productivity increase for more complex software.

cookiengineer 9/4/2025||
It's much much worse in the Cybersecurity field. I wanted to share the anecdote here, too, because it's kind of fitting.

Somehow, in cyber, everyone believes that transformers will generate better advice than "don't use the 10 most common passwords". It's as if the whole body of knowledge about decision-making theory, neural nets, GANs, LSTMs, etc. got wiped out and forgotten in less than 10 years.

I understand how awesome LLMs are for debugging and forensics (they are a really good rubber duck!), but apart from that they're pretty much useless, because after two prompts they start forgetting if/elseif/else conditions, and checking those boundaries becomes the job of the unlucky person who has to merge that slopcode later.

I don't understand how we got from TDD and test-case-based engineering to this bullshit. It's as if everyone in power was the wrong person for that position in the first place, and as if, statistically, no lead engineer will ever make it to C-staff, SVP, or whatever corporate management level.

While the AI bubble is bursting, I will continue to develop with TDD practices to test my code. Which, in return, has the benefit that I can use LLMs to create nice templates as a reasonable starting point.
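To make the boundary-checking point concrete: in a TDD workflow, the if/elseif/else boundaries are pinned down by tests written first, so a generated rewrite that drops or shifts a branch fails immediately. A minimal sketch (the `classify_length` function and its thresholds are hypothetical, not from the comment):

```python
def classify_length(pw: str) -> str:
    # The if/elif/else boundaries that the tests below pin down.
    if len(pw) < 8:
        return "weak"
    elif len(pw) < 16:
        return "ok"
    else:
        return "strong"

def test_boundaries():
    # One test on each side of every boundary, so any rewrite that
    # forgets or off-by-ones a branch is caught at once.
    assert classify_length("a" * 7) == "weak"     # just below first boundary
    assert classify_length("a" * 8) == "ok"       # exactly on the boundary
    assert classify_length("a" * 15) == "ok"      # just below second boundary
    assert classify_length("a" * 16) == "strong"  # exactly on the boundary
```

The tests double as the "reasonable starting point" spec an LLM can be asked to fill in.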

yerushalayim 6 days ago||
Working with AI may not save you time, but when used properly, it will almost assuredly get you better results.
Vanclief 9/3/2025||
While I like the self-reflection in this article, I don't think his methodology adds up (pun intended). First, there are two main axes along which LLMs can make you more productive: speed and code quality. I think everyone is obsessed with the first one, but it's less relevant.

My personal hypothesis is that when using LLMs, you are only faster when writing things like boilerplate code. For the rest, LLMs don't really make you faster, but they can raise your code quality, which means better implementations and catching bugs earlier. I am a big fan of giving the diff of a commit to an LLM that has a file MCP, so it can search for files in the repo, and having it point out any mistakes I have made.
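A low-tech version of that review workflow, without an MCP server, is just piping the last commit's diff into a review prompt. A sketch under those assumptions (the prompt wording is mine; the actual LLM call is left to whatever client you use):

```python
import subprocess

REVIEW_PROMPT = (
    "Review the following git diff. Point out any mistakes: logic errors, "
    "missed edge cases, and inconsistencies with the rest of the repo.\n\n"
)

def build_review_prompt(diff_text: str) -> str:
    # Wrap the raw diff in review instructions before sending it
    # to an LLM endpoint of your choice.
    return REVIEW_PROMPT + diff_text

def latest_commit_diff() -> str:
    # Diff of the most recent commit against its parent.
    return subprocess.run(
        ["git", "diff", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
```

The MCP variant adds repo search on top of this, but the core input is the same diff text.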

ksenzee 9/3/2025||
This doesn’t match my experience. I needed a particularly boilerplate module the other day, for a test implementation of an API, so I asked Gemini to magic one up. It was fairly solid code; I’d have been impressed if it had come from a junior engineer. Unfortunately it had a hard-to-spot defect (an indentation error in an annotation, which the IDE didn’t catch on paste), and by the time I had finished tracking down the issue, I could have written the module myself. That doesn’t seem to me like a code quality improvement.
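A dedent-on-paste defect of this kind is easy to reproduce in Python (an illustrative example, not the commenter's actual bug): the file still parses, the IDE shows no error, and the problem only surfaces at call time.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return f"hello, {self.name}"

# Pasted back one indent level too shallow: this is now a module-level
# function, not a Greeter method, yet the file still parses cleanly.
def goodbye(self):
    return f"goodbye, {self.name}"

g = Greeter("ada")
g.hello()       # works
# g.goodbye()   # AttributeError at runtime, not at paste time
```

Tracking down where the method "went" costs exactly the kind of time the comment describes.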
malfist 9/3/2025|||
I don't know what world you're living in, but quality code isn't a forte of AI.
Chinjut 9/4/2025||
For what it's worth, this response is explicitly anticipated in the article.
notlisted 9/4/2025||
Big Meh. Bad metric. Phone apps were dead long before AI came about. Shovelware doubly so.

Most users have 40-80 apps installed and use 9 a day and 20 a month (1). The shitty iOS subscription trend killed off the hobby of 'app collecting'.

Have I created large commercial AI-coded projects? No. Did I create 80+ useful tools in hours/days that I wouldn't have otherwise? Hellz yeah!

Would I publish any of these on public GitHub? Nope! I have neither the time nor the inclination to maintain them. There are just too many.

My shovelware "apps" reside on my machine/our intranet or V0/lovable/bolt. Roughly 25% are in active daily use on my machine or in our company. All these tools and "apps" are saving us many hours each week.

I'm also rediscovering the joy of coding something useful, without writing a PRD for some intern. Speaking of which. We no longer have an intern.

(1) https://buildfire.com/app-statistics/

pdntspa 9/4/2025||
I too have been wondering whether the time I spend wrangling AI into doing what I want is greater than the time I'd spend if I just did it myself.
curtisblaine 9/4/2025||
Maybe the hard part is not coding, but making useful software (as in: software that other people would use), and LLMs can't help with that.
Aeolun 9/3/2025||
Hmm, I definitely have more issues with AI-generated code than I would have if I did it all manually, but not having to type it all out may make up for the lost time by itself.
carpo 9/4/2025|
Maybe developers are using it in a less visible way? In the past 6 months I've used AI for a lot of different things. Some highlights:

- Built a windows desktop app that scans local folders for videos and automatically transcribes the audio, summarises the content into a structured JSON format based on screenshots and subtitles, and automatically categorises each video. I used it on my PC to scan a couple of TB of videos. Has a relatively nice interface for browsing videos and searching and stores everything locally in SQLite. Did this in C# & Avalonia - which I've never used before. AI wrote about 75% of the code (about 28k LOC now).

- Built a custom throw-away migration tool to export a customer's data from one CRM to import into another. Windows app with a basic interface.

- Developed an AI process for updating a webform system that uses XML to update the form structure. This one felt like magic and I initially didn't think it would work, but it only took a minute to try.

  Some background: years ago I built a custom webform/checklist app for a customer. They update the forms very rarely, so we never built an interface for making updates, but we did write 2 stored procs to update forms: one outputs the current form as XML, and another takes the same XML and runs updates across multiple tables to create a new version of the form. For changes, the customer sends me a spreadsheet with all the current form questions in one column and their changes in another. It's normally just wording changes, so I go through and manually update the XML and import it, but this time they had a lot of changes: removing questions, adding new ones, combining others. They had a column with the label changes and another with a description of what they wanted (e.g. "New Question", "Update label", "Combine this with q1, q2 and q3", "Remove this question"). The form has about 100 questions, and the XML file is about 2,500 lines long and defines each form field, section layout, conditional logic, grid display, task creation based on incorrect answers, etc., so it's time consuming to make a lot of little changes like this.

  With no expectation of it working, I took a screenshot of the spreadsheet and the exported XML file and prompted the LLM to modify the XML based on the instructions in the spreadsheet and some basic guidelines. It did it close to perfect, even fixing the spelling mistakes the customer had missed while writing their new questions.

- Along with using it on a daily basis across multiple projects.

I've seen the stat that says developers "...thought AI was making them 20% faster, but it was actually making them 19% slower". Maybe I'm hoodwinking myself somehow, but it's been transformative for me in multiple ways.

bad_username 9/4/2025|
What did you use for transcription? Local whisper via ffmpeg?
carpo 9/4/2025||
Yeah, the app lets you configure which whisper model to use and then downloads it on first load. Whisper blows me away too. I've only got a 2080 and use the medium model, and it's surprisingly good and relatively fast.
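For reference, whisper's models expect 16 kHz mono audio, so the ffmpeg extraction step that precedes transcription typically looks like this (file paths are placeholders; the transcription call itself is left out):

```python
import subprocess

def extract_audio_cmd(video_path: str, wav_path: str) -> list:
    # Build the ffmpeg command that pulls a 16 kHz mono PCM WAV out of
    # a video file -- the input format whisper models expect.
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",               # drop the video stream
        "-ar", "16000",      # 16 kHz sample rate
        "-ac", "1",          # mono
        "-c:a", "pcm_s16le", # 16-bit PCM
        wav_path,
    ]

# subprocess.run(extract_audio_cmd("clip.mp4", "clip.wav"), check=True)
```

The resulting WAV can then be fed to whichever whisper model size fits your GPU.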