Posted by dbalatero 9/3/2025
On the other hand, I do understand that the things LLMs are really great at are not actually all that spectacular to monetize ... and so as a result we have all these snake oil salesmen on every corner boasting about nonsensical vibecoding achievements, because that's where the real money would be ... if it were really true ... but it is not.
The reason it doesn't show up online is that I mostly write software for myself and for work, with the primary goal of making things better, not faster. More tooling, better infra, better logging, more prototyping, more experimentation, more exploration.
Here's my open-source work: https://github.com/orgs/go-go-golems/repositories . These are not just one-offs (although there are plenty of those in the vibes/ and go-go-labs/ repositories), but long-lived codebases / frameworks that build upon each other and have gone through many, many iterations.
I generate between 10k and 100k lines of code per day these days. But is that a measure of productivity? Not really...
That’s absolute nonsense.
"I'm a person who hates art now...I never want to see art again. All I want to see is like, AI stuff. That's how bad it's gotten. Handmade? nuh-uh. Handmade code? ... anything by humans, just over. I'm just gonna watch pixels."
https://www.youtube.com/live/APkR4qRg1vM?si=XLGmH9uEjG08q-6x...
I watched a little more but was, uh, not impressed.
But if LLMs show us one thing, it's how bad our code review tools are. I have a set of tree-sitter helpers that let me examine different parts of a PR more easily:

- one that diffs semantic parts of the code, instead of "files" and "lines" (a sketch follows below)
- one that gives me stats on which subsystems are touched, and the cross-correlation between subsystems
- one for attaching metadata and tracking which documents relate to a commit
- one for managing our design documents, llm-coding intermediary documents, long-lasting documents, etc.

The proper versions of these are for work, but here's the initial yolo from Manus: https://github.com/go-go-golems/vibes/tree/main/2025-08-22/p... https://github.com/go-go-golems/vibes/tree/main/2025-08-22/c... https://github.com/go-go-golems/vibes/tree/main/2025-08-15/d... https://github.com/go-go-golems/vibes/tree/main/2025-07-29/p...
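For flavor, here is a minimal sketch of the semantic-diff idea, assuming the github.com/smacker/go-tree-sitter bindings (all names here are invented for illustration; this is not the work version): parse two revisions of a file and compare top-level functions by name instead of by line hunks.

    // Semantic diff sketch: compare top-level Go functions by name,
    // not by file/line hunks.
    package main

    import (
        "context"
        "fmt"

        sitter "github.com/smacker/go-tree-sitter"
        "github.com/smacker/go-tree-sitter/golang"
    )

    // topLevelFuncs maps function name -> source text of its declaration.
    func topLevelFuncs(src []byte) (map[string]string, error) {
        parser := sitter.NewParser()
        parser.SetLanguage(golang.GetLanguage())
        tree, err := parser.ParseCtx(context.Background(), nil, src)
        if err != nil {
            return nil, err
        }
        funcs := map[string]string{}
        root := tree.RootNode()
        for i := 0; i < int(root.NamedChildCount()); i++ {
            n := root.NamedChild(i)
            if n.Type() == "function_declaration" {
                funcs[n.ChildByFieldName("name").Content(src)] = n.Content(src)
            }
        }
        return funcs, nil
    }

    func main() {
        oldSrc := []byte("package p\n\nfunc A() {}\n\nfunc B() { println(1) }\n")
        newSrc := []byte("package p\n\nfunc A() {}\n\nfunc B() { println(2) }\n\nfunc C() {}\n")

        before, _ := topLevelFuncs(oldSrc)
        after, _ := topLevelFuncs(newSrc)
        for name, body := range after {
            if prev, ok := before[name]; !ok {
                fmt.Println("added:  ", name)
            } else if prev != body {
                fmt.Println("changed:", name)
            }
        }
        for name := range before {
            if _, ok := after[name]; !ok {
                fmt.Println("removed:", name)
            }
        }
    }

Rolling the added/changed/removed sets up by package is then a small step toward the subsystem stats mentioned above.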
I very often put some random idea into the llm slot machine that is Manus, use the result as a starting point to remold into a proper tool, and extract the relevant pieces as reusable packages. I've got a pretty wide tree-sitter/lsp/git-based set of packages to manage llm output and assist with better code reviews.
Also, every llm PR comes with _extensive_ documentation / design documents / changelogs, by the nature of how these things work, which helps both humans and llm-assisted code review tools.
“Write a streaming Go yaml parser based on the tokenizer (probably use goccy yaml if there is no tokenizer in the standard yaml parser), and provide an event callback to the parser which can then be used to stream and print to the output.
Make a series of test files and verify they are streamed properly.”
This is the slot machine. It might work, it might be 50% jank, it might be entirely jank. It'll be a few thousand lines of code that I will skim and run. In the best case, it's a great foundation to build on more properly. In the worst case, it was an interesting experiment and I will learn something about either prompting Manus, or streaming parsing, or both.
I certainly won't dedicate my full code review attention to what was generated. Think of it more as a hyper-specific Google search returning Stack Overflow posts that go into excruciating detail.
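For reference, the tokenizer half of that prompt has a rough shape like this (a hedged sketch against goccy's lexer API; this is not what Manus produced, and the helper names are made up):

    // Stream YAML tokens to an event callback using goccy's lexer.
    package main

    import (
        "fmt"

        "github.com/goccy/go-yaml/lexer"
        "github.com/goccy/go-yaml/token"
    )

    // stream tokenizes src and emits each token to the callback as it goes.
    func stream(src string, emit func(*token.Token)) {
        for _, tok := range lexer.Tokenize(src) {
            emit(tok)
        }
    }

    func main() {
        src := "name: demo\nitems:\n  - a\n  - b\n"
        stream(src, func(tok *token.Token) {
            fmt.Printf("%v %q\n", tok.Type, tok.Value)
        })
    }

A real streaming parser would fold these tokens into higher-level events (document start, mapping key, sequence entry, ...) before printing.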
https://chatgpt.com/share/68b98724-a8cc-8012-9bee-b9c4a77fe9...
Also, a good chunk of my personal OSS projects are AI-assisted. You probably can't tell from looking at them, because I have strict style guides that suppress the "AI style", and I don't really talk about how I use AI in the READMEs. Do you also expect me to mention that I used IntelliSense and syntax highlighting?
For example, I've built 5-6 iPhone apps, but they're kind of one-offs and I don't know why I would put them up on the App Store, since they only scratch my own itches.
But if we expect the ratio of this sort of private code to publicly-released code to remain relatively stable, which I think is a reasonable expectation, then we'd expect there to be a proportional increase in both private and public code as a result of any situation that increased coding productivity generally.
So the absence of a notable increase in the volume of public code either validates the premise that LLMs are not actually creating a general productivity boost for software development, or points to the productivity gains being concentrated entirely in projects that never get released, which would raise the question of why that might be.
Do internal, narrow-purpose dev tools count as shipped code?
I linked some examples higher up, but I've been maintaining a lot of packages that I started slightly before chatgpt and then refactored and worked on as I progressively moved to the "entirely AI generated" workflow I have today.
I don't think it's an easy skill (I'm not saying that to make myself look good; I spent an ungodly amount of time exploring programming with LLMs and still do). It's akin to thinking at a strategic level vs. at a "code" level.
Certain design patterns also make it much easier to deal with LLM code: state reducers (redux/zustand, for example), event-driven architectures, component-based design systems, and building many CLI tools that the agent can invoke to iterate and correct things. So do certain "tools" like sqlite/tmux: just by telling the LLM "btw, you can use tmux/sqlite", you let it pass hurdles that would otherwise make it spiral into slop-ratatouille.
I also think that a language like Go was a really good coincidence, because it is so amenable to LLM-ification.
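To make the reducer point concrete, here is a minimal Go sketch (all names invented for illustration): every state change funnels through one pure function, so LLM-written call sites stay easy to review.

    // State reducer sketch: one pure function handles all mutations.
    package main

    import "fmt"

    type State struct {
        Count int
        Log   []string
    }

    type Action interface{ isAction() }

    type Increment struct{ By int }
    type Note struct{ Text string }

    func (Increment) isAction() {}
    func (Note) isAction()      {}

    // reduce returns a new State and never mutates the old one.
    func reduce(s State, a Action) State {
        switch a := a.(type) {
        case Increment:
            s.Count += a.By
        case Note:
            s.Log = append(append([]string{}, s.Log...), a.Text)
        }
        return s
    }

    func main() {
        s := State{}
        for _, a := range []Action{Increment{By: 2}, Note{Text: "bumped"}, Increment{By: 3}} {
            s = reduce(s, a)
        }
        fmt.Printf("%+v\n", s)
    }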
If someone builds a faster car tomorrow, I am not going to go to the office more often.
No, but I expect my software to have been verified for correctness and soundness by a human being with a working mental model of how the code works. But I guess that's not a priority anymore if you're willing to sacrifice $2400 a year to Anthropic.
> No, but I expect my software to have been verified for correctness and soundness by a human being with a working mental model of how the code works.
This is not mutually exclusive with AI tools:
- Use AI to write dev tools to help you write and verify your handwritten code. Throw the one-off dev tools in the bin when you're done.
- Handwrite your code, generate test data, review the test data like you would a junior engineer's work.
- Handwrite tests, have AI generate an implementation, and let the agent run the tests in a loop to refine itself (a toy sketch follows below). Works great for code that follows a strict spec. Again, review the code like you would a junior engineer's work.
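A toy version of that last flow, using a hypothetical durfmt package: the human writes the test file, the agent loops `go test` until its implementation passes.

    // durfmt_test.go -- the handwritten spec the agent must satisfy.
    package durfmt

    import "testing"

    func TestHumanize(t *testing.T) {
        cases := []struct {
            secs int
            want string
        }{
            {30, "30s"},
            {90, "1m30s"},
            {3600, "1h0m0s"},
        }
        for _, c := range cases {
            if got := Humanize(c.secs); got != c.want {
                t.Errorf("Humanize(%d) = %q, want %q", c.secs, got, c.want)
            }
        }
    }

    // durfmt.go -- the kind of implementation the agent converges on.
    package durfmt

    import "time"

    // Humanize renders a second count using Go's built-in duration formatting.
    func Humanize(secs int) string {
        return (time.Duration(secs) * time.Second).String()
    }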
If I'm working against a deadline, I feel more comfortable spending time on research and design knowing I can spend less time on implementation. In the end, it took the same amount of time, though hopefully with a gain in reliability, observability, and extensibility. None of these things show up in the author's faulty dataset and experiment.
There may be many reasons for your experience, and I am glad you are having it! That's great!
But the fact remains: overall, we aren't seeing an exponential or even a step-function increase in how much software is being delivered!
At this point, either you are gaining with each model release or you are not.
Let's see in 2035 who was right and who was wrong. My bet is that the people who are not gaining right now are not going to like the situation in 2035.
https://github.com/go-go-golems/ai-in-action-app/blob/main/c...
Where I expect a lot of those metrics of feeling fast to come from is people who may have less coding experience and who, with AI, are coding way above their level.
My brother-in-law asks for a nice product website; I just feed his business plan into an LLM, do some fine-tuning on the results, and have a good-looking website in about an hour. If I had to do it manually myself, just take me behind a barn, as those jobs are so boring and take ages. But I know that website design is a weakness of mine.
That is the power of LLMs: turning out quick code, maybe offering a suggestion you did not think about. But ... it also eats time! Making your prompts so that the LLM understands, waiting for the result ... waiting ... OK, now check the result. Can you use it? Oh no, it did X, Y, Z wrong. Prompt again ... and again. And this is where your productivity goes to die.
So when you compare a pool of developer feedback, you're going to get a broad mix of "it helps a lot", "some", "it's worse than my code", ... mixed in with the prompting, result delays, etc.
It gets even worse with agent / vibe coding, as you just tend to be waiting 5, 10 minutes for changes to be done. You need to review them, test them, ... oh no, the LLM screwed something up again. Oh no, it removed 50% of my code. Hey, where did my comments go? And we are back to a loss of time.
LLMs are a tool... But after a lot of work with them, my opinion is to use them when needed, not to depend on them for everything. I sometimes stare in disbelief when people say they are coding so much with LLMs and spending 200 or more bucks per month.
They can be powerful tools, but I feel that some folks become over-dependent on them. And worse is my feeling that our juniors are going to be in a world of hurt if their skills are more LLM monkey coding (or vibe coding) than actually understanding how to code (and the knowledge behind the actual programming languages and systems).
Let's take the following scenario for the sake of argument: a codebase with a well-defined AGENTS.md referencing good architecture, roadmap, and product documentation, and with good test coverage, much of which was written by an LLM and lightly reviewed and edited by a human. Let's say, for the sake of argument, that the human is not enjoying 10x productivity despite all this scaffolding.
Is it still worthwhile to use LLM tooling? You know what, I think a lot of companies would say yes. There are way too many companies whose codebases lack testing and documentation, are too difficult to onboard new engineers into, and carry too much risk if the original engineers are lost. The simple fact that LLMs, to be effective, force the adoption of proper testing and documentation is a huge win for corporate software.
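For concreteness, the AGENTS.md in that scenario could be as short as this (contents entirely made up for illustration):

    # AGENTS.md
    ## Architecture
    - Read docs/architecture.md before touching the service layer.
    ## Conventions
    - All state changes go through the reducer in internal/state.
    ## Testing
    - Run `go test ./...` before proposing a diff; add a test for every bug fix.
    ## Documentation
    - Update the changelog and the relevant design doc with every PR.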
Yes, but the point of this article is surely that, on average, if it's working, there would be obvious signs of it working by now.
Even if there are statistical outliers (i.e., 10x productivity using the tools), if on average it does nothing to the productivity of developers, something isn't working as promised.
When I tried to code again, I found I didn't really have the patience for it: having to learn new frameworks, APIs, languages, tricky little details. I used to find that engrossing; it had become annoying.
But with tools like Claude Code and my knowledge about how software should be designed and how things should work, I am able to develop big systems again.
I'm not 20% more productive than I was. I'm not 10x more productive than I was either. I'm infinity times more productive because I wouldn't be doing it at all otherwise, realistically: I'd either hire someone to do it, or not do it, if it wasn't important enough to go through the trouble to hire someone.
Sure, if you are a great developer and spend all day coding and love it, these tools may just be a hindrance. But if you otherwise wouldn't do it at all they are the opposite of that.
Others are for start-ups that are pre-money, pre-revenue where I can build things myself without having to deal with hiring people.
In a larger organization, certainly I'd delegate to other people, but if it's just for me or new unfunded start-ups, this is working out very well.
And it's not that I "can no longer program". I could program; it's just that I don't find the nuts and bolts of it as interesting as I used to, and I'm more focused on functionality, algorithms, and UI.
I believe it's a productivity boost, but only to a small part of my job. The boost would be larger if I only had to build proofs of concept or hobby projects that don't need to be reliable in prod and don't require feedback and requirements from many other people.
Just spitballing here, but it sure feels similar.
I have the same experience as OP, I use AI every day including coding agents, I like it, it's useful. But it's not transformative to my core work.
I think this comes down to the type of work you're doing. The issue is that most software engineering isn't in fields amenable to shovelware.
Most of us work either in areas where the coding is intensely brownfield (AI is great, but it's not doubling anyone's productivity) or in areas where the productivity bottlenecks are nowhere near the code.
If AI were really making people 10x more productive, given the number of people who want to make games, you’d expect to see more than a few percent increase year over year.
That said, I'm skeptical that AI is as helpful for commercial software. It's been great at automating my workflow, because I suck at shell scripting and AI is great at it. But for most of the code I write, I honestly halfway don't know what I'm going to write until I write it; the prompt itself is where my thinking would go, so the time savings would be fairly small. But I also think I'm fairly skilled (except at scripting).
The same people who are willing to go through all the steps to release an application online are also willing to go through the extra effort of writing their own code. The code is actually the easy part compared to the rest of it... always has been.