All of it smells of a (lousy) junior software engineer: from configuring the root logger at the top, at module level (which relies on module import caching not to be reapplied), to building a config file parser themselves instead of using one from the stdlib, to the raciness in load_json, where the file's existence is checked with an if and then the code carries on as if the file is certainly there...
In a nutshell, if the rest of it is like this, it simply sucks.
Also…
    def _save_current_date_time(current_date_time_file: str, current_date_time: str) -> None:
        with Path(current_date_time_file).open("w") as f:
            f.write(current_date_time)
there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained. The scary thing is: I have seen professional human developers write worse code.
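To illustrate the kind of abstraction being missed (a sketch, not code from the project): pathlib already ships write_text(), which does the whole open/write/close dance, so the wrapper above can disappear entirely.

    from pathlib import Path

    # Sample values, for illustration only.
    current_date_time_file = "current_date_time.txt"
    current_date_time = "2025-04-01T12:00:00Z"

    # write_text() opens, writes, and closes in one call, replacing the
    # three-line _save_current_date_time() wrapper at its single call site.
    Path(current_date_time_file).write_text(current_date_time)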
I'm far from a heavy LLM coder, but I've noticed a massive excess of unnecessary comments in most output. I'm always deleting the obvious ones.
But then I started noticing that the comments seem to help the LLM navigate additional code changes. It’s like a big trail of breadcrumbs for the LLM to parse.
I wouldn’t be surprised if vibe coders get trained to leave the excess comments in place.
Anyway, I don't think any of the current LLMs are really good for coding. What it's good at is copy-pasting (with some minor changes) from the massive code corpus it has been pre-trained on. For example, give it some Zig code and it's straight-up unable to solve even basic tasks. Same if you give it a really unique task, or if you simply ask for potential improvements to your existing code. Very, very bad results, no signs of out-of-the-box thinking whatsoever.
BTW: I think what people are missing is that LLMs are really great at language modeling. I had great results, and boosts in productivity, just by being able to prepare the task specification and make quick changes to it really easily. Once I have a good understanding of the problem, I can usually implement everything quickly, and do it in a much, much better way than any LLM currently can.
It didn't work out that great. I think that all the context in the verbose coding it does actually helps it to write better code. Shedding context to free up tokens isn't so straightforward.
the dev is just too lazy to include them anymore, whereas the model doesn't really need to be lazy, as it's paid by the token
It’s trivial to ask Claude via Cursor to add comments to illustrate how some code works. I’ve found this helpful with uncommented code I’m trying to follow.
I haven't seen it hallucinate an incorrect comment yet, but sometimes it will comment a TODO that a section should be made more clear. (Rude… haha)
This is a human sentiment because we can fairly easily pick up abstractions during reading. AIs have a much harder time with this - they can do it, but it takes up very limited cognitive resources. In contrast, rewriting the entire software for a change is cheap and easy. So to a point, flat and redundant code is actually beneficial for a LLM.
Remember, the code is written primarily for AIs to read and only incidentally for humans to execute :)
This is kind of the rub of it all. If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter? I mean at that point it's kind of like looking at the output of a bytecode compiler and being like "wow what a mess". And it's not like they can't write code up to your stylistic standards, it's just literally a matter of prompting for that.
You're not wrong here, but there's a big difference in programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.
We did this song and dance pretty recently with dynamic typing. Developers thought it was so much more productive to use dynamically typed languages, because it is in the initial phases. Then years went by, those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities, and those developers who hyped up dynamic typing invented Python/PHP type hinting and Flow for JavaScript, later moving to TypeScript entirely. Nowadays nobody seriously recommends building long-lived systems in untyped languages, but they are still very useful for one-off scripting and more interactive/exploratory work where correctness is less important, i.e. Jupyter notebooks.
I wouldn't be surprised to see the same pattern happen with low-supervision AI code; it's great for popping out the first MVP, but because it generates poor code, the gung-ho junior devs who think they're getting 10x productivity gains will wisen up and realize the value of spending an hour thinking about proper levels of abstraction instead of YOLO'ing the first thing the AI spits out when they want to build a system that's going to be worked on by multiple developers for multiple years.
You'll notice the type systems being bolted onto dynamic languages or found in serious attempts at new languages are radically different than the type systems being rejected by the likes of javascript, python, ruby and perl.
In my experience, type checking / type hinting already starts to pay off when more than one person is working on even a small-ish code base. Just because it helps you keep in mind what comes from and goes to the other guy's code.
A fairly incompetent one, in my experience. And don't even get me started on "me 3 months ago", that guy's even worse.
Me, looking at code 100% written by me last year.
I think this has a ton to do with the mixed results from "vibe coding" we've seen as the codebase grows in scope and complexity. Agents seem to break down without a good type system. Same goes for JS.
I've just recently started on an Objective-C project using Cline, and it's like nirvana. I can code out an entire interface and have it implemented for me as I'm going. I see no reason it couldn't scale infinitely to massive LOC with good coding practices. The real killer feature is header files. Being able to have your entire projects headers in context at all time, along with a proper compiler for debugging, changes the game for how agents can reason on the whole codebase.
Humans also worry about their jobs, especially in PIP-happy companies; they are very well known for writing intentionally over-complicated code that only they understand so that they are irreplaceable.
For example, I've seen a C# application where every function takes in and outputs an array of objects, supposedly built that way so the internal code can be modified without ever having to worry about the contract breaking. It was just as bad as you are imagining, probably worse. Was that incompetence or building things to be so complicated that others would struggle to work on it?
Also I find "AI makes crap code so we should give it a bigger task" illogical.
This is my issue with all the AI naysayers at this point. It seems to all boil down to "haha, stupid noob can't code so he uses AI" in their minds. It's like they are incapable of understanding that there could simultaneously be a bunch of junior devs pushing greenfield YouTube demos of vibe coding, while at the same time expert software engineers are legitimately seeing their productivity increase 10x on serious codebases through judicious use.
Go ahead and keep swinging that hammer, John Henry.
It's funny you would say this, because we are really commenting on an article where a self-proclaimed "expert" has done that and the "10x" output is terrible.
I'm working in the field professionally since June 1998, and among other things, I was the tech lead on MyHammer.de, Germany's largest craftsman platform, and have built several other mid-scale online platforms over the decades.
How well I have done this, now that's for others to decide.
Quite objectively though, I do have some amount of experience — even a bad developer probably cannot help but pick up some learnings over so many years in relevant real-world projects.
However, and I think I stated this quite clearly, I am expressively not an expert in Python.
And yet, I could realize an actually working solution that solves an actual problem I had in a very real sense (and is nicely humming away for several weeks now).
And this is precisely where yes, I did experience a 10x productivity increase; it would have certainly taken me at least a week or two to realize the same solution myself.
I don't doubt this is doing something useful for you. It might even be mostly correct.
But it is not a positive advertisement for what AI can do: just like the code is objectively crap, you can't easily trust the output without a comprehensive review. And without doubting your expertise, I don't think you reviewed it, or you would have caught the same smells I did.
What this article tells me is that when the task is sufficiently non-critical that you can ignore being perfectly correct, you can steer AI coding assistants into producing some garbage code that very well might work or appear to work (when you are making stats, those are tricky even with utmost manual care).
Which is amazing, in my opinion!
But not what the premise seems to be (how a senior will make it do something very nice with decent quality code).
Out of curiosity why did you not build this tool in a language you generally use?
And if I cannot bring language-proficiency to the table — which of my capabilities as a seasoned software&systems guy can I put to use?
In the brown-field projects where my team and I have the AI implement whole features, the resulting code quality — under our sharp and experienced eyes — tends to end up just fine.
I think I need to make the differences between both examples more clear…
However, your writing style implied that the result was somehow better because you were otherwise an experienced engineer.
Even your clarification in the post sits right below your statement how your experience made this very smooth, with no explanation that you were going to be happy with bad code as long as it works.
However, I'm not quite sure where I complained. Certainly not in the post.
And yes, I'm very convinced that the result turned out a lot better than it would have turned out if an inexperienced "vibe coder" had tried to achieve the same end result.
Actually pretty sure without my extensive and structured requirements and the guard rails, the AI coding session would have ended in a hot mess in the best case, and a non-functioning result in the worst case.
I'm 100% convinced that these two statements are true and relevant to the topic:
That a) someone lacking my level of experience and expertise is simply not capable of producing a document like https://github.com/dx-tooling/platform-problem-monitoring-co...
And that b) using said document as the basis for the agent-powered AI coding session had a significant impact on the process as well as the end result of the session.
Btw, AI doesn't just code, there are AIs for debugging, monitoring etc too.
1. Tooling obviously does improve performance, but not so huge a margin. Yes, if AI could automate more elements of tooling, that would very much help. If I could tell an AI "bisect this bug, across all projects in our system, starting with this known-bad point", that would be very helpful -- sometimes. And I'm sure we'll get there soon enough. But there is fractal complexity here: what if isolating the bug requires stepping into LLDB, or dumping some object code, or running with certain stressors on certain hardware? So it's not clear that "LLM can produce code from specs, given tight oversight" will map (soon) to "LLM can independently assemble tools together and agentically do what I need done".
2. Even if all tooling were automated, there's still going to be stuff left over. Can the LLM draft architectural specs, reach out to other teams (or their LLMs), sit in meetings and piece together the big picture, suss out what the execs really want us to be working on, etc.? I do spend a significant (double-digit) percentage of my time working on that, so if you eliminate everything else -- then you could get 10x improvement, but going beyond that would start to run up against Amdahl's Law.
No, coding speed is really not the bottleneck to software engineer productivity.
No one said productivity is this one thing and not that one thing, only you say that because it's convenient for your argument. Productivity is a combination of many things, and again it's not just typing out code that's the only area AI can help.
Again, the context here was that somebody discussed speed of coding and you raised the point of not using any tooling with Notepad.
But that's not what we get in this early stage of grifting. We get 10% marketing buzz on how cool this is with stuff that cannot be recreated in the tool alone, and 89% of lazy or inexperienced developers who just turn in slop with little or no iteration. The latter don't even understand the code they generated.
That 1% will be amazing; it's too bad the barrel is full of rotten apples hiding that potential. The experts also tend to keep to themselves, in my experience. The 89% includes a lot of Dunning-Kruger as well, which makes those outspoken experts questionable (maybe part of why real experts aren't commenting on their experience).
In this case, I did put in the guard rails to ensure that I reach my goal in hopefully a straight line and as quickly as possible, but to be honest, I did not give much thought to long-term maintainability or ease of extending it with more and more features, because I needed a very specific solution for a use case that doesn't change much.
I'm definitely working differently in my brown-field projects where I'm intimately familiar with the tech stack and architecture — I do very thorough code reviews afterwards.
People have different definitions of "reasonably maintainable", but if code has extra stuff that provides no value, it always perplexes the reader (what is the point of this? what am I missing?), and increases cognitive load significantly.
But if AI coding tools were advertised as "get 10x the output of your least capable teammate", would they really go anywhere?
I love doing code reviews as an opportunity to teach people. Doing this one would suck.
That's not the scary part. It's the honest part. Yes, we all have (vague) ideas of what good code looks like, and we might know it when we see it but we know what reality looks like.
I find the standard to which we hold AI in that regard slightly puzzling. If I can get the same meh-ish code for way less money and way less time, that's a stark improvement. If the premise is now "no, it also has to be something that I recognize as really good / excellent" then at least let us recognize that we are past the question of whether it can produce useful code.
But whenever someone advertises how an expert will benefit from it yet they end up with crap, it's a different discussion.
As an expert, I want AI to help me produce code of similar quality faster. Anyone can find a cheaper engineer (maybe five of them?) that can produce 5-10x the code I need at much worse quality.
I will sometimes produce crappy code when I lack the time to produce higher quality code: can AI step in and make me always produce high quality code?
That's a marked improvement I would sign up for, and some seem to tout, yet I have never seen it play out.
In a sense, the world is already full of crappy code used to build crappy products: I never felt we were lacking in that department.
And I can't really rejoice if we end up with even more of it :)
With AI they can simply blame whatever model they used and continually shovel trash out there instantly.
If you're in a team where somebody can continuously commit trash without any repercussions, this isn't a problem caused by AI.
They’re very good at honing bad code into good code with good feedback. And when you can describe good code faster than you can write it - for instance it uses a library you’re not intimately familiar with - this kind of coding can be enormously productive.
And they're very bad at keeping other code good across iterations. So you might find that while they might've fixed the specific thing you asked for—in the best case scenario, assuming no hallucinations and such—they inadvertently broke something else. So this quickly becomes a game of whack-a-mole, at which point it's safer, quicker, and easier to fix it yourself. IME the chance of this happening is directly proportional to the length of the context.
So, how did I end up with a logging.py, config.py, config in __init__.py and main.py? Well I prompted for it to fix the logging setup to use a specific format.
I use Cursor; it can spit out code at an amazing rate and has reduced the amount of docs I need to read to get something done. But after its second attempt at something, you need to jump in and do it yourself and most likely debug what was written.
In some ways this is even more impressive -- every prompt you make, your LLM is in effect re-reading (and re-comprehending) your whole codebase, from scratch!
Perhaps there is simply too much crappy Python code around that they were trained on as Python is frequently used for "scripting".
Perhaps the field has moved on and I need to try again.
But looking at this, it would still be faster for me to type this out myself than go through multiple rounds of reviews and prompts.
Really, a senior has not reviewed this, no matter their language (raciness throughout, not just this file).
Ever since the ~4o models, there seems to be a pretty decent chance that you ask it to change something specific, it says it will, and then it spits out line-for-line identical code to what you just asked it to change.
I have had some really cool success with AI finding optimizations in my code, but only when specifically asked, and even then I just read the response as theory and go write it myself, often in 1-15% of the LoC of the LLM's version.
Note how it has invented the faster parameter for the zpool command. It is possible that the blog writer hallucinated a faster parameter themselves without needing an LLM - who knows.
I think all developers should add a faster parameter to all commands to make them run faster. Perhaps an LLM could create the faster code.
I predict an increase of man page reading, and better quality documentation at authoritative sources. We will also improve our skills at finding auth sources of docs. My uBlacklist is getting quite long.
The date can be spoofed. It first showed up on archive.org in December 2022, and there's no captures for the site before then, so I'm liable to believe the dates are spoofed.
I suspect they might actually have a pool named faster -- I know I've named pools similarly in the past. This is why I now name my pools after characters from the Matrix, as is tradition.
How useful is a library of knowledge when n% of the information is suspect? We're all about to find out.
But then again the old id doesn't match between the two commands.
    def ensure_dir_exists(path: str) -> None:
        """
        Ensure a directory exists.

        Args:
            path: Directory path
        """
An extremely useful and insightful comment. Then you look where it's actually used:

    # Ensure the directory exists and is writable
    ensure_dir_exists(work_dir)
    work_path = Path(work_dir)
    if not work_path.exists() or not os.access(work_dir, os.W_OK):
... so like, the entire function and its call (and its needlessly verbose comment) could be removed, because the existence of the directory is being checked anyway by pathlib. This might not matter here because it's a small, trivial example, but if you have 10, 50, 100, 500 developers working on a codebase, and they're all thoughtlessly slinging code like this in, you're going to have a dumpster fire soon enough.
I honestly think "vibe coding" is the best use case for AI coding, because at least then you're fully aware the code is throwaway shit and don't pretend otherwise.
edit: and actually looking deeper, `ensure_dir_exists` actually makes the directory, except it's already been made before the function is called so... sigh. Code reviews are going to be pretty tedious in the coming years, aren't they?
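For what it's worth, a sketch of how far this can collapse (assuming nothing else calls the helper, and that the separate writability concern is handled where it actually matters):

    from pathlib import Path

    work_dir = "work"  # sample value

    # One stdlib call creates the directory (and any parents) if missing and is
    # a no-op otherwise. It replaces ensure_dir_exists(), its docstring, the
    # trailing comment, and the redundant exists() re-check.
    Path(work_dir).mkdir(parents=True, exist_ok=True)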
It will take some time tho, as decision makers will struggle to make up reasons why no one on the payroll is able to fix production.
Software that’s well designed and architected is a pleasure to read and write, even if a lower quality version would get the job done. I’m watching one of the things I love most in the world become more automated and having the craftsmanship stripped out of it. That’s a bit over dramatic from me, but it’s been sad to watch.
I agree with you that it is sad. And what is especially sad is that the result will probably be lower quality overall, but much cheaper. It’s the inevitable result of automation.
Worse by far is still the ability of AI to really integrate different problems and combine them into a solution. And it also seems to depend on the language. In my opinion, especially Python and JS results are often very mixed, while other languages with presumably a smaller training set might even fare better. However, JS often seems fine with asynchronous operations like that file check.
Perhaps really vetting a training set would improve AIs, but it would be quite work intensive to build something like that. That would require a lot of senior devs, which is hard to come by. And then they need to agree on code quality, which might be impossible.
Saying that, for typed languages like TypeScript and C#, they have gotten very good. I suspect this might be related to the semantic information available in typed languages, as opposed to hard-to-follow unstructured blobs like dataframes, which are therefore not reproduced well by LLMs.
It's probably because Spark is so backwards compatible with pandas, but not fully.
Explain the issue with load_json to me more. From my reading it checks if the file exists, then raises an error if it does not. How is that carrying on as if the file is certainly there?
The Python standard library has a configparser module, which should be used instead of custom code. It's safer and easier than manual parsing. The standard library also has a tomllib module, which would be an even better option IMO.
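A minimal sketch of both stdlib options (the file names and keys here are hypothetical):

    import configparser
    import tomllib  # stdlib since Python 3.11

    # INI-style config, parsed by the stdlib instead of hand-rolled string splitting:
    parser = configparser.ConfigParser()
    parser.read("settings.ini")
    host = parser["server"]["host"]

    # TOML, which adds real types (ints, booleans, dates, lists) out of the box:
    with open("settings.toml", "rb") as f:
        settings = tomllib.load(f)
    port = settings["server"]["port"]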
> However, my broad understanding of software architecture, engineering best practices, system operations, and what makes for excellent software projects made this development process remarkably smooth.
If the seniors are going to write this sort of Python code and then talk about how knowledge and experience made it smooth or whatever, might as well hire a junior and let them learn through trials and tribulations.
I asked $random_llm to give me code to recursively scan a directory and give me a list of file names relative to the top directory scanned and their sizes.
It gave me working code. On my test data directory it needed ... 6.8 seconds.
After 5 min of eliminating obvious inefficiencies the new code needed ... 1.4 seconds. And I didn't even read the docs for the functions it used yet, just changed what seemed to generate too many filesystem calls for each file.
What if I had trusted the code? It was working after all.
I'm guessing that if I asked for string manipulation code it would have done something worth posting on accidentally quadratic.
600% improvement is worth what, 3 days of billable work if it lasts 5 minutes?
In such a place I should be a very loud advocate of LLMs, use them to generate 100% of my output for new tasks...
... and then "improve performance" by simply fixing all the obvious inefficiencies and brag about the 400% speedups.
Hmm. Next step: instruct the "AI" to use bubblesort.
Then you would have been done five minutes earlier? I mean, this sort of reads like a parody of microoptimization.
If you hadn't told me that I would also not have bothered optimizing syscalls.
Did you tell the AI the profiler results and ask for ways to make it faster?
Acting like a LLM now :P
> Did you tell the AI the profiler results and ask for ways to make it faster?
Looking for ways to turn a 10 minute job into a couple days?
> Acting like a LLM now :P
Hey, if we're going to be like that, it sure sounds like you gave the employee an incomplete spec so you could then blame it for failing. So... at least I'm not acting like a PM :P
Your brilliant AI calls another low-level function to get the file size from the file name. (It also did worse stuff, but let's not go into details.)
Do you call reading the file size from the in-memory structure that you already have a speed optimization? I call it common sense.
It's so funny how these AI bros make excuse after excuse for glaring issues rather than just accept AI doesn't actually understand what it's doing (not even considering it's faster to just write good quality code on the first try).
Stuff that Google search from 10 years ago would have done without pretending it's "AI". But not Google search from this year.
* it wasn't able to simply list the fields of the returned structure that contained a directory entry. But since it gave me the name, I was able to look it up via plain search.
It's less funny when you realize how few of these people even have experience reading and writing code.
They just see code on screen, trust the machine and proclaim victory.
because that is what the market is trying to sell?
But the alternative would be the tool doesn't get built because the author doesn't know enough Python to even produce crappy code, or doesn't have the money to hire an awesome Python coder to do that for them.
While I would have hoped for a better result, I'm not surprised. In this particular case, I really didn't care about the code at all; I cared about the end result at runtime, that is, can I create a working, stable solution that solves my problem, in a tech stack I'm not familiar with?
(While still taking care of well-structured requirements and guard rails — not to guarantee a specific level of code quality per se, but to ensure that the AI works towards my goals without the need to intervene as much as possible).
I will spin up another session where I ask it to improve the implementation, and report back.
It's not a race. It's just redundant. If the file does not exist at the time you actually try to access it, you get the same error, just with a slightly better error message.
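In Python terms, a sketch of the idiomatic EAFP version (not the project's actual code): just attempt the open, and translate the exception only if you want the nicer message.

    import json

    def load_json(path: str) -> dict:
        # No exists() pre-check: even if the file vanished between a check and
        # the open, we'd end up with the same FileNotFoundError anyway.
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError as e:
            raise FileNotFoundError(f"expected JSON file at {path}") from e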
There are plenty of ways to structure code so this does not happen, but simply "do not do anything at the top module level" will ensure you don't hit these issues.
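A minimal sketch of that structure: modules create their own named logger at import time (cheap and side-effect free), and all configuration happens once, inside main().

    import logging

    logger = logging.getLogger(__name__)  # fine at module level: no side effects

    def do_work() -> None:
        logger.info("working")

    def main() -> None:
        # Configuration happens exactly once, at a well-defined entry point,
        # instead of as an import side effect that relies on module caching.
        logging.basicConfig(
            format="%(asctime)s %(name)s %(levelname)s: %(message)s",
            level=logging.INFO,
        )
        do_work()

    if __name__ == "__main__":
        main()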
By the way, prompting models properly helps a lot for generating good code. They get lazy if you don't explicitly ask for well-written code (or put that in the system prompt).
It also helps immensely to have two contexts, one that generates the code and one that reviews it (and has a different system prompt).
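A sketch of the two-context setup, here with the OpenAI Python client (the prompts and model name are placeholders; any chat API works the same way):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def generate(task: str) -> str:
        # Context 1: the writer, with the quality bar stated up front.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You write clean, idiomatic, well-tested Python."},
                {"role": "user", "content": task},
            ],
        )
        return resp.choices[0].message.content

    def review(code: str) -> str:
        # Context 2: a fresh conversation with an adversarial reviewer persona,
        # so it isn't anchored on the writer's reasoning.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are a strict code reviewer. List concrete defects only."},
                {"role": "user", "content": code},
            ],
        )
        return resp.choices[0].message.content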
This is insane on so many levels.
https://github.com/dx-tooling/platform-problem-monitoring-co...
Where things are placed in the project seems rather ad hoc too. Put everything in the same place kind of architecture. A better strategy might be to separate out the I and the O of IO. Maybe someone wants SMS or group chat notifications later on, instead of shifting the numbers in filenames step11_ onwards one could then add a directory in the O part and hook it into an actual application core.
There are idioms used when programming in BASIC on how to number the lines so you don't end up renumbering them all the time to make an internal change. It's interesting that such idioms are potentially applicable here also.
All said, it's hard on me knowing it's possible to use an LLM to spit out a crappy but functional version of whatever I've dreamt up, without the satisfaction of building it. Yet it also now seems demotivating to spend the time crafting it when I know I could use an LLM to do the majority of it. So I'm in a mental quagmire; this past year has been the first year since at least 2000 that I haven't built anything significant in scale. It's indirectly ruining the fun for me for some reason. Kind of just venting, but curious if anyone else feels this way too?
But I've been using Claude to help with all kinds of side projects. One recently was to help create and refine some python code to take the latest Wikipedia zipped XML file and transform/load it locally into a PostgreSQL DB. The initial iteration of the code took ~16 hours to unzip, process, and load into the database. I wanted it to be faster.
I don't know how to use multiple processes/multi-threading, but after some prompting, iterating, and persistent negotiations with Claude to refine the code (and an SSD upgrade), I can go from the 24 GB zip file to all cleaned/transformed data in the DB in about 2.5 hours. Feels good man.
Do I need to know exactly what's happening in the code (or at lower levels, abstracted from me) to make it faster? not really. Could someone who was more skilled, that knew more about multi-threading, or other faster programming languages, etc..., make it even faster? probably. Is the code dog shit? it may not be production ready, but it works for me, and is clean enough. Someone who better knew what they were doing could work with it to make it even better.
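For flavor, a minimal sketch of the shape such code usually takes (process_page and the sample data are hypothetical stand-ins, not the commenter's actual pipeline):

    from multiprocessing import Pool

    def process_page(raw_page: str) -> str:
        # Stand-in for the real parse/clean/transform step on one article.
        return raw_page.strip()

    def main() -> None:
        pages = ["<page>a</page>", "<page>b</page>"]  # stand-in for streamed XML pages
        with Pool() as pool:
            # Fan the CPU-bound work out across processes; consume results as
            # they finish rather than in submission order.
            for cleaned in pool.imap_unordered(process_page, pages, chunksize=64):
                pass  # the real code would batch-insert rows into PostgreSQL here

    if __name__ == "__main__":
        main()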
I feel like LLMs are great for brainstorming, idea generation, initial iterations. And in general can get you 80%+ the way to your goal, almost no matter what it is, much faster than any other method.
I take it as an opportunity to learn too. I'm working on a video library app that runs locally and wanted to extract images when the scene changed enough. I had no idea how to do this, and previously would have searched StackOverflow to find a way and then struggled for hours or days to implement it. This time I just asked Aider right in the IDE terminal what options I had, and it came back with 7 different methods. I researched those a little and then asked it to implement 3 of them. It created an interface, 3 implementations and a factory to easily get the different analyzers. I could then play around with each one and see what worked the best. It took like an hour. I wrote a test script to loop over multiple videos and run each analyzer on them. I then visually checked the results to see which worked the best. I ended up jumping into the code it had written to understand what was going on, and after a few tweaks the results are pretty good. This was all done in one afternoon - and a good chunk of that time was just me comparing images visually to see what worked best and tweaking thresholds and re-running to get it just right.
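Roughly this shape, as a sketch with hypothetical names (the real version had three implementations and actual frame inputs):

    from abc import ABC, abstractmethod

    class SceneAnalyzer(ABC):
        @abstractmethod
        def scene_changed(self, prev_frame, frame) -> bool: ...

    class HistogramAnalyzer(SceneAnalyzer):
        def scene_changed(self, prev_frame, frame) -> bool:
            return False  # placeholder: compare color histograms to a threshold

    class PixelDiffAnalyzer(SceneAnalyzer):
        def scene_changed(self, prev_frame, frame) -> bool:
            return False  # placeholder: mean absolute pixel difference

    def make_analyzer(kind: str) -> SceneAnalyzer:
        # The factory is what makes the compare-them-all test loop trivial.
        return {"histogram": HistogramAnalyzer, "pixeldiff": PixelDiffAnalyzer}[kind]()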
Example:
I'm building a tool to grab my data from different sites like Steam, IMDb, Letterboxd and Goodreads.
I know perfectly well how to write a parser for the Goodreads CSV output, but it doesn't exactly tickle my brain. Cursor or Cline will do it in minutes.
Now I've got some data to work with, which is the fun bit.
Again if I want to format the output to markdown for Obsidian, the LLM can do it in a few minutes and maybe even add stuff I didn't think about at first.
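The Goodreads piece, as a sketch (column names taken from the standard Goodreads CSV export, but treat them as assumptions):

    import csv

    def goodreads_to_markdown(csv_path: str) -> str:
        # Turn each row of the export into an Obsidian-style list entry.
        lines = []
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                lines.append(f"- [[{row['Title']}]] by {row['Author']}, rated {row['My Rating']}/5")
        return "\n".join(lines)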
It's been depressing to listen to people pretend that LLM generated code is "the same thing". To trivialize the thoughtful lessons one has learned honing their craft. It's the same reason the Studio Ghibli AI image trend gives me the ick.
I agree though that the Studio Ghibli trend feels off. To me, art like this feels different to code. I know that's probably heresy around these parts of the internet, and I probably would have said something different 15-20 years ago. I know that coding is creative and fulfilling. I think I've just had the fun of coding beat out of me over 25 years :) AI seems to be helping bring the fun back.
It also lets me explore other libraries and languages without having to invest too much time and effort before knowing if it's right for what I want to do. I know that if I want to continue in a different direction, I haven't wasted tons of time/effort and getting started in the new direction will be much less effortful and time consuming.
On the other hand, most tasks aren't fun nor satisfying and frankly they are a waste of time, like realizing you're about to spend the afternoon recredentializing in some aspect of Webpack/Gradle/BouncyCastle/Next.js/Combine/x/y/z just to solve one minor issue. And it's pure bliss when the LLM knows the solution.
I think the best antidote to the upset in your comment is to build bigger and more difficult things. Save your expertise for the stuff that could actually use your expertise rather than getting stuck wasting it on pure time burn like we had to in the past.
LLMs can one-shot pretty good server-authority + client-prediction + rollback netcode, something I've probably spent weeks of my life trying to build and mostly failing. And they can get a basic frontend 'proof' working. And once you verify that the networked MVP works, you can focus on the game.
But the cool thing about multiplayer games is that they can be really small in scope because all of the fun comes from mechanics + playing with other people. They can be spaceships shooting at each other in a single room or some multiplayer twist on a dumbed down classic game. And that's just so much more feasible than building a whole game that's expected to entertain you as a single player.
I'm pragmatic and simply don't trust current LLM's to do much in my domain. All that tribal knowledge is kept under lock and key at studios, so good luck scraping the net to find more than the very basic samples of how to do something. I've spent well over a decade doing that myself; the advanced (and even a lot of intermediate) information is slim and mostly behind paywalls or books.
So far none of them are having a great time after their initial enthusiasm. A lot of it is people discovering that there’s far more to a business than whipping up a SaaS app that does something. I’m also seeing a big increase in venting about how their progress is slowing to a crawl as the codebase gets larger. It’s interesting to see the complaints about losing days or weeks to bugs that the LLM introduced that they didn’t understand.
I still follow because it’s interesting, but I’m starting to think 90% of the benefit is convincing people that it’s going to be easy and therefore luring them into working on ideas they’d normally not want to start.
I had long ago culled many of those ideas based on my ability to execute the marketing plan or the “do I really even want to run that kind of business?” test. I already knew I could build whatever I wanted to exist so My days of pumping out side projects ended long ago and I became more selective with my time.
a project in a niche where I live and breathe the fumes off the work and I can help the whole ecosystem with their workflow? sign me up!
This reminds me of the famous HN comment when Drew Houston first announced Dropbox here in 2007: https://news.ycombinator.com/item?id=9224
thankfully, I'm not important enough for my comment to amount to the same thing.
Any SaaS business. In a week. And to be a "serious contender", you have to have feature parity. Yet now you're shifting the goalposts.
What's stopping you? There are 38 weeks left in 2025. Please build "serious contenders" for each of the top 38 most popular SaaS products before the end of the year. Surely you will be the most successful programmer to have ever lived.
My claim is that in a week you could build a thing that people want to use, as long as you can sell it, that's competitive with existing options for a given client. Salesforce is a CRM with walled garden after walled garden, access to each of which costs extra, of course. They happened to be in the right place at the right time, with the right bunch of assholes.
A serious contender doesn’t have to start with everything. It starts by doing the core thing better—cleaner UX, clearer value, easier to extend. That’s enough to matter. That’s enough to grow.
I’m not claiming to replace decades overnight. But momentum, clarity, and intent go a long way. Especially when you’re not trying to be everything to everyone—just the right thing for the right people.
as for Spotify: https://bit.ly/samson_music
I'm not sure what you are trying to say here - that this website is comparable to Spotify? Even if you are talking about just the "core experience", this example supports the opposite argument that you are trying to make.
Spotify has the licensing rights to songs and I don't have the business acumen to go about getting those rights, so I guess I could make Pirate Spotify and get sued by the labels for copyright infringement, but that would just be a bunch of grief for me which would be not very fun and why would I want to screw artists out of getting paid to begin with?
I think I've detected the root cause of your problem.
and, funnily enough, it goes a long way to explaining the experiences of some other commentators in this thread on “vibe coding competitive SaaS products”.
Spotify is not the audio player widget in some user interface. It started off as a Torrent-like P2P system for file distribution on top of a very large search index and file storage. That's the minimum you'd build for a "whitelabel [...] Spotify clone". Since then they've added massive, sophisticated systems for user monitoring and prediction, ad distribution, abuse and fraud detection, and so on.
Use that code generation platform to build a product off any combination of two of the larger subsystems at Spotify and you're set for retirement if you only grab a reasonable salesperson and an accountant off the street. Robust file distribution with robust abuse detection or robust ad distribution or robust user prediction would be that valuable in many business sectors.
If building and maintaining actually is that effortless for you, show some evidence.
I'm listening. I fully admit that I was looking at Spotify as a user and thus only as a music playing widget so I'd love to hear more about this side of things. What is user prediction?
You can find out quite a lot in their blogs and publications:
https://research.atspotify.com/2022/02/modeling-users-accord...
As far as knowledge/experience, I worry about a day where "vibe coding" takes over the world and it's only the greybeards that have any clue WTF is going on. Probably profitable, but also sounds like a hellscape to me.
I would hate to be a junior right now.
I am not going to spend half an hour coming up with that prompt, tweaking it, and then spend many hours (on the optimistic side) to track down all the hallucinated code and hidden bugs. Have been there once, never going to do that again.
I'd rather do it myself to have peace of mind.
I skimmed the vibecoding subreddits for a while. It was common to see frustrations about how coding tools (Cursor, Copilot, etc) were great last month but terrible now. The pattern repeats every month, though. When you look closer it’s usually people who were thrilled when their projects were small but are now frustrated when they’re bigger.
Gemini 2.5 is much better in this regard; it can make decent output up to around 100k tokens, compared to Claude 3.7 starting to choke around 32k. Long term, it remains to be seen if this will remain an issue. If models can get to 5M context and perform like current models do with 5k context, it would be a total game changer.
On Greenfield projects there's simply too many options for it to pursue. It will take one approach in one place then switch to another.
On a brownfield project, you can give it some reference code and tell it about places to look for patterns and it will understand them.
Bolting AI onto existing products probably doesn't make sense. AI is going to produce an entirely new set of products with AI-first creation modalities.
You don't need AI in Photoshop / Gimp / Krita to manipulate images. You need a brand new AI-first creation tool that uses your mouse inputs like magic to create images. Image creation looks nothing like it did in the past.
You don't need Figma to design a webpage. You need an AI-first tool that creates the output - Lovable, V0, etc. are becoming that.
You don't need AI in your IDE. Your IDE needs to be built around AI. And perhaps eventually even programming languages and libraries themselves need AI annotations or ASTs.
You don't need AI in Docs / Gmail / Sheets. You're going to be creating documents from scratch (maybe pasting things in). "My presentation has these ideas, figures, and facts" is much different than creating and editing the structure from scratch.
There is so much new stuff to build, and the old tools are all going to die.
I'd be shocked if anyone is using Gimp, Blender, Photoshop, Premiere, PowerPoint, etc. in ten years. These are all going to be reinvented. The only way these products themselves survive is if they undergo tectonic shifts in development and an eventual complete rewrite.
That's a long time for Adobe not to have figured out what you are saying.
A faster GPT 4o will kill Photoshop for good.
Even the latest model from this week, which is undeniably impressive, can’t get close to the level of control that photoshop gives me. It often edits parts of the image I haven’t asked it to touch among other issues. I use photoshop as a former photojournalist, and AI manipulated images are of no use to me. My photos are documentary. They represent a slice of reality. I know that AI can create a realistic simulacrum of that, but I’m not interested.
This is like saying we won’t need text editors in the future. That’s silly, there are some things that we won’t need text editors for, but the ability of ai to generate and edit text files doesn’t mean that we won’t ever need to edit them manually.
I'm really eager to see how this pans out in a decade.
Well, guilty, I actually do occasionally develop my own film.
Film photography is actually expanding as an industry right now. We are well past the point where digital photography can do everything a film camera can do, and in most cases it can do it far better (very minor exceptions like large format photography still exist, where you can argue that film still has the edge).
I think that whether you embrace AI photo editing or not has more to do with the purpose of your photos. If you are trying to create marketing collateral for a valentines day ad campaign, AI is probably going to be the best tool. If you are trying to document reality, even for aesthetic purposes, AI isn't great. When I make a portrait of my wife, I don't need AI to reinterpret her face for me.
I start every piece of work, green or brown, with a markdown file that often contains my plan, task breakdown, data models (including key fields), API / function details, and sample responses.
For the tool part, though, I took a slightly different approach. I decided to use Rust primarily for all my projects, as the compile-time checks are a great way to ensure the correctness of the generated code. I have noticed many more errors are detected in AI-generated Rust code than in any other language. I am happy about it because these are errors that I would have missed in other languages.
Is that because the Rust compiler is just a very strong guardrail? Sounds like it could work well for Swift too. If only xcodebuild were less of a pain for big projects.
Regarding swift, totally hear you :) Also I haven’t tried generating swift code - wondering how well that would be trained as there are fewer open source codebases for that.
If it’s high surprise then there’s a greater chance that you can’t tell right code from wrong code. I try to reframe this in a more positive light by calling it “exploration”, where you can ask follow up questions and hopefully learn about a subject you started knowing little about. But it’s important for you to realize which mode you are in, whether you are in familiar or unfamiliar waters.
https://royalicing.com/2025/infinite-bicycles-for-the-mind
The other benefit an experienced developer can bring is using test-driven development to guide and constrain the generated code. It’s like a contract that must be fulfilled, and TDD lets you switch between using an LLM or hand crafting code depending on how you feel or the AI’s competency at the task. If you have a workflow of writing a test beforehand it helps with either path.
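Concretely, a minimal pytest sketch; slugify here is a hypothetical stand-in for whatever function the tests constrain:

    # Written first, by hand. The implementation can then come from an LLM or
    # from you; either way it has to satisfy the same contract.
    from slugger import slugify  # hypothetical module under test

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("a, b & c!") == "a-b-c"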
I agree with the author here, but my worry is that by leaning on the LLMs, the very experience that allows me to uniquely leverage the LLMs now will start to atrophy and in a few years time I'll be relying on them just to keep up.
Senior developers have the experience to think through and plan out a new application for an AI to write. Unfortunately a lot of us are bogged down by working our day jobs, but we need to dedicate time to create our own apps with AI.
Building a personal brand has never been more important, so I envision a future where devs have a personal website with thumbnail links (like fancy YouTube thumbnails) to all the small apps they have built. Dozens of them, maybe hundreds, all with beautiful or modern UIs. The prompts they used can be the new form of blog articles. At least that's what I plan to do.
the low-hanging fruit is to create content/apps to help developers create their personal brands through content/apps.
1. Is the company providing the model willing to indemnify _your_ company when using code generation? I know GitHub Copilot will do this with the models they provide on their hardware, but if you’re using Claude Code or Cursor with random models do they provide equal guarantees? If not I wonder if it’s only a matter of time before that landmine explodes.
2. In the US, AFAICT, software that is mostly generated by non-humans is not copyrightable. This is not an issue if you’re creating code snippets from an LLM, but if you’re generating an entire project this way then none or only small parts of the code base you generate would then be copyrightable. Do you still own the IP if it’s not copyrightable? What if someone exfiltrates your software? Do you have no or little remedy?