Not only did they produce about the same amount of code in a day as they used to produce in a week (or two); several other things also made my work harder than before:
- During review, they hadn't thought as deeply about their code, so my comments often seemed to go over their heads. Instead of a discussion I'd get something like "good catch, I'll fix that" (also reminiscent of an LLM).
- The time spent on trivial issues went down a lot, almost to zero, but the remaining issues were much more subtle and time-consuming to find and describe.
- Many bugs were of a new kind (to me): the code would look like it does the right thing but actually not work at all, or just be much more broken than code with that level of "polish" would normally be. This breakdown of pattern-matching compared to "organic" code made the overhead much higher. Spending decades reviewing code and answering Stack Overflow questions usually makes it possible to pinpoint not just a bug but how the author got there in the first place and how to help them avoid similar mistakes in the future.
- A simple, but bad (inefficient, wrong, illegal, ugly, ...) solution is a nice thing to discuss, but the LLM-assisted junior dev often cooks up something much more complex, which can be bad in many ways at once. The culture of slowly growing a PR from a little bit broken, thinking about design and other considerations, until it's high quality and ready for a final review doesn't work the same way.
- Instead of fixing the things in the original PR, I'd often get a completely different approach as the response to my first review. Again, often broken in new and subtle ways.
This led to a kind of effort inversion, where senior devs spent much more time on these PRs than the junior authors themselves. The junior dev would feel (I assume) much more productive and competent, but the response to their work would eventually lack most of the usual enthusiasm or encouragement from senior devs.
How do people deal with these issues? One thing that worked well for me initially was to always require a lot of (passing) tests, but eventually these tests would suffer from many of the same problems.
This reminded me of a quarter-million-dollar software project one of my employers had contracted to a team in a different country. On the face of it - especially if you went and checked against the spec sheet - everything was there, but the thing was not a cohesive whole. They did not spend one second beyond the spec sheet, and none of the common-sense things that "follow" from the spec were there. The whole thing was scrapped immediately.
With LLMs this kind of work now basically becomes free to do and automatic.
Good experienced devs will be able to make better software, but so many inexperienced devs will be regurgitating so much more lousy software at a pace never seen before that it's going to be overwhelming. Or as the original commenter described, they're already being overwhelmed.
I lowkey disagree. I think good experienced devs will be pressured to write worse software or be bottlenecked by having to deal with bad software. Depends on company and culture of course. But consider that you as an experienced dev now have to explain things that go completely over the heads of the junior devs, and most likely the manager/PO, so you become the bottleneck, and all pressure will come down on you. You will hear all kinds of stuff like "80% there is enough" and "don't let perfect be the enemy of good" and "you're blocking the team, we have a deadline" and that will become even worse. Unless you're lucky enough to work in a place with actually good engineering culture.
I love that thread because it clearly shows both the benefits and pitfalls of AI codegen. It saved this expert a ton of time, but the AI also created a bunch of "game over" bugs that a more junior engineer probably would have checked in without a second thought.
Even looking strictly at coding, the hard thing about programming is not writing the code. It is understanding the problem and figuring out an elegant and correct solution, and LLMs can't replace that process. They can help with ideas though.
Not really. This "review" was stretching to find things to criticize in the code, and exaggerated the issues he found. I responded to some of it: https://news.ycombinator.com/item?id=44217254
Unfortunately I think a lot of people commenting on this topic come in with a conclusion they want to reach. It's hard to find people who are objectively looking at the evidence and drawing conclusions with an open mind.
Like his first argument was that you didn't have a test case covering every single MUST and MUST NOT in the spec?? I would like to introduce him to the real world - but more to the point, there was nothing in his comments that specifically dinged the AI, and it was just a couple pages of unwarranted shade that was mostly opinion with 0 actual examples of "this part is broken".
> Unfortunately I think a lot of people commenting on this topic come in with a conclusion they want to reach. It's hard to find people who are objectively looking at the evidence and drawing conclusions with an open mind.
Couldn't agree more, which is why I really appreciated the fact that you went to the trouble to document all of the prompts and make them publicly available.
I won't say that you have converted me, but maybe I'll give LLMs a shot and judge for myself if they can be useful to me. Thanks, and good luck!
https://github.com/cloudflare/workers-oauth-provider/securit...
You can certainly make the argument that this demonstrates risks of AI.
But I kind of feel like the same bug could very easily have been made by a human coder too, and this is why we have code reviews and security reviews. This exact bug was actually on my list of things to check for in review, I even feel like I remember checking for it, and yet, evidently, I did not, which is pretty embarrassing for me.
The promise then was similar: "non-programmers" could use a drag-and-drop, WYSIWYG editor to build applications. And, IMO, VB was actually a good product. The problem is that it attracted "developers" who were poor/inexperienced, and so VB apps developed a reputation for being incredibly janky and bad quality.
The same thing is basically happening with AI now, except it's not constrained to a single platform, but instead it's infecting the entire software ecosystem.
Greed (wanting an enterprise alternative to Java and C++ builder) killed VB, not the community.
Yes, there were a lot of crappy, barely functioning programs made in it. But they were programs that wouldn't have existed otherwise. E.g. for small businesses automating things, VB was amazing, and even if the program was barely functional it was better than nothing.
Large companies can be a red tape nightmare for getting anything built. The process overload will kill simple non-strategic initiatives. I can understand and appreciate less technical people who grab whatever tool they can to solve their own problems when they run into blockers like that. Even if they don't solve it in the best way possible according to experts in the field. That feels like the hacker spirit to me.
You’d be surprised how little effort it is compared to having to deal with a massive outage. E.g. you did eventually have to think about backup power.
I think we will need to find a way to communicate “this code is the result of serious engineering work and all tradeoffs have been thought about extensively” and “this code has been vibecoded and no one really cares”. Both sides of that spectrum have their place and absolutely will exist. But it’s dangerous to confuse the two
Wrote it initially as a joke, but maybe it's not that dumb? I already do it on LinkedIn. I'm job hunting and post slop from time to time to game LinkedIn algorithms to get better positioning among other potential candidates. And not to waste anybody's time, I leave in the emotes at the beginning of sentences just so people in the know know it's just slop (so as not to waste their time).
Why do you believe we should "turn our back on AI"? Have you used it enough to realize what a useful tool it can be?
Wouldn't it make more sense to learn to turn our backs on unhelpful uses of AI?
Take your photos example. Sure, the number of photos taken has exploded, but who cares if there are now reams and reams of crappy vacation photos - it's not like anyone is really forced to look at them.
With AI-generated code, I think it's actually awesome for small, individual projects. And in capable hands, it can be a fantastic productivity enhancer in the enterprise. But my heart bleeds for the poor sap who is going to eventually have to debug and clean up the mountains of AI code being checked in by folks with a few months/years of experience.
I have found time and again that enough technological advancement will make previously difficult things easy that when it's time to clean up the old stuff, it's not such a huge issue. Especially so if you do not need to keep a history of everything and can start fresh. This probably would not fly in a huge corp but it's fine for small/medium businesses. After all, whole companies disappear and somehow we live on.
Also if you're organizationally changing the culture to force people to put more effort in writing the code, why are you even organizationally using LLMs...?
Simply hire people who score high on the Conscientiousness, but low on the Agreeableness personality trait. :-)
Yeah, OK, I guess you have to be a bit less unapologetic than Linux kernel maintainers in this case, but you can still shift the culture towards more careful PRs I think.
> why are you even organizationally using LLMs
Many people believe LLMs make coders more productive, and given the rapid progress of gen AI it's probably not wise to just dismiss this view. But there need to be guardrails to ensure the productivity is real and not just creating liability. We could live with weaker guardrails if we can trust that the code was in a trusted colleague's head before appearing in the repo. But if we can't, I guess stronger guardrails are the only way, aren't they?
But when I actually sit down and think it through, I’ve wasted multiple days chasing down subtle bugs that I never would have introduced myself. It could very well be that there’s no productivity gain for me at all. I wouldn’t be at all surprised if the numbers showed that was the case.
But let’s say I am actually getting 20%. If this technology dramatically increases the output of juniors and mid level technical tornadoes that’s going to easily erase that 20% gain.
I’ve seen codebases that were dominated by mid-level technical tornadoes and juniors; no amount of guardrails could ever fix them.
Until we are at the point where no human has to interact with code (and I’m skeptical we will ever get there short of AGI) we need automated objective guardrails for “this code is readable and maintainable”, and I’m 99.999% certain that is just impossible.
Usually organizational changes are massive efforts. But I guess hype is a hell of an inertia buster.
I imagine if you have a say in their performance review, you might be able to set "writes code more thoughtfully" as a PIP?
I will have my word in the matter before all is said and done. While everyone is busy pivoting to AI I keep my head down and build the tools that will be needed to clean up the mess...
I'm building a universal DOM for code so that we should see an explosion in code whose purpose is to help clean up other code.
If you want to write code that makes changes to a tree of HTML nodes, you can pretty much write that code once and it will run in any web browser.
If you want to write code that makes a new program by changing a tree of syntax nodes, there are an incredible number of different and wholly incompatible environments for that code to run in. Transform authors are likely forced to pick one or two engines to support, and anyone who needs to run a lot of codemods will probably need to install 5-10 different execution engines.
Most people seem not to notice or care about this situation, or realize that their tools are vastly underserving their potential just because we can't come up with the basic standards necessary to enable universal execution of codemod code. This also means there are drastically lower incentives to write custom codemods and lint rules than there could/should be.
As two nits: https://docs.bablr.org/reference/cstml and https://bablr.org/languages/universe/ruby are both 404, but I suspect the latter one is just falling into the same trap many namespaces fall into of using a URL when they meant it as a URN.
The JSX noise is CSTML, a data format for encoding/storing parse trees. It's our main product. E.g. a simple document might look something like `<*BooleanLiteral> 'true' </>`. It's both the concrete syntax and the semantic metadata offered as a single data stream.
The easiest way to consume a CSTML document is to print the code stored in it, e.g. `printSource(parseCSTML(document))`, which would get you `true` for my example doc. Since we store all the concrete syntax, printing the tree is guaranteed to get you the exact same input program the parser saw. This means you can use this to rearrange trees of source code and then print them over the original, allowing you to implement linters, pretty-printers, or codemod engines.
These CSTML documents also contain all the information necessary to do rich presentation of the code document stored within (syntax highlighting). I'm going to release our native syntax highlighter later today hopefully!
I think we already are. We're about to be drowning in a cesspit. The support for the broken software is going to be replaced by broken LLM agents.
That's my expectation as well.
The logical outcome of this is that the general public will eventually get fed up, and there will be an industry-wide crash, just like in 1983 and 2000. I suppose this is a requirement for any overly hyped technology to reach the Plateau of Productivity.
No, they won't. It's a race to the bottom.
I can take extra time to produce something that won't fall over on the first feature addition, that won't need to be rewritten with a new approach when the models get upgraded/changed/whatever and will reliably work for years with careful addition of new code.
I will get underbid by a viber who produced a turd in an afternoon, and has already spent the money from the project before the end of the week.
Even better if the accountants are using LLMs.
Or even better, hardware prototyping using LLMs with EEs barely knowing what they are doing.
So far, most software dumbassery with LLMs can at least be fixed. Fixing board layouts, or chip designs, not as easy.
Folks, we already have bad software. Everywhere.
And nobody cares.
https://gs.statcounter.com/os-market-share/desktop/worldwide...
If you want to sell high quality software, then you must be patient. Several decades worth of patient.
The SOW was so poorly specified that it was easy to maliciously comply with it, and it had no real acceptance tests. As a result legal didn't think IT would have a leg to stand on arguing with the vendor on the contract, and we ended up constantly re-negotiating on cost for them to make fixes just to get a codebase that never went live.
An example of how bad it was: imagine you have a database of metadata to generate downloader tasks in a tool like Airflow, but instead of any sane grouping of, say, the 100 sources with 1000 files each per day into 100-ish tasks, it generated a 700,000-task graph because it went task-per-file-per-day.
We were using some sort of SaaS DAG/scheduler tool at the time, and if we had deployed we'd have been using 5x more tasks than the entire decades-old, 200-person company was using to date, and paid for it.
Or they implemented the file arrival SLA checker such that it only alerted when a late file arrived. So if a file never arrives it never alerts. Or when a daily file arrives a week late, you get the alert on arrival, not a week ago when it was late.
To be fair though, in your case it sounds like 51% (and maybe even 75+%) of the defect was in the specifications.
You can have a loose spec and trust the team to do the right thing if it's an internal team you will allocate budget/time to iterate. Not if you have a fixed time & cost contract.
80% of our job is helping clients figure out what they actually need and what's reasonable to implement given the current state of tech, finding that balance between ideal and realistic software, or rather negotiating it.
So expecting client to write SOWs/specifications is like expecting client to write code.
Aha, actually, I've recently seen it quite a few times: people send me detailed SOWs which look good, but once I try to read them to actually build an understanding of the domain logic/program in my head, they do not make any sense.
Very close to the great-grandparent comment about mentoring junior programmers. Now imagine they are the ones paying you!
It also cuts against all trends of iterative development in that it is like waterfall with a gun to your head to get the spec 1000% right.
That's how lots of the early outsourced projects ended up. Perfectly matching the spec and not working.
> The whole thing was scrapped immediately.
And that's how it ended up too. Everything old is new again.
Was it not possible to see the quality issues before the project was finished?
The contractors simply wanted to get paid, naturally. The people who paid them didn't understand the original codebase, and they did not communicate with the people who designed and built the original codebase either. The people who built the original code were overworked and saw the whole brouhaha as a burden over which they had no control.
It was a low seven figure contract. The feature was scrapped after two or three years while the original product lived on and evolved for many years after that.
I hope that management learned their lesson, but I doubt it.
Eventually you will find yourself in deep waters, with the ship lower than it should be, routinely bailing out buckets of water, wishing for the nearest island, only to repair the ship with whatever is on that island and keep sailing to the next one, with the buckets ready.
After a couple of enterprise projects, one learns that it's either move into another business or learn to cope with this approach.
Which might be especially tricky given the job landscape in someone's region.
Before SE I had a bunch of vastly different jobs and they all suffered from something akin to crab bucket mentality where doing a good job was something you got away with.
I've had jobs where doing the right thing was something you kept to yourself or suffer for it.
I wish I could make $$$ off this insight somehow but im not sure it's possible.
Source: I've been replaced by this process a number of times.
I don't see how this would be causally linked to capitalism in any meaningful way.
I suspect that the majority of the people who claim that these tools are making them more productive are simply skipping these tasks altogether, or they never cared to do them in the first place. Then the burden for maintaining code quality is on the few who actually care, which has now grown much larger because of the amount of code that's thrown at them. Unfortunately, these people are often seen as pedants and sticklers who block PRs for no good reason. That sometimes does happen, but most of the time, these are the folks who actually care about the product shipped to users.
I don't have a suggestion for improving this, but rather a grim outlook that it's only going to get worse. The industry will continue to be flooded by software developers trained on LLM use exclusively, and the companies who build these tools will keep promoting the same marketing BS because it builds hype, and by extension, their valuation.
I think that's probably true, but I think there are multiple layers here.
There's what's commonly called vibe coding, where you don't even look at the code.
Then there's what I'd call augmented coding, where you generate a good chunk of the code, but still refactor and generally try to understand it.
And then there's understanding every line of it. For this, in particular, I don't believe LLMs speed things up. You can get the LLM to _explain_ every line to you, but what I mean is to look at documentation and specs to build your understanding and test out fine grained changes to confirm it. This is something you naturally do while writing code, and unless you type comically slow, I'm not convinced it's not faster this way around. There's a very tight feedback loop when you are writing and testing code atomically. In my experience, this prevents an unreasonable amount of emergencies and makes debugging orders of magnitude faster.
I'd say the bulk of my work is either in the second or the third bucket, depending on whether it's production code, the risks involved etc.
These categories have existed before LLMs. Maybe the first two are cheaper now, but I've seen a lot of code bases that fall into them - copy pasting from examples and SO. That is, ultimately, what LLMs speed up. And I think it's OK for some software to fall into these categories. Maybe we'll see too much fall into them for a while. I think eventually, the incredibly long feedback cycles of business decisions will bite and correct this. If our industry really flies off the handle, we tend to have a nice software crisis and sort it out.
I'm optimistic that, whatever we land on eventually, generative AI will have reasonable applications in software development. I personally already see some.
These devs don't get any value whatsoever from LLMs, because explaining it to the LLM takes longer than doing it themselves.
Personally, I feel like everything besides actually vibe coding + maybe sanity checking via a quick glance is a bad LLM application at this point in time.
You're just inviting tech debt if you actually expect this code to be manually adjusted at a later phase. Normally, code tells a story. You should be able to understand the thought process of the developer while reading it - and if you can't, there is an issue. This pattern doesn't hold up for generated code, even if it works. If an issue pops up later, you'll just be scratching your head over what this was meant to do.
And just to be clear: I don't think vibe coding is ready for current enterprise environments either - though I strongly suspect it's going to decimate our industry once tooling and development practices for this have been pioneered. The current models are already insanely good at coding if provided the correct context and prompt.
E.g. countless docs on each method defining use cases, forcing the LLM to backtrack through the code paths before changes to automatically determine regressions, etc. Current vibe coding is basically like the original definition of a hacker: a person making furniture with an axe. It basically works, kinda.
I feel like people are maybe underestimating the value of LLMs for some tasks. There's a lot of stuff where, I know how to do it but I can't remember the parameter order or the exact method name and the LLM absolutely knows. And I really get nothing out of trying to remember/look up the exact way to do something. Even when I do know, it often doesn't hurt to be like "can you give me a loop to replace all the occurrences of foo with bar in this array of strings" and I don't need to remember if it's string.replace(foo,bar), whether I need to use double or single quotes, if it's actually sub or gsub or whatever.
There's lots of tiny sub-problems that are totally inconsequential and an LLM can do for me, and I don't think I lose anything here. In fact maybe I take a little longer, I chat with the LLM about idioms a bit and my code ends up more idiomatic/more maintainable.
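For what it's worth, the kind of answer you want back there is tiny; in Python, say, it's just something like this (hypothetical throwaway snippet):

    # replace every occurrence of "foo" with "bar" in a list of strings
    strings = ["foo bar", "foods", "baz"]
    replaced = [s.replace("foo", "bar") for s in strings]
    # -> ["bar bar", "bards", "baz"]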
It kind of calls to mind something Steve Jobs said about how hotkeys are actually worse than using a mouse, and that keyboard users aren't faster, they just think they are. But using LLMs for these sorts of things feels similar in that, like using keyboard shortcuts, maybe it takes longer, but I can use muscle memory so I don't have to break flow, and I can focus on something else.
Asking the LLM for these sorts of trivial problems means I don't have to break flow, I can stay focused on the high-level problem.
I mean, I kinda get it in more complicated contexts, but the particular examples you describe (not remembering method names and/or parameter orderings) have been solved for ages by any decent IDE.
What boggles my mind is people are writing code that’s the foundation of products like that.
Maybe it’s imposter syndrome though to think it wasn't already being done before the rise of LLMs.
LLM “vibe coding” is another continuation of this “new hotness”, and while the more seasoned developers may have learned to avoid it, that’s not the majority view.
CEOs and C-suites have always been disconnected from the first order effects of their cost-cutting edicts, and vibe coding is no different in that regard. They see the ten dollars an hour they spend on LLMs as a bargain if they can hire a $30 an hour junior programmer instead of a $150 an hour senior programmer.
They will continue to pursue cost-cutting, and the advent of vibe coding matches exactly what they care about: software produced for a fraction of the cost.
Our problem — or the problem of the professionals - is that we have not been successful in translating the inherent problems with the CEOs' approach into a change in how the C-suite operates. We have not successfully persuaded them that higher quality software = more sales, or lower liability, or lower cost maintenance, and that's partially because we as an industry have eschewed those for “move fast and break things”. Vibe coding is “Move Fast and Break Things” writ large.
This depends a lot on the "programming culture" from which the respective developers come. For example, in the department where I work (in some conservative industry) it would rather be a tough sell to use a new, shiny framework because the existing ("boring") technologies that we use are a good fit for the work that needs to be done and the knowledge that exists in the team.
I rather have a feeling that in particular the culture around web development (both client- and server-side parts) is very prone to this phenomenon.
The Venn diagram of the companies that embrace vibe coding and the companies whose developers like to rewrite applications whenever a new framework comes out is almost a perfect circle, however.
Can the business afford to ship something that fails for 5% of their users? Can they afford to find out before they ship it or only after? What risks do they want to take? All business decisions. In my CTO jobs and fractional CTO work, I always focused on exposing these to the CEO. Never a "no", always a "here's what I think our options and their risks and consequences are".
If sound business decisions lead to vibe coding, then there's nothing wrong with it. It's not wrong to lose a bet where you understood the odds.
And don't worry about businesses that make uninformed bets. They can get lucky, but by and large, they will not survive against those making better informed bets. Law of averages. Just takes a while.
Sure, technical decisions ultimately depend on a cost-benefit analysis, but the companies who follow this mentality will cut corners at every opportunity, build poor quality products, and defraud their customers. The unfortunate reality is that in the startup culture "move fast and break things" is the accepted motto. Companies can be quickly started on empty promises to attract investors, they can coast for months or years on hype and broken products, and when the company fails, they can rebrand or pivot, and do it all over again.
So making uninformed bets can still be profitable. This law of averages you mention just doesn't matter. There will always be those looking to turn a quick buck, and those who are in it for the long haul, and actually care about their product and customers. LLMs are more appealing to the former group. It's up to each software developer to choose the companies they wish to support and be associated with.
It’s rare that startups gain traction because they have the highest quality product rather than because they have the best ability to package, position, and market it while scaling all the other things needed to run a company.
They might get acqui-hired for that reason, but rarely do they stand the test of time. And when they do, it's almost always because founders stepped aside and let suits run all or most of the show.
And yes, there is enshittification and there are immoral actors. The market doesn't solve these problems; if anything, it causes them.
What can solve them? I have only two ideas:
1. Regulation. To a large degree this stops some of the worst behaviour of companies, but the reality in most countries I can think of is that it's too slow, and too corrupt (not necessarily by accepting bribes, also by wanting to be "an AI hub" or stuff like that) to be truly effective.
2. Professional ethics. This appears to work reasonably well in medicine and some other fields, but I have little hope our field is going to make strides here any time soon. People who have professional ethics either learn to turn it off selectively, or burn out. If you're a shady company, as long as you have money, you will find competent developers. If you're not a shady company, you're playing with a handicap.
It's not all so black and white for sure, so I agree with you that there's _some_ power in choosing who to work for. They'll always find talent if they pay enough, but no need to make it all too easy for them.
It may well have been happening before the rise of LLMs, but the volume was a lot more manageable.
Now it's an unrestricted firehose of crap that there are just not enough good devs to wrangle.
The volume here is orders of magnitude greater, but that’s the closest example I can think of.
Tech exec here. It is all about gamed metrics. If the board-observed metric is mean salary per tech employee, you'll get masses of people hired in India. In our case, we hire thousands in India. Only about 20% are productive, but % productive isn't the metric, so no one cares. You throw bodies at the problem and hope someone solves it. It's great for generations of overseas workers, many of whom may not have had a job otherwise. You probably have dozens of Soham Parekhs.
Western execs also like this because it inflates headcount, which is usually what exec comp is based on: "I run a team of 150...". Their lieutenants also like it because they can say "I run a team of 30", as do their sub-lieutenants: "I run a team of 6".
I think this follows a larger pattern of AI. It helps someone with enough maturity not to rely on it too blindly and enough foresight to know they still need to grow their own skills, but it does well enough that those looking for an easy or quick answer are now given a tool that lets them skip more of the hard work. It empowers seniors (developers or senior-level people in unrelated fields) but traps juniors. Same as using AI to solve a math problem. Is the student verifying their own solution against the AI's, or copying and pasting while thinking they are learning by doing so (or even recognizing they aren't, but not worrying about it since the AI can handle it, and not realizing how this will trap them on ever harder problems in the future)?
>...but rather a grim outlook that it's only going to get worse. The industry will continue to be flooded by software developers...
I somewhat agree, but even more grim, I think we are looking at this across many more fields than just software development. The way companies make use of this and the market forces at the corporate level might be different, but it is also impacting education and that alone should be enough to negatively impact other areas.
AI creates the same problem for hiring too: it generates the appearance of knowledge. The problem you and I have as evaluators of that knowledge is there is no other interface to knowledge than language. In a way this is like the oldest philosophy problem in existence. Socrates spent an inordinate amount of time railing against the sophists, people concerned with language and argument rather than truth. We have his same problem, only now on an industrial scale.
To your point about tests, I think the answer is to not focus on automated tests at first (though of course you should have those eventually), but instead we should ask people to actually run the code while they explain it to show it working. That's a much better test: show me how it works, and explain it to me.
But software development is about producing written artifacts. We actually need the result. We care a lot less about whether or not the developer has a particular understanding of the world. A cursor-written implementation of a login form is of use to a senior engineer because she actually wants a login form.
1. The invention of THE CONCEPT BEHIND THE MACHINE. In our context, this is "Programming as Theory Building." Our programs represent some conception of the world that is NOT identical to the source code, much the way early precision tools embodied philosophies like interchangeability.
2. The building of the machine itself, which has to function correctly. To your point, this is one of the major things we care about, but I don't agree it's the only thing. In the code world this IS the code, to your point. When this is all we think about, though, I think you get spaghetti code bases and poorly trained developers.
3. Training apprentices in both the ideas and the craft of producing machines.
You can argue we should only care about #2, many businesses certainly incentivize thinking in that direction, but I think all 3 are important. Part of what makes coding and talking about coding tricky is that written artifacts, even the same written artifacts, express all 3 of these things and so matters get very easily confused.
We actually should because the developer has to maintain and extend the damned thing in the future
There’s a reason no one does it. Because it’s inefficient. Even in recorded video format. The helpful things are tests and descriptive PRs. The former because its structure is simple enough that you can judge it, and the test run can be part of the commit. The second is for the simple fact that if you can write clearly about your solution, I can then just do a diff of what you told me and what the code is doing, which is way faster than me trying to divine both from the code.
I claim that this approach is sustainable.
The idea behind the "I read all of your code and give feedback" methodology is that the writer really put a lot of deep effort into making sure that the code is of great quality - and then he is expecting feedback, which is often valuable. As long as you can, with some effort, find out by yourself how improvements could be made, don't bother asking for someone else's time.
The problem is thus that the writers of "vibe-generated code" hardly ever put such a deep effort into the code. Thus the code is simply not worth asking feedback for.
Leetcode Zoom calls always were marginal, now with chat AI they're virtually useless though still the norm.
It's funny, I have the same problem, but with subject matter expertise. I work with internal PR people and they clearly have shifted their writing efforts to be AI-assisted or even AI-driven. Now I as the SME get these AI-written blog posts and press releases, and I spend far more time getting all the hallucinations out of these texts.
It's an effort inversion, too - time spent correcting the PR-people's errors has tripled or quadrupled. They're supposed to assist me, not the other way around. I'm not the press release writer here.
And of course they don't 'learn' like your junior engineers - it's always AI, it's always different hallucinations.
P.S.: And yes I've raised this internally with our leadership - at this rate we'll have 50% of the PR people next year, they're making themselves unemployed. I don't need a middleman whose job it is to copy-paste my email into ChatGPT, then send me the output; I can do that myself.
Of course this is impossible to enforce, and I believe that the PR people would rather hide their AI usage. (As I wrote above why pay high salaries to people who automate themselves away?)
So then you see where this is going.
Edit: actually, that's the story of my life. I've been working for 20 years and every 5 years or so, stuff gets reshuffled so I have 3 more jobs instead of 1. It feels like I have 20 jobs by now, but still the same salary. And yes I've switched employers and even industries. I guess the key is to survive at the end of the funneling.
I think we've always had this mental model which needs to change that senior engineers and product managers scope and design features, IC developers (including juniors for simpler work) implement them, and then senior engineers participate in code review.
Right now I can't see the value in having a junior engineer on the team who is unable to think about how certain features should be designed. The junior engineer who previously spent his time spinning tires trying to understand the codebase and all the new technologies he has to get to grips with should instead spend that time trying to figure out how that feature fits into the big picture, consider edge cases, and then propose a design for the feature.
There are many junior engineers who I wouldn't trust with that kind of work, and honestly I don't think they are employable right now.
In the short term, I think you just need to communicate this additional duty of care (making sure that your pull requests are complete, because otherwise there's an asymmetry of workload) and judge those interns and juniors on how respectful of it they are.
The issue with LLM tools is that they don't teach this. The focus is always on getting to the end result as quickly as possible, skipping any of the actually important parts of software development. The way problem solving is approached with LLMs is by feeding them back to the LLM, not by solving them yourself. This is another related issue: relying on an LLM doesn't give you software development experience. That is gained by actually solving problems yourself; understanding how the system works, finding the underlying root cause, fixing it in an elegant way that doesn't create regressions, writing robust tests to ensure it doesn't happen again, etc. This is the learning experience. LLMs can help with this, but they're often not used in this way.
Well that sucks, because that just means the pipeline for engineers to become seniors is completely broken.
I have no interest in pulling the ladder up behind me.
- you need to think through the product more, really be sure it’s as clarified as it can be. Everyone has their own process, but it looks like rubber ducking, critiquing, breaking work into phases, those into tasks, etc. (jobs to be done, business requirement docs, domain driven design planning, UX writing product lexicon docs, literally any and all artifacts)
- Prioritize setting up tooling and feedback loops (code quality tools of any and every kind are required). This includes custom rules to help enforce anything you decided during planning. Spend time on this and life will be a lot better for everyone.
- We typically make very, very detailed plans, and then the agents will “IVI” it (e.g. automatic linting, single test, test suite, manual evaluation).
You basically set up as many and as diverse of automatic feedback signals as you can.
—-
I will plan and document for 2-4 hours, then print a bunch of small “PRDs” that are like “1 story point” small. There are clear definitions of done.
Doing this, I can pretty much go to the gym or have meetings or whatever for 1-2 hours hands off.
—-
A well-architected system is easier to develop and easier to maintain. It makes sense to put all the human effort into producing that because, lo and behold, both humans and LLMs can produce much better results within a well-defined structure.
Not sure what to tell you otherwise. The code is much more thought through, with more tests, and better docs. There’s even entire workflows for the CI portion and review.
I would look at workflows like this as augmentation rather than automation.
What this actually means is that your manager gets a raise when the AI written code works, and you get fired when it inevitably breaks horribly. You also get fired if you do not use AI written code
1. Mostly written by LLMs, and only superficially reviewed by humans.
2. Written 50-50% by devs and LLMs. Reviewed to the same degree as now.
Software of type 2 will be more expensive and probably of higher quality. Type 1 software will be much much more common, as it will be cheaper. Quality will be lower, but the open question is whether it will be good enough for the use cases of cheap mass-produced software. This is the question that is still unanswered by practical experience, and it's the question that all the venture capitalists are salivating over.
You give up, approve the trash PRs, wait for it to blow up in production and let the company reap the rewards of their AI-augmented workforce, all while quietly looking for a different job or career altogether.
Once LLMs are able to write unit tests, this will get worse, because there will be no time spent reflecting on "what do I need" or "how can this be simplified". These questions are, in my opinion, what characterize the differences between a Developer, Engineer, and Architect mindset. And LLMs / vibe coding will never develop actual engineers or architects, because they can never develop that mindset.
The easiest programming language to spot those architectural mistakes in is coincidentally the one with the least syntax burden. In Go it's pretty easy to discover these types of issues in reviews because you can check the integrated unit tests, which help a lot in narrowing down the complexities of code branches (and whether or not a branch was reached, for example).
In my opinion we need better testing/review methodologies. Fuzz testing, unit testing and integration testing isn't enough.
We need some kind of logical inference tests which can prove that code branches are kept and actually reached, and allow us to confirm satisfiability.
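One narrow way to approximate that today, sketched under the assumption that the z3-solver Python bindings are available: hand the guard of a branch to an SMT solver and ask whether any input can ever reach it.

    # rough sketch: is the branch guarded by `if x > 10 and x < 5:` ever reachable?
    from z3 import Int, Solver, And, sat

    x = Int("x")
    guard = And(x > 10, x < 5)

    s = Solver()
    s.add(guard)
    if s.check() == sat:
        print("branch is reachable, e.g. with", s.model())
    else:
        print("branch can never be taken: dead code")

This only covers reachability of a single condition, not full program semantics, but it's the flavor of satisfiability checking I have in mind.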
You might make this easier by saying you just checked their code with your own AI system and then say it returned "you obviously didn't write it, please redo".
That said, a lazy contribution - substandard code or poorly LLM generated - just wastes your time if your feedback is just put into the LLM again. Setting boundaries then is perfectly acceptable, but this isn't unique to LLMs.
My only hope is that AI one day will be much better than humans in every aspect and produce super high quality code. I don't see why this wouldn't happen. The current tools are still primitive.
I see this a lot, and have even done so myself. I think a lot of people in the industry are a bit too socially aware and think that if they start a discussion they'll look like they're trying too hard.
It's stupid yes, but plenty of times I've started discussions only to be brushed off or not even replied to, and I believed it was because my responses were too long and nobody actually cared.
But then, for me, writing is a way to organize thought as well, plus these remarks will stay in the thread for future reference. In theory anyway, in practice it's likely they'll switch from Gitlab to something else and all comments will be lost forever.
Which makes me wish for systems that archive review remarks into Git somehow. I'm sure they exist, but they're not commonly used.
Another thing I do is ask for the Claude session log file. The inputs and thoughts they provided to Claude give me a lot more insight than Claude's output. Quite often I am able to correct the thought process when I know how they are thinking. I've found junior developers treat Claude like SMS - small, ambiguous messages with very little context, hoping it will perform magic. By reviewing the Claude session file, I try to fix this superficial prompting behaviour.
And third, I've realized Claude works best if the code itself is structured well and has tests, debugging tools, and documentation. So I spend more time on tooling so that Claude can use these tools to investigate issues, write tests, and iterate faster.
Still a far way to go, but this seems promising right now.
One, they need to run their code. Make sure it works before submitting a PR. If someone submits code to me that does not work I don't care if it came from an LLM or not, go run your code and come back when it works. If they routinely refuse to run their code and never learn their lesson then I might suggest they find another profession... Or require they submit a video of the code working.
Second, going away and coming back with a totally different PR I give the feedback of "what happened to the code we were working on before? We didn't need all new code." As the senior my time is worth (a bit) more than the intern's so I don't hesitate to make their bad choices their problem. Come back when you've made a serious attempt and then we can discuss it.
One thing I do that helps clean things up before I send a PR is writing a summary. You might consider encouraging your peers to do the same.
## What Changed?
Functional Changes:
- New service for importing data
- New async job for dealing with z.
Non-functional Changes:
- Refactoring of Class X
- Removal of outdated code
It might not seem like much, but writing this summary forces you to read through all the changes and reflect. You often catch outdated comments, dead functions left after extractions, or other things that can be improved—before asking a colleague to review it. It also makes the reviewer’s life easier, because even before they look at the code, they already know what to expect.
PRs in general shouldn't require elaborate summaries. That's what commit messages are for. If the PR includes many commits where a summary might help, then that might be a sign that there should be multiple PRs.
Granted, it is not only summaries that go into the description—how to test, if there is any pre-deploy or post-deploy setup, any concerns, external documentation, etc.
Less is more. A summary serves to clarify, not to endlessly add useless information.
⸻
2. About the usefulness of summaries.
Summaries always provide better information—straight to the point—than commits (which are historical records). This applies to any type of information.
When you’re reporting a problem by going through historical facts, it can lead to multiple narratives, added complexity, and convoluted information.
Summaries that quickly deliver the key points clearly and focus only on what’s important offer a better way to communicate.
If the listener asks for details, they already have a clear idea of what to expect. A good summary is a good introduction to what you are going to see in the commits messages and in the code changes.
______________________
3. About multiple PRs.
A summary helps to clarify what is scope creep (be it a refactor or code unrelated to the ticket);
it makes it easier for the reviewer to demand a split into multiple PRs.
Examples: a non-summary PR/MR might lead to the question: “WHY is this code here?"
"He touched a class here, was he fixing something that the tests missed, or is it just a refactor?"
_______________
As a reviewer you can get that information by yourself, although a summary helps you get it much quicker.
This is precisely what a (good) commit message should answer.
Commits are historical records, sure, but they can include metadata about the change, which should primarily explain why the change was made, what tradeoffs were made and why, and any other pertinent information.
This is useful not just during the code review process, but for posterity whenever someone needs to understand the codebase, while bisecting, etc. If this information is only in the PR, it won't be easy to reference later.
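To make that concrete, a commit message shaped roughly like this (hypothetical example) carries the "why" and the tradeoffs along with the change:

    Add retry with backoff to the nightly importer

    The upstream API started rate-limiting us, so the nightly import failed
    roughly once a week. Retrying with exponential backoff (capped at 5
    attempts) trades a slightly longer runtime for not paging anyone at 3am.
    Considered queueing failed batches instead, but that would need new
    infrastructure for little gain at our current volume.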
FWIW I'm not against short summaries in PRs that are exceptionally tricky to understand. The PR description is also useful as a living document for keeping track of pending tasks. But in the majority of cases, commit messages should suffice. This is why GitHub, and I'm sure other forges as well, automatically fill out the PR title and description with the commit information, as long as there's only one commit, which is the ideal scenario. For larger PRs, if it doesn't make sense to create multiple, I usually just say "See the commits for details".
But most of the time it is not very necessary.
My experience with LLM-generated summaries is the same as it was with the templates: many complete them in a way that is entirely self-referential and lacking in context. I don't need a comment or a summary to describe to me exactly the same thing I could have already understood by reading the code. The reason for adding English-language annotations to source code is to explain how a particular change solves a complex business problem, or how it fits into a long-term architectural plan, that sort of thing. But the kinds of people who already did not care about that high level stuff don't have the context to write useful summaries, and LLMs don't either.
The worst thing I've seen recently is when you push for more clarity and context on the reasons behind a change, and then that request gets piped into an LLM. The AI subsequently invents a business problem or architectural goal that in reality doesn't exist and then you get a summary that looks plausible in the abstract, and may even support the code changes it is describing, but it still doesn't link back to anything the team or company is actually trying to achieve, and that costs the reviewer even more time to check. AI proponents might say "well they should have fed the team OKRs and company mission/vision/values into the LLM for context" but then that defeats the point of having the code review in the first place. If the output is performative and not instructive, then the whole process is a waste of time.
I am not sure what the solution is, although I do think that this is not a problem that started with LLMs, it's just an evolution of a challenge we have always faced - how to deal with colleagues who are not really engaged.
It is likely not possible to completely forbid junior developers from using AI tools, but any pull request that they create that contains (AI-generated) code that they don't fully comprehend (they are allowed to google) will be rejected (to test this, simply ask them some non-trivial questions about the code). If they do so again, these junior developers deserve a (small) tantrum.
So we can ask everyone using these tools to understand the code before submitting a PR, but that's the best we can do. There's no need to call anyone out for not meeting some invisible standard of quality.
What works for me is that after having lots of passing tests, I start refactoring the tests to get closer to property testing: basically proving that the code works by letting it go through complex scenarios and checking that the state is good at every step, instead of just testing lots of independent cases. The better the test is, the harder it is for the LLM to cheat.
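Here's a minimal sketch of what I mean, assuming the hypothesis library and a hypothetical ShoppingCart class under test: drive it through arbitrary sequences of operations and assert the invariant after every step, rather than asserting on a few hand-picked cases.

    from hypothesis import given, strategies as st

    class ShoppingCart:
        # trivially implemented stand-in for the real code under test
        def __init__(self):
            self.items = {}
        def add(self, name, qty):
            self.items[name] = self.items.get(name, 0) + qty
        def remove(self, name, qty):
            left = self.items.get(name, 0) - qty
            if left <= 0:
                self.items.pop(name, None)
            else:
                self.items[name] = left

    # arbitrary sequences of (operation, item, quantity)
    ops = st.lists(st.tuples(st.sampled_from(["add", "remove"]),
                             st.sampled_from(["apple", "pear"]),
                             st.integers(min_value=1, max_value=5)))

    @given(ops)
    def test_quantities_stay_positive(steps):
        cart = ShoppingCart()
        for op, name, qty in steps:
            getattr(cart, op)(name, qty)
            # the invariant is checked after every step, not just at the end
            assert all(q > 0 for q in cart.items.values())

The point is that code can't fake its way past this by special-casing a handful of expected inputs.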
We scoff at clever code that's hard to understand because it leaves teams poorly able to maintain it, but what about knowingly much-lower-quality code?
Much like Ikea's low-cost replaceable furniture has replaced artisan, handmade furniture, and cheap plastic toys have replaced finely made artifacts, LLM-produced code is cheap and low effort; meant to be discarded.
Recognizing this, it should be used where you have that in mind. You might still buy a finely made sofa because it's high touch. But maybe the bookshelves from Ikea are fine.
So it's not so important.
My favorite LLM-generated code I've seen in PRs lately is
expect(true).toBe(true)
Look ma! Tests aren't flaky anymore!
This kind of thing drove me mad even before LLMs or coding - it started at school when I helped people with homework. People would insist on switching to an entirely different approach midway through explaining how to fix the first one.
For junior devs, it’s about the same, I’m assigning hack jobs, because most of what we need to do are hack jobs. The code really isn’t the bottleneck in that case, the research needed to write the code is.
In other words, we need to code review the same way we interact with LLMs - point to the overarching flaw and request a reroll.
Would you mind drilling down into this a bit more? I might be dealing with a similar problem and would appreciate if you have any insight
I had to think a bit about it, but when it feels off it can be something like:
- I wrote several paragraphs explaining my reasoning, expecting some follow-up questions.
- The "fix" didn't really address my concerns, making it seem like they just said "okay" without really trying to understand. (The times when the whole PR is replaced makes it seem like my review was also just forwarded to the LLM, haha)
- I'm also comparing to how I often (especially earlier in my career) thought a lot about how to solve things, and when I got constructive feedback it felt pretty rewarding - and I could often give my own reasoning for why I did things a certain way. Sometimes I had tried a bunch of the things that the reviewer suggested, leading to a more lively back-and-forth. This could just be me, of course, or a cultural thing, but my expectation also comes from how other developers I've worked with react to my reviews.
Does that make sense? I'd be interested in hearing more about the problem you're dealing with. If this is not the right place, feel free to send an email :)
The doomer perspective would be that people are getting dumber and more complacent and that this will unravel society, but that might not actually be the case if we consider that the mindset already existed in other societies that still thrive. Perhaps the people who never really gave a crap about the quality of their work were right all along? After all, despite the fact most of us are in the top 20% of earners in our countries and easily the top 10% or an even more elite minority globally, end of the day we are still "code peasants" who build whatever our boss told us to build so that an ultra-wealthy investor class can compound their wealth. Why should we waste our time caring about that? Why not get an AI to grind out garbage on our behalf? Why not focus our energies on more rewarding pursuits in our personal lives?
Of course I am playing devil's advocate here, because for me personally being forced to show up for work every day thanks to capitalism and then doing a half-assed job makes me more depressed than trying to excel at something I never wanted to do in the first place. But there is a part of me that understands the mindset and wonders if my life might be easier if I shared it.
Anyway, prior to LLMs I dealt with this phenomenon by reluctantly accepting that most people don't care anywhere near as much about the quality of their work as I do, and that it was hopeless trying to change them. Find the few who do care and prioritize actually-productive knowledge exchanges with them. Drop your standards for people who clearly don't care. If the code doesn't meet your standards but it's still more-or-less functional, just let it go. You might imagine it'll reflect poorly on you, except in reality management doesn't care anyway - the push to AI all the things right now is the "mask off" moment. Every now and then you'll still find a motivated junior who really is passionate about getting better and then being a part of their growth is still rewarding.
The way that you solve this is that you pull your junior into a call and walk them through your comments one by one verbally, expecting them to comprehend the issues every time.
The codebase will go to shit regardless.
I don't understand: if they don't test the code they write (even manually), it's not an LLM issue, it's a process one.
They have not been taught what it means to have a PR ready for review; LLMs are irrelevant here.
You think about the implementation and how it can fail. If you don’t think about the implementation, or don’t understand the implementation, I would argue that you can earnestly try to test, but you won’t do a good job of it.
The issue of LLMs here is the proliferation of people not understanding the code they produce.
Having agents or LLMs review and understand and test code may be the future, but right now they’re quite bad at it, and that means that the parent comment is spot on; what I see right now is people producing AI content and pushing the burden of verification and understanding to other people.
Let's ignore the code quality or code understanding: these juniors are opening PRs, according to the previous user, that simply do not meet the acceptance criteria for some desired behavior of the system.
This is a process issue, not a tools issue.
I too have AI-native juniors (they learned to code alongside Copilot, Cursor, or ChatGPT) and they would never ever dare opening a PR that doesn't work or doesn't meet the requirements. They may miss some edge case? Sure, so do I. That's acceptable.
If OP's juniors are opening such PRs, they have not been taught that they should only ask for feedback once their version of the system does what it needs to.
Where was the burden prior to LLMs?
If a junior cannot show that their code works and demonstrate an understanding of it, how was this "solved" before LLMs? Why can't the same methods work post-LLM? Is it due to volume? If a junior produces _more_ code they don't understand, that doesn't give them the right to just skip PR review and testing.
If they do, where's upper management's role here, then? The senior should be bringing up this problem, working out a better process, and getting management buy-in.
>> If you don’t think about the implementation, or don’t understand the implementation, I would argue that you can earnestly try to test, but you won’t do a good job of it.
Previously the producers of the code were competent to test it independently.
This increasingly, to my personal observation, appears to no longer be the case.
They do test it, they just don't think about it deeply, and so they do a shit job of testing it and an incompetent job of writing tests for it.
Not by being lazy; smart, diligent folk doing a bad job because they didn't actually understand what needed to be tested, and tested some irrelevant, trivial happy path based on the requirements, not the implementation.
That's what LLMs give you.
It's not a process issue; it's people earnestly thinking they've done a good job when they haven't.
This is of course especially significant in codebases that do not have strict typing (or any typing at all).
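To make the "happy path" and typing points concrete, here's a minimal invented sketch (the function and scenario are hypothetical, not from the thread): a plausible LLM-generated helper where a requirements-level test passes, while the real failures live in implementation details that a stricter type setup would at least partially surface.
# Hypothetical LLM-generated helper (illustration only):
def average_score(report: dict) -> float:
    scores = report.get("scores")        # may be None if the key is missing
    return sum(scores) / len(scores)     # TypeError on None, ZeroDivisionError on []

# Happy-path test written from the requirements ("it averages the scores"):
def test_average_score() -> None:
    assert average_score({"scores": [2.0, 4.0]}) == 3.0

# Implementation-aware cases the author never thinks to write:
#   average_score({})               -> TypeError
#   average_score({"scores": []})   -> ZeroDivisionError
# With stricter annotations (e.g. report: dict[str, list[float]]) a type
# checker like mypy would at least flag that .get() can return None here.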
Catching this is my job, but it becomes harder if the PR actually has passing tests and just "looks" good. I'm sure we'll develop the culture around LLMs to make sure to teach new developers how to think, but since I learned coding in a pre-LLM world, perhaps I take a lot of things for granted. I always want to understand what my code does, for example - that never seemed optional before - but now it seems to get you much further than just copy-pasting stuff from Stack Overflow ever did.
If you wanted software engineers to be able to hold any sort of quality line against a few trillion dollars worth of AI investment, we needed to unionize or even form a guild twenty years ago.
"""
This is the new adder feature. Internally it uses chained Adders to multiply:
Adder(Adder(Adder(x, y), y), ...)
"""
class Adder:
# public attributes x and y
def __init__(self, x: float, y: float) -> None:
raise NotImplementedError()
def add(self) -> float:
raise NotImplementedError()
class Muliplier:
# public attributes x and y
# should perform multiplication with repeated adders
def __init__(self, x: float, y: float) -> None:
raise NotImplementedError()
def multiply(self) -> float:
raise NotImplementedError()
This is a really dumb example (frankly something Claude would write), but it illustrates that they should do this for external interfaces and implementation details. For changes, you'd do the same thing: specify it as comments and "high level" code ("# remove this class and switch to Multiplier"), etc.
Then spec -> review -> tests -> review -> code -> review.
Depending on how much you trust a dev, you can kill some review steps.
1. It's harder to vibe good specs like this from the start, and prevents Claude from being magical (e.g. executing code to make sure things work)
2. You're embedding a design process into reviews which is useful even if they're coding by hand.
3. It simplifies reviewing generated code because at least the interfaces should be respected.
This is the pattern I've been using personally to wrangle ChatGPT and Claude's behavior into submission.
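As an illustration of the "tests -> review" step in that flow (my own sketch, assuming the stubbed spec above lives in a hypothetical module called adder.py), the tests can be reviewed against the stubs before any implementation exists:
from adder import Adder, Multiplier  # hypothetical module holding the stubs above

# These fail with NotImplementedError today; the point is that the reviewer can
# sign off on the intended behavior before any generated implementation shows up.

def test_adder_adds() -> None:
    assert Adder(2.0, 3.0).add() == 5.0

def test_multiplier_uses_repeated_addition() -> None:
    assert Multiplier(3.0, 4.0).multiply() == 12.0

def test_multiplier_handles_zero() -> None:
    # An edge case worth agreeing on at review time: 0 * y should be 0.0,
    # not an empty (or infinite) chain of Adders.
    assert Multiplier(0.0, 4.0).multiply() == 0.0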
My gut feeling is that it would generalize to typed languages, Go, Erlang, even Haskell etc, but maybe some of them make life easier for the reviewer in some ways? What are your thoughts on that?
I didn't expect this initially but I am seeing it a ton at work now and it is infuriating. Some big change lands in my lap to review and it has a bunch of issues but they can ultimately be worked out. Then kaboom it is an entirely different change that I need to review from scratch. Usually the second review is just focused on the edits that fixed the comments from my first review. But now we have to start all over.
Here is the crazy part: as someone who is nearly a neckbeard, I can tell you there were no code reviews or PRs in my era. And mostly zero unit tests.
It's a world away from when the industry began. There's a great story from Bill Gates about a time when his ability to simply write code was an incredibly scarce resource. A company was so desperate for programmers that they hired him and Paul Allen as teenagers:
"So, they were paying penalties... they said, 'We don’t care [that they are kids].' You know, so I go down there. You know, I’m like 16, but I look about 13. They hire us. They pay us. It’s a really amazing project... they got a kick out of how quickly I could write code."
That story is a powerful reminder of how much has changed. Writing code was the bottleneck years ago. However, the core problem has shifted from "How do we build it?" to "What should we build, and is there a business for it?" I think it's credible to say that it was just market demand. Marc Andreessen's main complaint before the AI boom was that "there is more capital available than there are good ideas to fund". Personally, I think that's out of touch with reality, but he's the guy with all the money and none of the ideas, so he's a credible first-hand source.
Also, he's a VC, but where more funding is needed - even in pure software - is in sustainable businesses that don't have the ambition to take over the world, but rather serve their customer niche well.
There is immense, unmet demand for good software in developing countries—for example, robust applications that work well on underpowered phones and low-bandwidth networks across Africa or Southeast Asia. These are real problems waiting for well-executed ideas.
The issue isn't a lack of good ideas, but a VC ecosystem that throws capital at ideas of dubious utility for saturated markets, while overlooking tangible, global needs because they don't fit a specific hyper-growth model.
I do believe that these also fit the hyper-growth model. It's rather that these investors have a very US-centric knowledge of markets and market demands, and thus can barely judge ideas that target very different markets.
The capability to write high-quality code and have a deep knowledge about it is still a scarce resource.
The difference from former days is rather that the industry began to care less about this.
Back when these tools did not exist yet, a lot of this knowledge didn't exist yet. Software now is built on the shoulders of giants. You can write a line of code and get a window in your operating system, people like Bill Gates and his generation wrote the low level graphics code and had to come up with the concept of a window first, had to invent the fundamentals of graphics programming, had to wait and interact with hardware vendors to help make it performant.
I think we have a tendency to overestimate efficiency... because of the central role it plays at the margins that mattered to us at any given time.
But the economy is bottlenecked in complex ways. Market demand, money, etc.
It's not obvious that 100X more code is something we can use.
No it wasn't. It never was.
They weren't. They were hired because of their ability to deliver software products. Huge difference.
Every kid who mechanically copied BASIC games from a magazine could "write code", but they weren't Bill Gates.
(Anyways Bill Gates was hired because of nepotism, but that's irrelevant here.)
And, cynically, I bet a software LLM will be more responsive to your feedback than the over-educated and overpaid junior “engineer” will be. Actually I take it back, I don’t think this take is cynical at all.
I see it as a sign of how bad juniors are, and the need of seniors interacting with LLM directly without the middlemen.
In my experience I see juniors come out of college who can code in isolation as well as me or better. But the difference between jr/sr is much more about integration, accuracy and simplicity than raw code production. If LLMs remove a lot of the hassle of code production I think that will BENEFIT the other elements, since those things will be much more visible.
Personally, I think juniors are going to start emerging with more of a senior mindset. If you don't have to sweat uploading tons of programming errata to your brain, you can produce more code and will more quickly need to focus on larger structural challenges. That's a good thing! Yes, they will break large codebases, but they have been doing that forever, if given the chance. The difference now is they will start doing it much sooner.
A human’s ability to assess, interrogate, compare, research, and develop intuition are all skills that are entirely independent of the coding tool. Those skills are developed through project work, delivering meaningful stuff to someone who cares enough to use it and give feedback (eg customers), making things go whoosh in production, etc etc.
This is an XY problem, and the real Y is galaxy brains submitting unvalidated and shoddy work that makes good outcomes harder rather than easier to reach.
Jr Devs are responding to incentives to learn how to LLM, which we are saying all coders need to.
So now we have to torture the argument to create a carve out for junior devs - THEY need to learn critical thinking and taking responsibility.
Using an LLM directly reduces your understanding of whatever you used it to write, so you can't have both - learning how to code, and making sure your skills are future-proof.
There’s no carve out. Anyone pushing thoughtless junk in a PR for someone else to review is eschewing responsibility.
But I think that makes them invaluable in professional contexts. There is so much tooling we never have the time to write to improve stuff. Spend 1-2h with Claude code and you can have an admin dashboard, or some automation for something that was done manually before.
A coworker comes to me with a question about our DB content, Claude gives me a SQL query for what they need, I review it, copy-paste it into Metabase or Retool, and they no longer have to be blocked by engineering. That type of thing has been my motivation for mcp-front[0]: I wanted my non-eng coworkers to be able to do that whole loop by themselves.
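As a rough sketch of what that loop can look like (the table, columns, and use of sqlite are invented for illustration; the comment's actual setup pastes the reviewed query into Metabase/Retool rather than running a script):
import sqlite3

# Hypothetical reviewed query from the LLM (schema invented for illustration):
QUERY = """
SELECT status, COUNT(*) AS orders
FROM orders
WHERE created_at >= date('now', '-7 days')
GROUP BY status
ORDER BY orders DESC;
"""

def run_readonly(db_path: str) -> list[tuple]:
    # Open the database read-only so a pasted query can't mutate anything.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()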
we spin up a data lake, load all your data and educate an agent on your data.
But getting it to spit out hundreds or even thousands of lines of code and then just happy path testing and shipping is insane.
I'm really concerned about software quality heading into the future.
Examples:
This morning Claude Code built a browser-based app that visualizes 3.8M lines of JSON dumps of AWS infrastructure. Attention required by me: 15 minutes. Results: Reasonable for a 1-shot.
A few weeks ago I had it build me a client/server app in multiple flavors from a well defined spec: async, threaded, and select, to see which one was the most clear and easy to maintain.
A few days ago I gave it a 2K line python CLI tool and said "Build me a web interface to this CLI program". It nearly one-shotted it (probably would have if I had the playwright MCP configured).
These are all things I never would have been able to pursue without the LLM tooling in the past because I just don't have time to write the code.
There are definitely cases where the code is not the bottleneck, but those aren't the only cases.
I can forget about the details and care more about architecture, how things connect, etc.
We have the technology to make the technology not suck. The real challenge is putting that developer ego into a box and digging into what drives the product's value from the customer's perspective. Yes - we know you can make the fancy javascript interaction work. But, does the customer give a single shit? Will they pay more money for this? Do we even need a web interface? Allowing developers to create cat toys to entertain themselves with is one realistic way to approach the daily cloud spend figures of Figma.
The biggest tragedy to me was learning that even an aggressive incentive model does not solve this problem. Throwing equity and gigantic salaries into the mix only seems to further complicate things. Doing software well requires at least one person who just wants to do it right regardless of specific compensation. Someone who is willing to be on all of the sales & support calls and otherwise make themselves a servant to the customer base.
Yup. The tough part of my job has always been taking the business requirements and then figuring out what the business ACTUALLY wants. Users will tell you what they want, but users are not designers and usually don't think past what they currently want right now. Give them exactly what they say they want and it will almost never give a good result. You have to navigate consequences of decisions and level-set to find the solution.
LLMs are not good at this and only seem to get worse as "improved" models find users prefer constant yes-manning. I've never had an LLM tell me my idea was flawed and that's a huge issue when writing software.
If programmers spend 90%+ of their time reading code rather than writing it, then LLM-generated code is optimizing only a small amount of the total work of programming. That seems to be similar to the point this blog is making.
[1] https://www.goodreads.com/quotes/835238-indeed-the-ratio-of-...
Context is never close at hand, it is scattered all over the place defeating the purpose.
Now I have produced a lot of programs, just by reading them.
People should also learn how to read programs. Most open source code is atrocious, corporate code is usually even worse, but not always.
As Donald Knuth once said, code is meant to be read. The time of literate programming is gonna come at some point, either in 100 years or in 3 years.
People used to resist reading machine-generated output. Look at the code generator / source code / compiler, not at the machine code / tables / XML it produces.
That resistance hasn't gone anywhere. No one wants to read 20k lines of generated C++ nonsense that gcc begrudgingly accepted, so they won't read it. Excitingly, the code generator is no longer deterministic, and the 'source code prompt' isn't written down, so really what we've got is rapidly increasing piles of ascii-encoded binaries accumulating in source control. Until we give up on git, anyway.
It's a decently exciting time to be in software.
I've even had code submitted to me by juniors which didn't make any sense. When I ask them why they did that, they say they don't know, the LLM did it.
What this new trend is doing is generating a lot of noise and overhead on maintenance. The only way forward, if embracing LLMs, is to use LLMs also for the reviewing and maintenance, which obviously will lead to messy spaghetti, but you now have the tools to manage that.
But the important realization is that for most businesses, quality doesn't really matter. Throwaway LLM code is good enough, and when it isn't you can just add more LLM on top until it does what you think you need.
I can't imagine a professional software developer in a position of authority leaving that statement unchallenged and uncorrected.
If a person doesn't stand behind the code they write, they shouldn't be employed. Full stop.
Of course I didn't approve the PR.
It can be reviewed. Of the job descriptions I have encountered so far (thousands of them), not one mentions reading fast as a skill, let alone one more important than writing/typing.
Put another way, I have yet to go to an interview whose sole purpose was reading code, with writing code as an insignificant detail. For example: "5 years of reading Python/Django code" as the required experience.
Anyway, that's going to change: reading code/documentation fast - not even reviewing, just reading - is the skill of utmost importance to hire for!
Donald Knuth: “Programs are meant to be read by humans and only incidentally for computers to execute.” [1]
[1] https://www.goodreads.com/quotes/6086714-programs-are-meant-...
This should resolve itself via rounds of redundancies, probably targeting the senior engineers who are complaining about the juniors, then by insolvency.
As I get older I spend more of my coding time on walks, at the whiteboard, reading research, and running experiments
Reminds me of a former colleague of mine, I'd sit next to him and get frustrated because he was a two-finger typer. But, none of his code was wasted. I frequently write code, then cmd+z back to ten minutes ago or just `git checkout .` because I lost track.
Authoring has never been the bottleneck, the same way my typing speed has never been the bottleneck.
The bottleneck has been, and continues to be, code review. It was in our pitch deck 4 years ago; it's still there.
For most companies, by default, it's a process that's synchronously blocked on another human. We need to either make it async (stacking) or automate it (better, more intelligent CI), or, ideally, both.
The tools we have are outdated, and if you're a team with more than 50 eng you've already spun up a sub team (devx, dev velocity, or dev productivity) whose job is to address this. Despite that, industry wide, we've still done very little because it's a philosophically poorly understood part of the process (why do we do code review? Like seriously, in three bullet points what's the purpose - most developers realize they haven't thought that deeply here).
- functionality: does it work? And is it meeting reqs?
- bug prevention, reliability, not breaking things
- matching of system architecture and best practices for the codebase
Other ideas:
- style and readability
- learning for the junior (and less so the senior, probably)
- checking the "code review" box off your list
An AI maximalist might say that code review is no longer necessary because in the case that there is an issue in a subsystem nobody is familiar with, you can simply ask the AI to read that source code and come back with a report of where the bug is and a proposal of how to fix it. And, since code review is useless anyway, might as well take the human out of the loop entirely - just have AI immediately commit the change and push it to production and iterate if or when another issue emerges.
This is the dream of autonomous, self-managing systems! Of course this dream is decades old at this point, and despite developing ever more complex systems it turns out that we were never quite able to do away with humans altogether. Thus, code review still appears to be useful. But it's only useful if everybody goes into it with the mindset that the goal is knowledge sharing. If the outcome of a review is not that everyone comes out of it with a good understanding of the purpose and function of the code being committed, then imo it was a waste of time.
1. Collaborate asynchronously on architectural approach (simplify, avoid wheel reinvention)
2. Ask "why" questions, document answers in commits and/or comments to increase understanding
3. Share knowledge
4. Bonus: find issues/errors
There are other benefits, like building rapport, getting some recognition for especially great code.
To me, code reviews are supposed to be a calm process that takes time, not a hurdle to quickly kick out of the way. Many disagree with me, however; I'm not sure what the alternative is.
Edit: people tend to say reviews are for "bug finding" and "verifying requirements". I think that's at best a bonus side effect; it's too much to ask of a person merely reading the code. In my case, code reviews don't go beyond reading the code (albeit deeply, carefully). We do however have QA that is more suited for verifying overall functionality.
This really gets at the benefits you mention and keeps people aligned with them instead of feeling like code review should be rushed.
Also hi Peter! Long time :)