I don't think AI will make your processes go faster

Posted by TheEdonian 15 hours ago

I don't think AI will make your processes go faster (frederickvanbrabant.com)

515 points | 366 comments

angarg12 11 hours ago|

> This exact thing is what software developers have been begging for since the beginning of the profession: Receiving a detailed outline of the problem and what the end result should look like.

> This is often the part that slows down software development. Trying to figure out what a vague, title only, feature request actually means.

But that is exactly what Software Engineering is! It's 2026 and the notion that you can get detailed enough requirements and specifications that you can one-shot a perfect solution needs to die.

In my experience AI has made us able to iterate on features or ideas much faster. Now most of the friction comes from alignment and coordination with other teams. My take is that to accelerate processes we should reduce coordination overhead and empower individuals and teams to make decisions and execute on them.

pron 11 hours ago||

> It's 2026 and the notion that you can get detailed enough requirements and specifications that you can one-shot a perfect solution needs to die.

It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die. Anthropic failed to build even something as simple as a workable C compiler, not only with a perfect spec (and reference implementations, both of which the model trained on) but even with thousands of tests painstakingly written over many person-years. Today's models are not yet capable enough to build non-trivial production software without close and careful human supervision, even with perfect specs and perfect tests. Without a perfect spec and a perfect human-written test suite the task is even harder. Maybe in 2027.

ianbutler 11 hours ago|||

Sorry where are we seeing that it failed? It compiled multiple projects successfully albeit less optimized.

" It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).

It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.

The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler. The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce. "

For faffing about with a multi agent system that seems like a pretty successful experiment to me.

Source: https://www.anthropic.com/engineering/building-c-compiler

Edit: Like I think people don't realize not even 7 months ago it wasn't writing this at all.

pron 10 hours ago|||

> where are we seeing that it failed?

Anthropic said the experiment failed to produce a workable C compiler:

- I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

- The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler.

(source: https://www.anthropic.com/engineering/building-c-compiler)

Software that cannot be evolved is dead software. That in some PR communications they misrepresented their own engineer's report is beside the point.

> It compiled multiple projects successfully albeit less optimized.

150,000x slower (https://github.com/harshavmb/compare-claude-compiler) is not "less optimised". It's unworkable.

> Like I think people don't realize not even 7 months ago it wasn't writing this at all.

There's no doubt that producing a C compiler that isn't workable and is effectively bricked as it cannot be evolved, but still compiles some programs, is great progress, but it's still a long way off from autonomously building production software. Can today's LLMs do amazing things and offer tremendous help in software development? Absolutely. Can they write production software without careful and close human supervision? Not yet. That's not disparagement, just an observation of where we are today.

tardedmeme 9 minutes ago|||

This evaluation appears to be AI-written itself. It claims a 3x slowdown and a 4x slowdown combine to produce a 158000x slowdown "because there are billions of iterations" - yeah well both versions of the program had the same number of iterations.

Does anyone know how the 158000x slowdown happened? That's quite ridiculous.
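
For illustration, here is a minimal sketch of the arithmetic behind that objection. It assumes the two slowdowns are independent factors that each multiply the per-iteration cost; the function names and the iteration counts are made up for the example, and only the 3x and 4x figures come from the comment. Under that assumption the iteration count cancels out and the combined slowdown is 3 × 4 = 12x, however many iterations run.

```python
# Hypothetical helper names; only the 3x and 4x factors come from the thread.
def runtime(iterations: int, cost_per_iteration: float) -> float:
    """Total runtime of a loop with a fixed per-iteration cost."""
    return iterations * cost_per_iteration


def slowdown(iterations: int, base_cost: float, *factors: float) -> float:
    """Ratio of slow runtime to baseline runtime when every factor
    multiplies the per-iteration cost and both runs do the same
    number of iterations."""
    slow_cost = base_cost
    for factor in factors:
        slow_cost *= factor
    return runtime(iterations, slow_cost) / runtime(iterations, base_cost)


# The ratio is the same for a thousand iterations and for a billion.
for n in (1_000, 1_000_000_000):
    print(n, slowdown(n, 1.0, 3.0, 4.0))
```

A ratio in the hundreds of thousands would therefore need some other cause than the iteration count, which is the question the comment raises.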

ianbutler 9 hours ago|||

> Can they write production software without careful and close human supervision? Not yet. That's not disparagement, just an observation of where we are today.

I never claimed they could! I just view this as a successful experiment. I don't think anthropic was making that claim with their experiment either.

It feels reflexive to the moment to argue against that claim, but I tend to operate with a bit more nuance than "all good" or "all bad".

areweai 2 hours ago|||

I think people are concerned about the large discrepancy in concrete claims in your previous comment and subsequent empirical information. You may have seen a headline or skimmed an article and missed some details, not a big deal.

The overall impression given was inaccurate and the implicit claim of a fully working end-to-end generated compiler was inaccurate. The headlines were incomplete in a way that was intentionally misleading. It was an interesting experiment and somewhat impressive but the claims were overblown. It happens.

pron 9 hours ago||||

The experiment failed to produce a workable C compiler despite (1) the job not being particularly hard and (2) the available specs and tests being of a far higher quality than those of almost any other software, not to mention the availability of other implementations that the model trained on.

You can call that a success (as it did something impressive even though it failed to produce a workable C compiler) but my point in bringing this up was to show that today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

ianbutler 8 hours ago|||

That's great and all, but that's not the point I was making and you're engaging rather uncharitably on it. So when you view it from the perspective of capability increase it's rather impressive. Note the slope of progress which this experiment was to show.

Edit: Maybe uncharitably is too strong, but we're talking past each other.

auggierose 6 hours ago||

pron made this statement:

> It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die.

and brought up the failed anthropic experiment as proof of that. Yes, you are talking past each other, but that is not pron's fault. It is your fault.

ianbutler 6 hours ago||

Eh fair enough!

KajMagnus 8 hours ago|||

Saying the model failed to write a competitive C compiler makes more sense.

I don't think they tried to do that though.

> today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

That's a good point anyway

pron 7 hours ago||

> Saying the model failed to write a competitive C compiler makes more sense.

Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off (interestingly, the model trained on the real compiler's source code). That's not "not competitive" but "cannot be used in the real world". But even more importantly, the compiler cannot be fixed or evolved. It's bricked (at least as far as today's models' capabilities go). For any kind of software, not being able to improve or fix anything or add any new feature means it's effectively dead.

You could not use it in production even if no other C compiler existed.

jiggawatts 4 hours ago|||

While I understand both points of view, I'm leaning towards yours, because:

- John Carmack embedded a C compiler and interpreter/runtime into Quake back in the mid 1990s as a scripting language! It was that efficient that it could be used in a real time 3D shooter. That's a solo effort as a minor component of a much larger piece of software.

- I've seen university CS courses hand out "implement a C compiler" as a homework / project exercise for students. It's not particularly difficult.

Sure, a modern C compiler like GCC has to handle inline assembly, various extensions, pragmas, intrinsics, etc... but like you said, all of those are thoroughly documented and have open source implementations to reference.

Similarly, the Rust compiler is implemented in Rust and could be used as an idiomatic reference for a generic compiler framework with input handling, parsing, intermediate representations, and so forth.

lmm 5 hours ago|||

> Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off

I would bet that those things are also true of at least one expensive commercial C compiler.

vajrabum 1 hour ago||

I'd love to hear of any currently available commercial C compiler which has that level of issues. I would bet you'll be hard pressed to find one. C compilation is a quite thoroughly solved problem. In any case please provide an example.

nvme0n1p1 9 hours ago||||

Why are you quoting from their marketing blog as if it's a reliable source?

https://github.com/anthropics/claudes-c-compiler/issues/1

> Apparently compiling hello world exactly as the README says to is an unfair expectation of the software.

dnautics 11 hours ago||||

Yeah I think people are really underestimating what LLMs can do even without specs.

As an example, I did an exploratory attempt to add custom software over some genuinely awful Windows software for a scientific imaging station with a proprietary industrial camera. Five days later Claude and I had figured out how to USB-pcap sample images, and it's been operationalized and running smoothly for months now. 100% of the code was written by Claude and it's all clean (I reviewed it myself). Pretty much all I did was unstick it in a few places ("hey, based on the file sizes it looks like the images are being sent as a 16-bit format").

For day to day work, I'll often identify a bug, "hey, when I shift click on this graphical component, it's not doing the right thing". I go tell Claude to write a RED (failing) integration test, then make it pass.

Zero lines of code manually written. Only occasionally do I have to intervene and rearchitect. Usually this involves me writing about ten lines of scaffold code, explaining the architectural concept, and telling it to just go.
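
A minimal sketch of the red-then-green loop described above, using the shift-click example. Everything here is hypothetical: `SelectionModel`, its `click` method, and the range-selection behavior are invented for illustration and are not taken from the commenter's codebase or any real UI toolkit. The test is written first and fails (RED) as long as `click` ignores `shift`; the branch marked in the comments is the change that makes it pass.

```python
class SelectionModel:
    """Hypothetical selection model for a list widget (illustration only)."""

    def __init__(self):
        self.selected = set()
        self.anchor = None  # index of the last plain (non-shift) click

    def click(self, index, shift=False):
        if shift and self.anchor is not None:
            # The "make it pass" step: shift-click selects every item
            # between the anchor and the clicked item, inclusive.
            low, high = sorted((self.anchor, index))
            self.selected = set(range(low, high + 1))
        else:
            # Plain click: select one item and remember it as the anchor.
            self.selected = {index}
            self.anchor = index


def test_shift_click_selects_range():
    # Written before the fix; it fails while click() ignores `shift`.
    model = SelectionModel()
    model.click(2)
    model.click(5, shift=True)
    assert model.selected == {2, 3, 4, 5}


test_shift_click_selects_range()
```

The point of writing the failing test first is that the bug report becomes an executable statement of the expected behavior, so the fix can be checked without a human re-clicking through the UI.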

pron 10 hours ago||

People both underestimate and overestimate what LLMs can do. LLMs have shown very different results when autonomously writing a small program for personal use and autonomously writing production software that needs to be evolved for years.

jyounker 6 hours ago||||

By "non-workable" I think people mean that it won't compile Hello World.

YZF 9 hours ago|||

GCC has only like a billion man hours in it?

Assembler and linker are not part of a compiler. They are separate tools. They are also generally much simpler.

SirHumphrey 11 hours ago||||

Most software is much simpler than a c compiler.

pron 11 hours ago|||

A workable C compiler is a ~10-50KLOC program, and a fairly simple one at that (batch, with no concurrency or interaction). That Anthropic's swarm of agents wrote 100KLOC before failing is a symptom of the problem. It's certainly possible that many programs are in the sub 5KLOC range, but it's definitely not "most software". Plus, almost no software has this level of detailed spec, ready-made tests, and a selection of existing implementations of the same spec.

My first thought when reading Anthropic's description of the experiment was that it is unrealistically easy. It's hard to come up with realistic jobs in the 10-50KLOC range that would be this easy for an LLM. That it failed only shows how much further we still have to go.

quantumleaper 11 hours ago|||

A bit off topic, but see how Anthropic publicity stunts went from "Claude C Compiler" with 100K LOC to the recent Bun Rust rewrite with 1M LOC (10x!) in just 3 months.

I get that it's "novel" creation vs porting, but given that they reported that the C compiler cost them $20k in API costs, the Bun rewrite must be at least $200k, maybe even closer to a million. Pure madness.

gmueckl 11 hours ago|||

Asking an LLM to change the programming language of an implementation is completely different from asking it to code from spec. It's orders of magnitude simpler in practice. I converted some 60kloc of Java to C++ and it works. There were some issues where the Java implementation used runtime reflection, because that needs creative workarounds, and not all of the C++ translations worked on the first try. And that was my first serious attempt at a task with an LLM. I could likely do better now. An important task simplification here is that a well designed codebase can be converted in small pieces and then joined back together. So the total amount of code converted becomes an irrelevant metric.

pron 11 hours ago|||

Yes, the task is very different, but also it will be months to a year until we know the results of the bun experiment.

quantumleaper 11 hours ago||

I don't know how it could fail - Bun loses popularity among devs? Is it an objective metric? From what I understand, Node.js remains dominant across the industry as a whole, with Deno and Bun mostly used by startups.

Anthropic can always fire the Opus/Mythos token machine gun on any problem (bugs, features, security) to ensure PR success, and there would be plenty of AI-sphere startups already drinking the kool-aid that would consider the whole vibe-coding thing to Bun's benefit.

pron 10 hours ago|||

> Anthropic can always fire the Opus/Mythos token machine gun on any problem (bugs, features, security) to ensure PR success,

Can they, though? They tried and failed to do it in their C compiler experiment. The experimenter wrote: "I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality."

eudamoniac 10 hours ago||||

It could fail due to maintenance burden. There is a lot of code now that no one wrote.

t_mahmood 9 hours ago|||

Are we assuming, all tests pass == software done?

Does Firefox not have tests? Then how were over 200 CVEs found?

Are we going to be comfortable running a piece of software that has 1M lines, with who knows how many zero-days in it?

Yes, sure they are going to use LLMs to find the CVEs, and so will the hackers. You need a day or two to fix the security issue, a hacker just needs to put it to use.

And good luck debugging a million line code base.

1M LOC == already failed.

rowanG077 10 hours ago|||

The compiler that claude made went way beyond workable. It could compile the full linux kernel afaik. That is much further even beyond standard C.

pron 10 hours ago||

People who independently tried to use it reported that it is very much not workable:

- "CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI. However, the build failed at the linker stage with ~40,784 undefined reference errors."(https://github.com/harshavmb/compare-claude-compiler)

- Overall it’s an interesting experiment, and shows the current bleeding edge of Claude’s Opus 4.6 model. However the resulting product is also a clear example of the throwaway nature of projects generated almost entirely by AI code agents with little human oversight. The prototype is really impressive, but there is no real path forward for it to be further developed. It can build the Linux kernel [for RISC-V], which is impressive. It can also build other things… if you are lucky, but you really cannot rely on it to work. (https://voxelmanip.se/2026/02/06/trying-out-claudes-c-compil...)

Anthropic themselves said that the codebase was effectively bricked and that their agents could not salvage it.

rowanG077 7 hours ago||

Well then as you say a 10-50KLOC C compiler is workable. Could you show me the C compiler that does manage to compile a modern Linux kernel that is of that size?

spc476 5 hours ago||

TCC did several years ago. It could boot Linux from source in under 10 seconds. It wasn't that big of a C compiler. It's in the 50,000 lines of code range.

rowanG077 3 hours ago||

This was 20 years ago from what I can find. Besides that, Linux now is a vastly different codebase than it was 20 years ago. That effort also did not compile Linux unmodified, it required several changes: https://bellard.org/tcc/tccboot_readme.html.

binary0010 11 hours ago|||

Not really.

I can make a c compiler in a couple weeks just by looking up open source libraries and copying them.

I can't make any software that people will pay me money to use without taking months/years of development, research, experimentation and iteration.

Just because the original people who invented compilers had to be genius, doesn't mean anyone has to spend much time or thought in copying that work now.

YZF 9 hours ago|||

I built a compiler for a simpler language as part of my compilers course in a CS degree. It was a non-trivial exercise well beyond the majority of software applications. What open source libraries did you have in mind and what are you copying?

If you can truly write a C compiler in weeks then kudos to you. How many compilers have you written so far for how many languages?

I work for big tech and I would say a large % of developers are incapable of producing a working C compiler on any reasonable time scale, certainly not weeks, even with looking at open source. I'm sure they can download one and run it. Most developers today don't even know C or assembler. They don't know how to approach the C language spec. The top 5-10% of developers/engineers can do it but even for them it's non-trivial.

binary0010 5 hours ago|||

I'd copy and paste from all the thousands of open source ones, what do you mean?

There are plenty of open source compilers that I can copy and paste whatever I need to. I don't get why you think this would have any level of difficulty?

Of course I couldn't make a brand new compiler that was better than what's out there...

Just like a game engine, I could clone one of the thousands of engines out there pretty easily - making something better or novel would be difficult. Just making a bare bones clone of what already exists by referencing documentation and pre-existing code is relatively easy now.

Yeah, when I made a mediocre 3d game engine 20 years ago, it was brain breaking difficult work. I can make one infinitely better in a micro fraction of the time now because most of the hard stuff is done and can just be looked up now.

Do you not agree?

YZF 4 hours ago||

If you copy and paste an entire compiler you didn't make anything. If you copy pieces from different compilers they won't work together. So I'm not sure how you "make" a compiler by copying and pasting from open source compilers. Are you saying you'll take one file from clang, one from gcc, and another from yet another compiler?

Sure. You can clone gcc and build it. You can clone a game engine and use it.

pron 9 hours ago|||

> It was a non-trivial exercise well beyond the majority of software applications

That depends on how you count. By number of programs that may well be right, but that's not what matters in terms of impact on the industry, as software value roughly corresponds to the number of people working on a particular piece of software (or lines of code, if you wish). By number of people/LOC most software is not in the "simpler than a C compiler" category.

virgilp 8 hours ago|||

I wonder how knowledgeable in compilation was the engineer that attempted this. I'm pretty confident that I could produce a decent C compiler in a few weeks (or less), if given Opus 4.7 + unlimited tokens + a good test suite. (and this is not blind unsubstantiated belief in AI, I've recently rewritten a somewhat sophisticated interpreter in a week with AI; and have worked on several C++ compilers in the past, including a GCC port to a custom DSP, so I have a bit of an idea about what this would take).

But yeah, this is not a "one shot" project, none of it is. One shot doesn't work even with humans - after all, this is exactly what killed waterfall as a methodology.

pron 8 hours ago|||

> I'm pretty confident that I could produce a decent C compiler in a few weeks (or less), if given Opus 4.7 + unlimited tokens + a good test suite.

Of course. The point is that a full, detailed spec isn't enough (even in the rare situations it does exist, like for a C compiler). At least for the moment, you need expert humans to supervise and direct the agents.

Vibe coders usually also let the agents write the tests, which mean that the only independent human validation of the software is some cursory manual inspection. That also obviously isn't enough to validate software.

> One shot doesn't work even with humans - after all, this is exactly what killed waterfall as a methodology.

You can one-shot a C compiler with humans. LLMs' software development ability is impressive and helpful, but it is not human-level yet, even if at some tasks the agents are better than most human programmers. And while many waterfall projects failed, many succeeded (although perhaps not as efficiently as they could have). So far I don't believe agents have been able to produce any non-trivial production software autonomously.

zem 8 hours ago|||

yeah, the key part is that there be a human in the loop, directing and course-correcting the ai while it produces code in reasonably small and well defined stages.

jimbokun 18 minutes ago|||

A previous iteration of my company had a CEO with a simple management idea that I believe worked really well: treat each product team as a mini startup.

That means EVERY role needed to develop the product was in that team. No separate corporate wide QA function, infrastructure and operations function, sales function, project management function, or domain expertise function. All the people performing those functions for that project were part of the project team.

Now this is somewhat hyperbole as if there is no sharing of resources whatsoever you don’t really have a single corporation.

But the idea is clarifying and helps to eliminate silos and tighten communication and feedback loops.

I miss that style of working. Although I try to break those barriers where I can as an individual contributor by just figuring out who needs to talk to who to make things happen and opening those channels of communication.

juanre 10 hours ago|||

I completely agree. It's more than 40 years since I wrote my first program, and I've never seen software that was first specified and then written and all was good.

The most difficult part of any non-trivial engineering is understanding the problem, and the first versions of a piece of software are how you reach that understanding.

That's why I do not think that AI-powered "software factories" will ever work. It's waterfall development all over again. An architect writing UML diagrams and handing them off to the team of programmers to do the essentially mundane task of implementing... the wrong thing.

AI is, however, very good at helping you go fast from the wrong first version to the less wrong second one. But you need to remember that your main task is to understand the problem that you are trying to solve.

daxfohl 6 hours ago||

Yeah and any detailed design is still likely to skip over "obvious" things like "only admin users can use admin features". Both the PM and the engineering team will understand this implicitly. But with AI, you never can tell if it's going to make that inference, or just create admin users and admin APIs with no relation between them. These are also the bugs that can most easily slip through, because the reviewer wouldn't even think to look for it.
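
As a sketch of what spelling out that "obvious" rule can look like, the snippet below ties an admin feature to an admin check and adds the test a reviewer might not think to ask for. All names (`User`, `require_admin`, `delete_account`) are hypothetical and invented for this example; it assumes the acting user is passed as the first argument.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass
class User:
    name: str
    is_admin: bool = False


def require_admin(func):
    """Reject the call unless the acting user (first argument) is an admin."""
    @wraps(func)
    def wrapper(actor, *args, **kwargs):
        if not actor.is_admin:
            raise PermissionError(f"{actor.name} is not an admin")
        return func(actor, *args, **kwargs)
    return wrapper


@require_admin
def delete_account(actor, target_name):
    """Hypothetical admin-only feature."""
    return f"{target_name} deleted by {actor.name}"


def test_non_admin_cannot_delete():
    # The implicit requirement, written down as an executable check.
    try:
        delete_account(User("bob"), "alice")
    except PermissionError:
        return
    raise AssertionError("non-admin was allowed to use an admin feature")


test_non_admin_cannot_delete()
```

Without a check like this, admin users and admin APIs can both exist and still have no relation between them, which is the failure mode the comment describes.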

Philip-J-Fry 9 hours ago|||

I don't agree.

I regularly get pieces of work some product guy has thought up in an afternoon. They only care about the happy path, and sometimes only part of the happy path. I work for a global company that has to abide by rules and regulations in each country we operate in. The product guy thinks up some feature, we implement the feature, then we're told "actually, we legally aren't allowed to do this in 90% of the markets we operate in". Cool, so we add an ability to disable it in those markets. Then they come back "We can do this in some of those markets if it's implemented with [regulatory bureaucracy], so can you do that please".

Then we have to hack away at the solution because the deadline is right around the corner.

This is not software engineering! None of this is related to the software. The job of a software engineer is to take a list of requirements and figure out the way we accomplish those requirements. Requirements gathering is NOT a software engineering problem. Software is implementation, product is behaviour. That's the split. The behaviour of the thing we're building needs to be known before we even try to seriously build it.

If someone had just held back for a week and done their due diligence, we would have been able to architect a solution that is scalable, extensible, easy to maintain and can make the future easier.

nuancebydefault 8 hours ago|||

> Requirements gathering is NOT a software engineering problem. Software is implementation, product is behaviour. That's the split.

That's a theory but I've never seen this work in practice. A piece of software is unique. If it weren't, we'd just use the cp command.

What usually happens is you get a set of requirements that looks simple. Then you start thinking about a design and see 10 different possibilities, each corresponding to a slightly different interpretation of the requirements set. You iterate a few times, reviewing the designs with whoever set the requirements and a few peers, and see more possible variations to the requirements. You need to double check its parent requirements up to the master requirements. Then you need to make time/feature/quality tradeoffs, affecting the fulfillment of requirements.

Once starting to implement, you see dependencies to other software (framework, sdk, drivers, language features,...) and understand that other software is not what you thought, or has bugs. Or you see an issue with performance or see that one particular feature becomes unfeasible.

That's where all the complexity goes. AI doesn't change that, but can make prototyping iterations and bug hunting faster, as long as someone holds it on a leash and understands its decisions.

marcus_holmes 2 hours ago||||

I think this was TFA's point about "engineers have been begging to be involved earlier in the process forever". Which is absolutely true.

It has to be someone's job to push back on the Product Guy's stupid idea and answer all the awkward questions about the not-so-happy path with it. Unfortunately, because of the way we've ended up with this process, that person is often the engineer tasked with building it, without any effective political power to challenge the design process.

sarchertech 4 hours ago||||

You realize that we were making software for decades before Product Managers existed right?

My senior year software engineering class had a whole section on requirements gathering.

ajam1507 7 hours ago|||

This seems more like a failure of management and process than a problem inherent to autonomy.

jmalicki 1 hour ago|||

This is also the part that AI speeds up the most for me, maybe 100x productivity.

I start with something like this prompt:

"This is a research project around <vague statement>. What do competitors, like <x>, <y>, <z> do around this, are there any blog posts or tech talks?

Are there any academic approaches or recent papers around the topic?

Can you survey any related open source projects? I know of <x> and <y>. Please include analysis of activity, github stars, number of downloads on npm/pypi/crates, and search the web for reviews or complaints or positive or negative blog posts from developers.

All claims should have links to the original sources, preferably with quoted text where appropriate.

We are going to write a research plan for how to produce this report.

The implementation of the plan will spawn subagents to survey breadth, then spawn subagents for each depth topic in detail"

harrall 11 hours ago|||

Trying to figure out the best way to solve vague requirements is why I got into engineering.

If I got detailed specs, I’d just be a coding robot. I push that work off onto juniors.

hnthrow0287345 9 hours ago||

If they can't at least imagine the golden path themselves and write it down, they shouldn't be in charge of the product because they will be unlikely to understand any other in depth conversations about it. And I have no idea how they'd be having coherent conversations with anyone above them either. They're also unlikely to use AI well or not identify bad-out-of-the-gate solutions. It is of course different if they're just gathering opinions or want a PoC or exploratory work done, but those aren't requirements to me.

Developers are unlikely to be doing only development these days. There's ops and support to do as well, so more back and forth means less time for those things and for development.

We need to meet in the middle about requirements otherwise developers will end up doing someone else's job for them.

Cthulhu_ 11 hours ago|||

I'm seeing decision-makers / people who write requirements starting to use AI as well in my day to day. As before, my job is to read, understand and test those requirements against the real world as I understand it. But same with code. Software engineering for the past (at least) 20 years has had a core focus of "don't trust anyone", this hasn't changed and this takes a lot of time and effort still.

Terr_ 11 hours ago||

The problem is that instead of trying to figure out what they really want/need, now we're trying to figure out what they really wanted or needed before it got obfuscated by the babble-machine.

getnormality 2 hours ago|||

> Now most of the friction comes from alignment and coordination with other teams.

Then I see a solution! Why don't we simply put the entire company on one big team?

jimbokun 32 seconds ago||

Putting everyone responsible for some function of a product on one team, instead of having separate departments for separate functions, can do wonders for actually shipping and iterating on software.

thisisnotmyname 3 hours ago|||

Not to mention that ai lets the domain experts create and test proof of concept implementations themselves. This alone has been a revelation for us and saves a tremendous number of design cycles.

thisisit 7 hours ago|||

> we should reduce coordination overhead and empower individuals and teams to make decisions and execute on them.

"Improved collaboration," says every new CEO and manager. The notion that this is ever going to be solved, especially with different experience, views, agendas, etc., needs to die too. AI is surely not going to help, and with that roadblock, iterating faster doesn’t help because then people want to try things just for the sake of trying.

BloondAndDoom 5 hours ago|||

We will see smaller and smaller teams where all this overhead is minimized by a handful of people, and more than ever we will see 2-5 person teams creating great software.

stingraycharles 11 hours ago|||

Yeah I agree, such a fundamental aspect of software engineering is translating ambiguous “asks” into specific requirements. We now have a tool to convert those requirements directly into code.

And yes, architecture and how to actually implement the designs are also part of the requirements.

The code is just the implementation, the actual problem that needs solving is one abstraction level higher.

mmcnl 10 hours ago|||

This is true, but funny thing is: it was also true before AI.

ModernMech 11 hours ago|||

It's UML and outsourcing all over again: If only we can write the perfect UML diagrams representing the ideal class hierarchy, we can just put that in an email, send it to India, then we'll get back exactly the program we wanted, no mistakes!

gedy 10 hours ago||

> Trying to figure out what a vague, title only, feature request actually means.

> My take is that to accelerate processes we should reduce coordination overhead and empower individuals and teams to make decisions and execute on them.

This is funny because it's exactly what the agile/scrum training taught me 20 years ago.

phyzix5761 15 hours ago||

I think when LLMs first came out people thought they could just say something like, "Make a Facebook clone". But now we're realizing we need to be more exact with our requirements and define things better. That has always been the bottleneck in software.

When I was working we used to get requirements that literally said things like, "Get data and give it to the user". No definition of what data is, where it's stored, or in what format to return it. We would then spend a significant amount of time with the product person trying to figure out what they really wanted.

In order to get good results with LLMs we need to do something similar. Vague requirements get vague results.

satvikpendem 13 hours ago||

In what I've seen, tickets are much richer in detail now because PMs are using AI (connected to the codebase itself, like Claude Code or Codex) to fill out a template as to what the problem is and why (i.e. X field exists in the backend but not the frontend), how and where to get any data (query the backend), and what acceptance criteria are needed (frontend should have the field exposed and "submit" should push the field's data to the backend where it should show up in the database), which is something they would not have done before, due I guess to laziness and thinking the devs can figure it out. Then devs can copy paste this Jira ticket content into the LLM agent of choice (or even use the Atlassian MCP to have the LLM read it automatically).

This has significantly helped devs and made sure that requirements are very clear.

Honestly, with the first step, it seems the PMs are already halfway there to implementation of the feature so I wonder if in the future they'll just do everything themselves and a few devs will be around as SDETs rather than full blown implementers.

lr4444lr 12 hours ago|||

I can't imagine SWEs will be reduced to SDETs any more than attorneys will be reduced to spell-checkers on AI powered case briefs.

I am a very AI-forward person, but hallucinations are becoming more pernicious than ever even as they get less frequent, especially if the code actually works. A human absolutely has to guide these processes at a macro level for sustainability for SaaS as it evolves with business needs.

Maybe for one and done systems with no maintenance/no updates/no security patches you can reduce humans to SDETs, but systems like that are more the exception than the norm.

tombert 10 hours ago|||

I've noticed even more than the "hallucinations", just the code is generally quite bad.

At least with concurrent and distributed systems stuff (which is really all I know nowadays), it is great at getting a prototype, but the code is generally mediocre-at-best and pretty sub-optimal. I don't know if it's because it is trained on a lot of mediocre and/or buggy code but for concurrency-heavy stuff I've been having to rewrite a lot of it myself.

I think that AI is great for getting a rough POC, and admittedly often a rough POC is good enough for a project (and a lot of projects never get beyond a rough POC), but I think software engineers will be needed for stuff that needs to be more polished.

bunderbunder 1 hour ago|||

I'm getting the impression that LLMs are just not very good at "reasoning" about time. I have definitely had success getting a coding agent to produce decent concurrent code, but I had to basically lead it by the nose, and I strongly suspect that in most cases it would have taken less time to just do it the old fashioned way.

tombert 30 minutes ago||

I've had good luck having it translate TLA+ specs to programming languages. The specs are written by me and my fingers, and I've done most of the interesting concurrency reasoning beforehand.

I'm pretty sure it still saves me time, and if nothing else it's an excuse to write TLA+, and that's fun.

DrewADesign 6 hours ago||||

Numerous real world technical requirements can be solved with existing code, lightly modified. That’s basically LLM code’s bread and butter. The further you get from that, the closer the “time saved using LLM” line gets to zero, and once it crosses, it becomes the “time wasted using LLM” line. I think embedded and concurrent systems are going to require more unique code solutions than, say, a crud web app with a few interesting feature-building junkets.

vips7L 6 hours ago|||

The code is quite terrible, but no one has ever cared about code quality, at least in my experience. All they’ve ever cared about is that “it works”. It’s why an army of juniors always write most of the code.

I had this same discussion at work the other day. I had an 80k line generated project dropped on my plate. It doesn’t use anything built into the web framework or orm. It’s a maintenance nightmare.

tombert 6 hours ago|||

I think there are plenty of projects where "good enough" really is "good enough"...maybe most apps? If you're just making a shitty simple app, I don't really care about code quality.

Example: I got Claude to generate a language server for TLA+ so I could have nice integration with Neovim. It took like 45 minutes of arguing with Claude and then it worked fine. This is incredibly low-stakes stuff: realistically the worst case scenario is that the text in the file gets screwed up, and I'm somewhat protected by Git if that happens.

That said, I am a little concerned how cavalier people have been deploying AI code everywhere. I don't want pacemaker firmware to be written by some intern in an afternoon with Claude.

vips7L 5 hours ago||

Yes I agree, the low stake, low evolution code is perfect for LLMs. The project I was handed is not that at all.

andai 2 hours ago|||

Maybe you can ask Claude to reverse engineer what the original prompt was.

satvikpendem 11 hours ago||||

By SDET I mean one who reviews not writes code, maybe we have different definitions of that term because you also mention humans being needed to guide the processes.

Even still, other professions interact with the real social world which is not necessarily the case with programming. A lawyer will always be needed because judgments are and must be made by humans only. Software on the other hand can be built and tested in its own loop, especially now with human readable specifications. For example, I wanted to build an app and told Claude and it planned out the features, which I reviewed and accepted, then it built, wrote tests, used MCPs including the browser for interacting with the UI and taking screenshots of it, finding any bugs and regressions, and so on until an hour later it came back with the full app. Such a loop is not possible in other professions.

lr4444lr 6 hours ago|||

No one's arguing you can't stand up a good MVP.

It's when you have to iterate to handle changing business needs, scale issues, and integrate with other systems where the entropy becomes a scary concern over a long enough timeline.

And it's not just "checking" - it's wholesale rejections of code, reframing prompts to target specific classes or approaches, etc... I don't think you will take the human out of planning any time soon.

satvikpendem 5 hours ago||

I agree, humans will always be there.

Xirdus 5 hours ago|||

> A lawyer will always be needed because judgments are and must be made by humans only.

Honestly, I believe lower court judges will be the first job in the legal industry to become fully automated.

lemonberry 9 hours ago|||

This afternoon I was speaking with a friend and mentioned that I need to find a lawyer for contracts. His immediate response was, "you don't need a lawyer, just use AI". Not an avenue I'm interested in going down.

majormajor 12 hours ago||||

IMO the code-generation for boilerplate and the improvement of copypasta quality are much bigger improvements than that.

PMs turning their brain off and letting the LLMs extrapolate from quick and dirty bashing of text into a template (or, PMs throwing customer feedback at a slackbot to generate a jira ticket from it) can be better than PMs doing nothing but passing ill-defined reqs directly into the ticket, but that's a low bar. And it doesn't by itself solve the problems of the details that got generated for this ticket subtly conflicting with the details that got generated for (and implemented) in a different ticket 8 months ago.

bodge5000 3 hours ago||||

> it seems the PMs are already halfway there to implementation

Halfway there feels way overblown, and only seems to further devalue the work that devs do. Having clearly written requirements would be fantastic, and even as someone less pro AI I can see great utility for it here, but it's not halfway there to implementation. Not even 25% in all honesty, since edge cases and unforeseen consequences can cause changes to the spec midway through development.

hedgehog 13 hours ago||||

If you do that someone still needs to make sure the details make sense which, from experience, sometimes they will and sometimes they won't. When I open tickets using automation I often back into the ticket from a running implementation that passes tests so the description is at least internally consistent but there are often still issues that need corrected.

satvikpendem 11 hours ago||

That's what a good PM and developer pair should be doing, it's just that it's a lot faster for both of them now to review and work in tandem to get the feature done, because the bottleneck is the code generation.

AdieuToLogic 3 hours ago||

> That's what a good PM and developer pair should be doing, it's just that it's a lot faster for both of them now to review and work in tandem to get the feature done, because the bottleneck is the code generation.

The bottleneck is understanding, never "code generation."

Below is an axiom which has served me well over the years. Perhaps it will for you as well.

  When making software, remember that it is a snapshot of 
  your understanding of the problem. It states to all, 
  including your future-self, your approach, clarity, and 
  appropriateness of the solution for the problem at hand. 
  Choose your statements wisely.

derefr 11 hours ago||||

> Honestly, with the first step, it seems the PMs are already halfway there to implementation of the feature so I wonder if in the future they'll just do everything themselves

I'm guessing they've tried (or been induced to try by upper management), but given up because they don't know how to debug any problems that arise due to the LLM working itself into a corner.

Coding-agent LLMs act a lot like junior devs. And junior devs are: eager to write code before gathering requirements; often reaching for dumb brute-force solutions that require more work from them and are more error-prone, rather than embracing laziness/automation; getting confused and then "spinning their wheels" trying things that clearly won't work instead of asking for help; not recognizing when they've created an X-Y problem, and have then solved for their Y but not actually solved for the original problem X; etc.

The way you compensate for those inexperience-driven flaws in junior devs' approach, is to have them paired with, or fast-iteration-code-reviewed by, senior devs.

Insofar as a PM has development experience, it's usually only to the level of being a "junior dev" themselves. But to compensate for LLMs-as-junior-devs, they really need senior-dev levels of experience.

The good PMs know all of this, and so they're generally wary to take responsibility for driving the actual coding-agent development process on all but the most trivial change requests. A large part of a PM's job is understanding task assignment / delegation based on comparative advantage; and from their perspective, it's obvious that wielding LLMs in solution-space (as opposed to problem-space, as they do) is something still best left to the engineers trained to navigate solution-space.

datsci_est_2015 1 hour ago||||

> Honestly, with the first step, it seems the PMs are already halfway there to implementation of the feature so I wonder if in the future they'll just do everything themselves and a few devs will be around as SDETs rather than full blown implementers.

Judging by every PM I’ve worked with, 0% chance of this happening. Much sooner would see SWEs making PMs redundant than the other way around. Unless of course you want a system that falls apart like a house of cards as soon as you get a single user for your vaporware.

beej71 10 hours ago||||

> Then devs can copy paste this Jira ticket content into the LLM agent of choice

Super glad to have gotten out when I did...

wijej 13 hours ago||||

lol

Just lol. Is this what you guys mean by productivity boost?

Comical. LLMs aren’t all that great - it’s more that most orgs are horribly inefficient. Like it’s amazing how bad they are.

That’s why Elon succeeded with SpaceX - he saw how horribly inefficient the industry was. And used that thinking to take a gamble and it’s paid off.

jnovek 12 hours ago|||

> most orgs are horribly inefficient

Considering that that’s been a running complaint for like 50 years, it doesn’t seem like project management is going to get better on its own at this point. So, yes, an LLM does represent a productivity boost in that area.

batshit_beaver 11 hours ago||

The problem is that organizations are inefficient in such a way that extra output from white collar workers doesn't translate to improved org-wide performance in a positively correlated, linear fashion.

When the org is misaligned, mismanaged, has poor customer feedback loops, bad product market fit, too much bureaucracy, etc etc no amount of AI slop is going to make a meaningful impact on its bottom line. In fact, it will likely do the opposite through a combination of exponentially increasing complexity, workforce deskilling, layoffs, and rising token prices. The real bottleneck is and always has been communication & alignment.

It might make the employees _happier_ in the interim though, which, I believe, is what we're predominantly seeing during this AI mania. People fed up with the bullshit jobs of rewriting the same service for the 5th time in 2 years or creating TPS reports weekly just for their manager to throw them directly in the trash are absolutely giddy that they no longer have to do this manually. I think we need to question the economic value of these jobs in the first place, though.

I've worked at big tech prior to LLMs becoming a thing, and consistently saw projects of 20-50 people carried by 2-3 individuals that actually understood what needed to be done. I don't think this ratio will be any better with genAI, and I also don't think that tokenmaxxing has any meaningful correlation with impact. Bullshit jobs (and questionable personal projects) just get done faster now. Yay, I guess.

ejejje 10 hours ago||

Correct, most people should be fired.

In the long run these highly inefficient firms are going to get destroyed by people who have a vision and can do what 100+ firms are doing and package it together as one solution that is far superior on dimensions that matter to firms.

batshit_beaver 7 hours ago||

If only it was that simple. The reason these inefficient companies continue to exist is due to regulatory capture and monopolistic behavior. Competing with them doesn't just require better efficiency.

stefan_ 10 hours ago||||

The idea that PM tickets are now much improved because they paste their unbaked wrong "idea of what the ticket is" into ChatGPT to expand into a 500 word behemoth is hilarious.

At least when the PM still wrote it you could outright tell it was bullshit and made no sense. Now that is just obfuscated.

xeromal 12 hours ago||||

You're probably right but that sounds like it's still a win to me.

satvikpendem 11 hours ago|||

Not sure what your point is, LLMs don't have to be all that great to still show a productivity boost and especially if the organization is inefficient, then even more so.

ejejje 10 hours ago||

[flagged]

satvikpendem 8 hours ago||

Two hour old account just to make comments like this, it will get flagged. Next time use your main account.

OptionOfT 13 hours ago||||

Except... no one validates the generated tickets, and they're full of inaccuracies.

And then someone copy pastes it into Claude and now those inaccuracies become part of the code and tests.

satvikpendem 13 hours ago|||

The PMs validate it, why do you think they don't read over it to make sure it fits what they want? You might say "well they're lazy, look why they didn't write enough detail to start off with" but for lots of people, reviewing something to make sure it's close to what they want and then tweaking it is much easier than writing it from scratch.

It's the equivalent of writer's block and is why a common piece of advice given to writers is to put anything they can onto the page then edit it later.

majormajor 12 hours ago|||

> The PMs validate it, why do you think they don't read over it to make sure it fits what they want?

The PM has historically often not had a detailed enough mental model of the implementation to spot the hard parts in advance or a detailed enough mental model of the customer desires to know if it's gonna be the right thing or not.

Those are the things that killed waterfall.

You can use LLM tools to help you improve both those areas. Synthesizing large amounts of text and looking for inconsistencies.

But the 80th-percentile-or-lower person who was already not working hard to try to get ahead of those things still isn't going to work any harder than the next person and so won't gain much of a real edge.

Avicebron 12 hours ago|||

I'm glad you mentioned it and TFA briefly mentioned waterfall. The second graph shown in the article, with documentation overlapping the dev cycle, is like the worst of both agile and waterfall. It's supposedly real-time waterfall.

Normally waterfall works where the scope is extremely-well defined and articulated in design plans. Which shortens dev time because prior to AI code was mostly deterministic. Here we have to do waterfall level of documentation while iterating on a non-deterministic solution (code gen) to non-deterministic requirements (per usual).

It's bonkers.

I still think the technology is cool though.

And to answer the questioner.. Have you worked with a PM? Most of the ones I've worked with try to be simultaneously in charge yet not responsible for anything. Validating something implies skill and responsibility.

satvikpendem 11 hours ago|||

Then they're just bad PMs and don't deserve to have the job. That can be said in any profession, devs or lawyers or doctors who blindly accept LLM output without review are bad employees.

BugsJustFindMe 8 hours ago||

> Then they're just bad PMs and don't deserve to have the job.

Nobody "deserves" anything. They do have the jobs though. Thinking that the world isn't full of people doing what they need to do to get by who don't give a shit about fitting a fantasy ideal is wild.

satvikpendem 8 hours ago||

Deserving and having are two different things, that doesn't mean they can't be criticized either way. By the same logic bad devs and bad dev practices can also be criticized.

BugsJustFindMe 23 minutes ago|||

"They're bad PMs" does not meaningfully respond to people saying the world is full of bad PMs. They know. It was already given. Giving it again in response isn't engaging thoughtfully.

zxornand 12 hours ago||||

I think validating a fully generated novel of a ticket, is much harder than thinking through the problem in the first place and creating your own ticket.

We see it with code too right? It’s harder to review code than to write it.

On top of that the LLM can work so fast that the amount of things that need validating grows!

This is where humans get lazy and the problems come in IMO. Whether it's a PM not validating their ticket, or a dev doing a bad code review.

Add on to that that the incentives currently are to move fast and trust the AI.

It becomes clear to me that a lot of that review work either won’t be done at all, or won’t be nearly thorough enough.

satvikpendem 11 hours ago|||

The tickets are not "novel"-length, they are about a few bulleted lists of the sections I mentioned above. In that case it is indeed way easier to review than a ticket only saying "do X with Y data."

Reviewing code is harder than reviewing text because code does something and has interdependencies and therefore must be correct in its function, do not mix the two. This is like saying an editor reviewing an article or novel is harder than actually writing the novel which is blatantly incorrect.

zxornand 10 hours ago||

Most real tickets are more complicated than “Do x with Y data” and also have many interdependencies throughout the business

satvikpendem 8 hours ago||

Most? That's doubtful especially when a lot of tickets are simply CRUD which are fine being generated by an LLM. Those that are more complex require more review and interdependency management, sure, but to say that that is most tickets is simply not correct.

paulhebert 11 hours ago|||

I agree. I hate getting tickets like this because they’ve often gone down the wrong path and I have to work backwards to understand the actual problem and the right way to solve it

mrbombastic 11 hours ago||||

Just this week I pushed back on some requirements in a very detailed product spec I was implementing to speed up time to ship. The PM had no idea what I was talking about because the requirements were invented by an LLM. This is not a bad PM, discipline doesn't scale.

BugsJustFindMe 13 hours ago|||

> The PMs validate it, why do you think they don't read over it to make sure it fits what they want?

Hahahahahaha. Sorry, I couldn't help myself; this reads like satire. The answer is "real life experience says otherwise".

ethin 11 hours ago||

Yeah I was so tempted to ask if this person has ever actually met a project/product manager...

satvikpendem 11 hours ago||

Maybe you both just have bad PMs, because just like good devs they should also be reviewing their work. My point was that it is more likely for PMs to review and edit a generated ticket than to have to write it all themselves which they often won't do.

BugsJustFindMe 9 hours ago||

> My point was that it is more likely for PMs to

I feel compelled to point out to you that this is a completely unsustainable, unsupportable, unsubstantiable claim. You have met ~0% of PMs, and of the ones you've met maybe you've experienced a non-zero percentage of their work, but statistically that's also very unlikely.

If you think you can say what most PMs do or what PMs are likely to do, then, I'm sorry, but you are not even thinking like an engineer. You're thinking, actually, a lot more like a PM to many of us.

> just like good devs

I'm so sorry, my sides just can't handle the starry-eyed nature of these takes. This is just too much for me.

To many of us this reads like you've never met people before. But who knows, maybe you live in Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average! If so then we're jealous, but you still should be more careful about how unrigorous your mental model is because it will make you a worse engineer.

Experience with different PMs and developers aside, the older you get in the profession the more you will hopefully realize that none of your quality effort fantasy matters. Sales happen and money rolls in independently of whether you think the PMs or the people who call themselves engineers do a "good job". Businesses thrive on sales and marketing, not engineering.

satvikpendem 5 hours ago||

What a strange response. By your logic you've met ~0% of developers too yet I assume you can distinguish good development practices from bad. I also mentioned good PMs which by definition review and write good tickets with a clear explanation of the problem and what they want the solution to be. If personally meeting millions of people is the epistemic standard you have to know something then I'm not sure how you know anything at all.

As to your latter point, not sure why you think I think business doesn't continue on even with bad employees, of course it does and I didn't say otherwise. But that does not mean they're doing a good job, those two are orthogonal concepts.

And I'm not sure how we even got to this, the original point was that I personally as a dev can physically see PM productivity increasing with AI, even as other devs in this thread seem not to. For a competent PM, a tool that automates a detailed first draft fundamentally changes the psychology of ticket creation. If your argument is just "bad PMs will still be bad," then sure, I agree, but that doesn't really engage with how the tooling changes the workflow for everyone else.

BugsJustFindMe 33 minutes ago||

> yet I assume you can distinguish good development practices from bad

We're not talking about knowing what good is, which is completely irrelevant to anything in this thread. You made a claim without qualification about what it is more likely for PMs to do. I can't tell if you've lost your own chain of thought or are engaging in some kind of motte and bailey fallacy. Either way it's kind of a bad sign.

> If your argument is just "bad PMs will still be bad,"

What they actually said was that the PMs they've encountered were bad and this only makes things worse. So talking about what a good one might do is irrelevant. But you didn't only do that. You expressed denial about the possibility that PMs might be bad at their jobs, which was hilarious, and I think I can explain why by rephrasing the dialogue:

Someone: "I see X failing to do Y."

You: "No you don't. X definitely do Y. Why would you think that X aren't doing Y? Doing Y is the obvious thing for X to do."

Someone: "I literally am seeing it happen right now."

You: "Well then they're bad."

Someone: "Yeah, no shit."

You: "But most X are not bad."

Someone: "Bartender, I'll have what they're having."

resters 13 hours ago|||

This failure is human laziness, not an issue with the technology. People who use AI because they are trying to avoid doing work fall into a completely different category than people using AI as a force multiplier and for skills/capabilities enhancements / quality improvement.

OptionOfT 12 hours ago|||

It's also the only way to get those massive increases in productivity.

danaris 12 hours ago||||

This is very much a "you're holding it wrong" response.

If your technology relies on humans using it in ways that go against the ways they are inclined to use it, then that is an issue with the technology.

majormajor 12 hours ago|||

I don't think that works as a critique of LLMs because it's far too broadly applicable to well-accepted tools.

Are advanced calculators bad because a student could use the CAS to ace calculus homework, exams or the SAT without actually learning the material?

Is copy/paste bad because a person could use it to copy/paste code from one place to another without noticing some of the areas they need to update in the new location, adding bugs and missing a chance to learn some more subtleties of the system?

Is Git bad because a manager could use it to just measure performance by number of lines of code committed instead of doing more work to actually understand everyone's performance?

Many tools can be used lazily in ways that will directly work against a long term goal of improving knowledge and productivity.

convolvatron 12 hours ago||

but in this case that's exactly what AI is doing, and no more. it's filling in the gaps with some plausible sounding goo so that the person doesn't have to worry about the details.

ok, so for some of the jobs we're doing plausible sounding goo is just fine. and that's kinda sad. but the 'just playing around' case is fine for PSG, this isn't a serious effort but just seeing how things might work out without much effort.

taking the remainder, where understanding and intent are important, the role of the ai is to produce PSG, but the intentional person now goes through everything and plucks out all the nonsense. this may take more or less time than simply writing it, but we should understand this is resulting in less real engagement by the ultimate author. where this is actually interesting is a parallel to Burroughs's cut-up method - where source text and audio were randomly scrambled and sometimes really clever and novel stuff pops out.

but to say the current model of vibe coding has much to offer in the second case is really quite unclear. to the extent to which coding is the production of boilerplate is really a problem with APIs and abstraction design. if we can get LLMs to mitigate some of that in the short term without causing too much distraction, that's fine, but we should really be using that to inform the solution to the fundamental problem.

so for me what's missing in your model is how LLMs are supposed to be used 'properly'. I don't think laziness is really the right cut here, make-work is make-work, and there's plenty of real work to be done. but in what sense does LLM usage for code actually improve our understanding of these systems and get us more agency?

majormajor 11 hours ago||

I don't disagree with your take on most jobs or vibe coding as shown in countless proof-of-concept/0-to-1 demos. But the comment I was replying to was dismissing this statement from another commenter:

> People who use AI because they are trying to avoid doing work fall into a completely different category than people using AI as a force multiplier and for skills/capabilities enhancements / quality improvement.

This statement is absolutely true. There are ways to use LLM tools to significantly improve the quality of your work instead of to avoid doing hard work. (And the result can easily become something that requires more hard thought, not less.)

Some that I frequently enjoy that are usable even if you don't want the machine to generate your actual code at all:

* consistency-check passes asking it to look for issues or edge cases

* evaluation of test coverage to suggest any missed tests or proposed new ones

* evaluation of feasibility of different refactoring approaches (chasing down dependencies and call trees much faster than I would be able to do by hand, etc)

> to the extent to which coding is the production of boilerplate is really a problem with APIs and abstraction design. if we can get LLMs to mitigate some of that in the short term without causing too much distraction, that's fine, but we should really be using that to inform the solution to the fundamental problem.

I generally would disagree with this, though. I don't think there's solely a problem with abstraction design, I think the inherent complexity of many systems in the business world is very high (though obviously different implementations make it different levels of painful). If that's a problem, it's a people/social one, not a technology problem.

In my future we lean into the fact that people want features, they want complexity, for many things - everybody's ideal just-for-them workflow/tooling would look slightly different than the next person's - and use these tools to build things that do more, not less. Like the evolution of spellcheck from something you manually ran, to something that constantly ran, to something that can autocorrect in a generally useful way when typing on a touchscreen.

Let's get back to finding more features/customization to delight users with.

jnovek 12 hours ago||||

> This is very much a "you're holding it wrong" response

This isn’t actually an argument for or against anything, I don’t know why people say this. It is entirely possible that people are using this brand new, historically unprecedented tool wrong.

Cars have been a huge success in spite of requiring people to learn a bunch of new things to use them.

danaris 11 hours ago||

It's not about having to learn things; it's about the required methods of using the tool going directly against the grain of the way people in general operate.

The classic "you're holding it wrong" was about the iPhone 4: sure, people could learn to hold the iPhone in such a way that they didn't block the particular parts of the antenna that were (supposedly) the problem. But "holding an iPhone" is a fairly natural thing to do, and if the way that people are going to do it naturally doesn't allow its antenna to connect properly, then that's a technology problem, not a human problem.

If the selling point for AI is "you can just talk to it, and it will do stuff for you!" (which may or may not be yours, personally, but it is for a lot of people), then you have to be able to acknowledge that "describing a problem or desire using natural language" is something that humans already do naturally. Thus, if they have to learn to describe their problem in very specific ways in order to get the AI to do what they want, and most people are not doing that, then that's a failure of the technology.

For the specific case at hand, what's being described is similar to the problem of self-driving cars: you're selling the benefit as being the AI taking a lot of the work off your shoulders; all you have to do is constantly check its work just in case it makes a mistake. Which is something that we already know, empirically and with lots and lots of data, that humans are bad at.

Once again, it's a technology issue. Not a human issue.

satvikpendem 8 hours ago||

> selling the benefit as being the AI taking a lot of the work off your shoulders; all you have to do is constantly check its work just in case it makes a mistake.

Cars can take you from place to place much faster than a horse can, all you have to do is learn to drive and constantly keep your hand on the wheel.

Part of using a technology is, well, learning how to use it. It's not the technology's fault that humans are lazy or not able to pay attention and crash.

satvikpendem 11 hours ago|||

Maybe they are holding it wrong then. Like someone else said, people had to be taught how to drive a car and that cannot be in any sense said to be the car's fault.

Some people are lazy, plain and simple. If they want to blindly accept what the LLM tells them without critical analysis and review then that's on them.

iv4122 11 hours ago|||

I second this

decidu0us9034 12 hours ago||||

Maybe for some subset of software (like CRM panels or something) PMs will do everything. But if you're projecting the way one sort of software (i.e. user-facing, business-use-oriented software) is developed and put to use onto software writ large, then no, I don't think so.

satvikpendem 11 hours ago||

Sure, I'm just talking about 90% of software which is basic CRUD, not complex systems or microcontroller programming. In that case it's likely that just a PM could build something with LLMs.

sarchertech 4 hours ago||

For basic CRUD we’ve had no code solutions that PMs could have been using for decades.

The truth of the matter is that software starts as basic CRUD and then given time and users evolves into its own special snowflake. Every single system given enough time and users will become a “complex system”.

newobj 12 hours ago||||

I literally can’t tell if this comment is a joke or not.

satvikpendem 11 hours ago||

The last sentence was partly facetious, sure, but the first paragraph is not. I have seen ticket quality go up quite a bit from a few years ago.

well_ackshually 13 hours ago|||

> Honestly, with the first step, it seems the PMs are already halfway there to implementation of the feature so I wonder if in the future they'll just do everything themselves

Yes please, I've seen the vibecoded slop PMs put out every day because software engineering is simply not a skill they have, and I'd love to make a LOT of money fixing their crap once it dies in production <3

satvikpendem 11 hours ago|||

I already do the latter, not very difficult to get into. Good consulting money.

wijej 13 hours ago|||

I’m a former PM who’s now a founder and all the engineers I worked with loved me.

I can tell you right now most pm’s are absolutely useless and glorified project managers who don’t know how to think and get in the way - and don’t know how to enable engineers to be more productive.

shalmanese 14 hours ago|||

> I think when LLMs first came out people thought they could just say something like, "Make a Facebook clone". But now we're realizing we need to be more exact with our requirements and define things better. That has always been the bottle neck in software.

This was substantially predicted by Fred Brooks in 1986 in the classic No Silver Bullet [1] essay under the sections "Expert Systems" and "Automatic Programming".

In it, he lays out the core features of vibe coding and exactly the experience we are having now with it: Initial success in a few carefully chosen domains and then a reasonable but not ground breaking increase in productivity as it expands outside of those domains.

[1] https://worrydream.com/refs/Brooks_1986_-_No_Silver_Bullet.p...

steveBK123 14 hours ago|||

It's interesting how predictable some of this is.

The LLMs turn out fully formed clones of stuff for which there exists copious amounts of code openly searchable on the web doing the exact same thing.

LLMs require developer-like specification, task/subtask breakdown and detail where such example code already exists.

As a professional prior to LLMs, how many problems that you work on have many existing free solutions but you neglected to use that code and decided to spend days doing it yourself?

bonesss 12 hours ago|||

Well put, and same challenge to a lot of these demos & LoC numbers: if you were a pro prior to LLMs, how many of these demos could you fully recreate if you ignored copyright?

I’ve often reimplemented things at work that exist elsewhere. If I could just copy & paste whole solutions from GitHub and change the branding/naming slightly, I could make curl in an afternoon.

juvoly 13 hours ago||||

So true.

I can only think of hobby projects, like writing yet another emulator, expression parser or media processor in a new language I'm trying to master.

In a professional setting, you would always diligently explore libraries and only implement your own if there is no suitable alternative.

pton_xd 13 hours ago|||

> how many problems that you work on have many existing free solutions but you neglected to use that code and decided to spend days doing it yourself?

Only when the existing free solutions are licensed with something like GPL. Now I can just say, write me a C webserver library similar to mongoose and I get the functionality without the license burden.

repelsteeltje 13 hours ago|||

You might as well have ignored or removed the GPL notice. Running it through the LLM laundering gets you a "fork" of unknown origin, questionable quality. You're still potentially open to supply chain issues but the chain is obfuscated.

And you now own full responsibility for maintenance.

grepfru_it 12 hours ago||

I just vibe coded a SOCKS proxy because existing ones were too thick. And let me tell you, you are absolutely right. Go libraries I’ve never heard of, new implementations that have not been tested... I think the word for this is YOLO

juvoly 13 hours ago|||

Indeed, no license burden but you get a maintenance burden instead.

pton_xd 13 hours ago||

Well I'd get that either way if I write it myself.

Also I was joking, I'd never do that; feels gross. But I suppose it is a legitimate "productive" use of AI.

pjc50 13 hours ago|||

"We've invented the silver bullet from the book 'No Silver Bullets'"

bonesss 12 hours ago||

I read that as a programmer and, lol, you’re right.

I read how that’ll read to VCs coming from Altman and Musk and, ow, the entire stock market just made sense for a second.

stelonix 13 hours ago|||

You're completely right and I thought this would be obvious. I never prompted anything remotely close to "make a facebook clone". Instead, I write an explanation of how it should work. To give you an example:

  I need a python script that
  
  1) reads /etc/hosts
  2) find values of specific configured hosts (read from a .conf which) eg server1, localhost, etc
  3) it'll assign a name to those configs eg if the .conf has
  
  [Env1]
  192.168.0.1 production-read
  192.168.0.2 production-write
  192.168.0.27 amqp
  
  [Env2]
  192.168.0.101 production-read
  192.168.0.201 production-write
  192.168.1.127 amqp
  
  Basically format:
  
  [CONFIG_NAME]
  <ip> <hostname>
  
  Like an usual hosts file
  
  4) And each of those will be stored in memory
  5) if in /etc/hosts it matches one of those, it sets the "current env" as the configname
  
  5) It'll create an icon on the top-right of ubuntu 22 default gnome with
  6) that icon could be the text of the current config name or if nothing matches, "custom" text would show
  7) When the user clicks the "tray"/appindicator(or whatever gnome is calling them) it'll list the config names in a   simple gtk/gnome
  8) When the user clicks one config, we create a backup of /etc/hosts in ~/.config/backups/ named   hosts-%UNIX_TIMESTAMP%
  9) we then apply it to hosts file (find only the line with the hostnames to change and modify only those)

And that one-shotted a simple gnome app indicator env switcher. Had to fix a few lines here and there but it mostly just worked. If you give the proper spec to the LLM, it'll do it right. You can even fake a DSL to describe what you want and it'll figure it out.
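
For readers who want to see how small the non-GUI core of that prompt is, here is a minimal Python sketch of steps 1 to 5 only. It is not the script the LLM produced; the function names (parse_envs, parse_hosts, current_env) are made up for illustration, and it assumes an environment "matches" only when every hostname listed for it points at the listed IP in the hosts file, which the prompt does not actually specify.

```python
from typing import Dict


def parse_envs(conf_text: str) -> Dict[str, Dict[str, str]]:
    """Parse '[NAME]' sections of '<ip> <hostname>' lines into {env: {hostname: ip}}."""
    envs: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = envs.setdefault(line[1:-1].strip(), {})
        elif current is not None:  # entries before the first header are ignored
            ip, *names = line.split()
            for name in names:
                current[name] = ip
    return envs


def parse_hosts(hosts_text: str) -> Dict[str, str]:
    """Map each hostname in hosts-file text to its IP (first entry wins in this sketch)."""
    hosts: Dict[str, str] = {}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        for name in fields[1:]:
            hosts.setdefault(name, fields[0])
    return hosts


def current_env(envs: Dict[str, Dict[str, str]], hosts: Dict[str, str]) -> str:
    """Return the first env whose hostnames all resolve to the listed IPs, else 'custom'."""
    for env_name, mapping in envs.items():
        if mapping and all(hosts.get(name) == ip for name, ip in mapping.items()):
            return env_name
    return "custom"
```

In use you would feed parse_envs the text of the .conf file and parse_hosts the text of /etc/hosts; the tray icon label from steps 5 and 6 is then just the return value of current_env.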

Mossy9 12 hours ago|||

Juxt's Allium https://juxt.github.io/allium/ is an interesting entry in this 'pseudo DSL' space to define and store system specifications and requirements. I think it's likely that this sort of 'persistent specifications to help bots work correctly' will be a good approach when things finally cool down a bit.

skydhash 12 hours ago||||

That's the kind of stuff where you would write a few lines of shell script or perl and not bother with the whole GTK stuff. Because GTK would be accidental complexity to the task (unless you used something like zenity).

This is one of the reasons I like the OpenBSD and suckless projects. There are solutions that are technically correct, but are overengineered.

stelonix 11 hours ago||

Well I would never write shell because I loathe its grammar/syntax. I enjoy GUIs and am a heavy mouse user, so the GTK part isn't really an "accidental complexity" but a must have for me. If an LLM can one-shot all the GTK boilerplate it's a win.

That's (as shown in my sample prompt) one great thing I've been using LLMs for: making GUIs for arcane Linux-based OS/userland settings that I have no interest in doing "sudo gedit yadda yadda" or learning man pages for. It's been 30+ years, we deserve a better desktop experience.

I've used suckless packages in the past, but it feels to me too close to the GNOME/Apple way of giving zero settings and having opinionated defaults whose opinions do not ring well for me. I have zero desire to change my shortcuts/hotkeys to something random devs chose based on their past computer experience, mostly unix-based. Muscle memory > *.

skydhash 10 hours ago||

And that’s fine.

I was pointing out that a simpler solution exists. I prefer simple solutions, because I want to test whatever idea I have in a real-world situation first before I go for a more complete one. Kinda like doodling before committing to do a sketch (or spending weeks doing a painting).

> It's been 30+ years, we deserve a better desktop experience

That desktop experience would need to be like Smalltalk (where it’s trivial to modify the GUI). The nice power of Unix is having the userland being actually a userland. Meaning you can design a system for your workflow and let the computer take care of that. Current desktop environments don’t allow for that kind of flexibility.

Also it’s the nature of unix that makes such basic utilities possible (and building them with raw xlib or tcl is easier than gtk). Imagine doing the same on macOS or Windows where everything is behind an opaque database where some other process fancies itself as its owner.

Izkata 9 hours ago||

There's also a pattern based on the simple solution that used to be more common: One command-line program for updating and querying the current state, and a second GUI one that just acts as a dumb interface for the first one. Even aside from separation-of-concerns purity, there are two more practical benefits: this gives you scriptability (say, automatically choosing an environment on startup) as well as easier support for multiple desktop environments (two different dumb GUI frontends for the actual complexity in the command-line backend, or updating the GUI because of a change in the APIs without worrying about breaking the important logic).
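
A rough Python sketch of what the command-line half of that split could look like for the hosts example upthread. Everything here is an assumption for illustration: the tool name hostsenv, the get/set subcommands, and the IP=HOSTNAME argument format are invented, and the backup location just follows the prompt's ~/.config/backups/hosts-%UNIX_TIMESTAMP% convention. The logic lives in plain functions, so a GTK indicator (or a startup script) would only need to call the CLI or import these.

```python
import argparse
import shutil
import time
from pathlib import Path
from typing import Dict, List, Optional


def lookup(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the IP that hosts_text currently assigns to hostname, if any."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[0]
    return None


def rewrite_hosts(hosts_text: str, wanted: Dict[str, str]) -> str:
    """Point every hostname in `wanted` (hostname -> IP) at its new IP.

    Lines naming none of those hostnames pass through untouched.
    Simplification: a rewritten line keeps its names but loses any trailing comment.
    """
    out = []
    for line in hosts_text.splitlines():
        names = line.split("#", 1)[0].split()[1:]
        hit = next((n for n in names if n in wanted), None)
        out.append(line if hit is None else f"{wanted[hit]} {' '.join(names)}")
    return "\n".join(out) + "\n"


def main(argv: List[str]) -> int:
    """`get HOSTNAME` queries; `set IP=HOSTNAME ...` backs up, then rewrites."""
    parser = argparse.ArgumentParser(prog="hostsenv")  # hypothetical tool name
    parser.add_argument("--hosts", type=Path, default=Path("/etc/hosts"))
    parser.add_argument("--backup-dir", type=Path,
                        default=Path.home() / ".config" / "backups")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("get").add_argument("hostname")
    sub.add_parser("set").add_argument("pairs", nargs="+", metavar="IP=HOSTNAME")
    args = parser.parse_args(argv)

    text = args.hosts.read_text()
    if args.command == "get":
        ip = lookup(text, args.hostname)
        print(ip if ip is not None else "unknown")
        return 0 if ip is not None else 1

    wanted = {}
    for pair in args.pairs:
        ip, _, name = pair.partition("=")
        wanted[name] = ip
    # Keep a timestamped copy before touching the file.
    args.backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(args.hosts, args.backup_dir / f"hosts-{int(time.time())}")
    args.hosts.write_text(rewrite_hosts(text, wanted))
    return 0
```

Wired to an entry point that calls main(sys.argv[1:]), the GUI side reduces to running something like "hostsenv get amqp" to label the icon and "hostsenv set 192.168.1.127=amqp" on click, which is also what a login script would call.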

LtWorf 12 hours ago||||

That's easily more spec than script.

popcorncowboy 12 hours ago|||

> You're completely right

I mean, no comment

rubyfan 14 hours ago|||

We now have product owners trying to farm out their work to an LLM. The process didn’t work before because the person writing the requirements either put out vague requirements or bad requirements because they didn’t understand the business intent (or were careless).

LLMs just take the same vague or poor requirements and make them look believable until you dig in to them.

mpyne 12 hours ago|||

> The process didn’t work before because the person writing the requirements either put out vague requirements or bad requirements because they didn’t understand the business intent (or were careless).

You make it sound like writing good requirements is easy.

If it were easy we wouldn't need all these concepts around PMF, product pivots and the like. And even before that was Peter Naur's paper "Programming as Theory Building" [1].

If you truly understand the problem you're solving with software then requirements can be easy. But usually we don't, not right away, and so we have to build up our understanding of the problem first in order to solve it.

Even then, the problem we solve may not have been the problem paying users will have, so you can have "good requirements" and still have a bad business, or even the opposite where you somehow build a working business despite bad requirements, because you hit upon a customer's need quite by mistake.

Nothing about any of this precludes LLMs being helpful, though nothing guarantees LLMs will be helpful either.

[1]: https://cekrem.github.io/posts/programming-as-theory-buildin...

rubyfan 5 hours ago||

> You make it sound like writing good requirements is easy.

I am certain I didn’t say that. To be a good product owner one needs skill, care and understanding of the business intent. If you know the business intent but lack the skill to express it as a useful requirement then it’s insufficient; if you have the skill but lack understanding or ability to understand the business intent then it’s insufficient; if you have the skill and understand the business intent but you are careless in your work then it’ll be insufficient too. If the problem space is emergent then having all three might not be good enough either.

It’s certainly true that good engineering teams can deeply understand the problem space enough to get to a business outcome without requirement documents.

I just wouldn’t bet that LLMs are going to make any of these realities any better, they might exacerbate those issues.

mpyne 1 hour ago||

> I just wouldn’t bet that LLMs are going to make any of these realities any better, they might exacerbate those issues.

Yes, that's certainly a fair assessment, especially the more it convinces software developers they can talk to the LLM rather than talking to users.

steveBK123 14 hours ago|||

Plausible requirement generators as inputs to plausible code generators.. what could go wrong!

eastbound 14 hours ago||

It’s a giant tragedy of the commons. I’ve fired remote people who pretended to work, knowing that I wouldn’t hire remote workers ever again after AI.

niek_pas 13 hours ago|||

What's even worse is that when dealing with human software teams, a vague requirement will (at least in a well-run org) receive demands for further specification. "What do you mean by 'get data'?", etc.

An LLM will just say, "Sure! Here's the fully implemented code that gets the data and gives it to the user." and be done with it.

smokel 13 hours ago|||

ChatGPT 5.5 responds:

> What data should I retrieve, and where should I get it from? Please specify at least: ...

And it then goes on to ask just exactly what is necessary, being all constructive about it.

airstrike 13 hours ago||

You're both right. The parent was a toy example, and if asked literally to an LLM, it will definitely ask for more information. Yes, it's important to be accurate but I don't think that applies here.

But the point still stands: in most contexts, the LLM will fill in the blanks with what it deems appropriate like an overconfident intern at best and a bull in a china shop at worst.

vidarh 13 hours ago||||

When the cycles are short enough, though, that is to some degree the right thing. That is, it's the right thing for things the users can then immediately see and give feedback on, because it lets them give feedback on something tangible.

It's the wrong thing for important things under the hood (like durability and security requirements) that are not tangible to them.

resters 13 hours ago||||

Just as poorly designed code can still compile. This is operator error, not a failure of the technology.

pydry 13 hours ago|||

IME you give it very precise specifications and it still fucks it up.

When we talk about "the" bottleneck being specs, it just isn't the case that it's the only thing LLMs do poorly. They're really bad at a lot of stuff in the SDLC.

They're also good at providing results which are bad but look ok if you either don't look too closely or don't know what you're looking for.

cryo32 14 hours ago|||

It's worse. Vague requirements still only power vague interpretations of the problem. Even if you provide good requirements, you still only have vague interpretations at your fingertips. The promise is that such things won't be a problem in the future, which is obviously not materialising.

"Make a facebook clone" is the vague human promise to the end user. The reality is that it leads to so many assumptions which are insurmountable due to the vague interpretation so you have to change your requirements in the end to claim success.

Thus everything turns into a mediocre compromise. There is no exceptional outcome, which is what makes a marketable product. There are just corpses everywhere.

You need something better to both define requirements and implement them than this technology.

wijej 13 hours ago||

Can someone pull out that Steve Jobs quote re: the craftsmanship between a great idea and a great product?

Anyone who thought that gap could be shrunk substantially lives in delululand.

Hence why we haven’t seen this explosion of ‘really great’ products come out.

Many will continue to parrot ‘bro but the models changed I swear’. I’m sure they did. But you’re missing the damn point.

faangguyindia 13 hours ago|||

product people love LLMs because they don't ask

"what does X mean? how will it work?"

while a programmer will ask about all cases.

bonesss 12 hours ago|||

Did everyone forget about outsourcing and how outsourcing works?

The dudes in Eastern-Wherever not asking what something means is the scary part. You only find out at the end how deeply confused everyone was when making the thing. You can fix it with attention and management, but then only some projects sometimes are profitably outsourced and you still need competency.

gib444 13 hours ago|||

Do they have a point?

Can't good marketing teams, backed up by World Class Product people, sell anything we build, more or less?

</devil's advocate>

layer8 12 hours ago|||

Even if that were the case, I wouldn’t want to spend my working life building software poorly fit for the purpose, that nevertheless sells due to marketing.

whstl 12 hours ago|||

This has already been a reality for years.

In several companies I have seen product managers joining teams and failing to have even minor requirements ready for months during “onboarding” of the PM. And then code being ready but taking months to release because DevOps is busy or QA can’t find time.

The pace of release of software has been disconnected from the coding part for the longest time, and we have been quiet about it.

tedd4u 11 hours ago||

The solution I've seen work is have engineers and designers that can take much of the detailed spec writing on, and have the PMs spend time with users/prospective users, partners, etc, understanding the market and users better. When you pull PMs in to all the details, often they turn into project managers, shuffling bug tickets around etc, taking time away from owning the user and the problem and shifting them too much to the solution side. Have a lead engineer own much / most of that. Every org / product is different of course.

whstl 11 hours ago||

I agree with you. The healthier organizations work in the way you mention.

mike_hock 12 hours ago|||

> But now we're realizing we need to be more exact with our requirements and define things better.

That's why we write programs in programming languages and not English. Because they are much more efficient at giving precise instructions than natural language.

nnoremap 12 hours ago||

But horribly token inefficient.

steveBK123 14 hours ago|||

> When I was working we used to get requirements that literally said things like, "Get data and give it to the user". No definition of what data is, where its stored, or in what format to return it. We would then spend a significant amount of time with the product person trying to figure out what they really wanted.

This is a big HN LLM discussion divide. I am in the same no-specs work background camp, and so the idea that the humans who input that into dev teams are suddenly going to get anything out of an LLM if they directly input the same is laughable. In most orgs in my career there has been no product person and we just talked directly to end users.

For that kind of org, it will accelerate some parts of the SWEs job at different multipliers, but all the non-dev work to get there with discussions, discovery, iteration, rework, etc remains.

If the input to your work is a 20 page specification document to accompany multi-paragraph Jira tickets with embedded acceptance criteria / test cases / etc, then yes there is a danger the person creating that input just feeds it into an LLM.

et1337 14 hours ago||

I’ve never understood engineers who complain about vague specs… if the spec was complete, it would be code and the job would be done already! Getting a 20 page spec delivered from upon high and mechanically translating it to code without any chance to send feedback up the chain sounds like… a compiler.

steveBK123 14 hours ago|||

Yes, I don't think a job where I am programmed by a product manager would be terribly interesting. I would move on to be the product manager if I found myself in such a role.

Probably why I haven't ended up in any.

hirako2000 13 hours ago||||

The demands are for functional requirements. Plenty to translate on the non functional side of things.

skydhash 13 hours ago|||

In my experience, the complaints are not about the specs and their vagueness. It's more about the political game to get them detailed. If you've not encountered the kind of organizational issues where getting an answer is like pulling teeth, you're kind of lucky.

et1337 13 hours ago||

Oh no, I’ve definitely experienced that, it’s terrible. But that situation makes me wish for more agency (for example, talking to customers directly), whereas it seems to make other engineers wish for less agency (please hand me a complete spec and I will mindlessly translate it to code). That’s what I don’t understand.

pirates 13 hours ago||

some of us couldn’t give a rat’s ass about the customer. One of our customers charges people for paying their own bills via certain methods, which is completely bogus and I remind everyone loudly all the time that they do this. Everyone agrees that this customer sucks to work with, and the less time spent with them the better. The people from the customer’s end suck, they’re not technical, they have in-fighting with their own teams during calls, have decades long errors with their integration that they have never fixed…the list goes on. For this customer and a few others, please give me a spec that I can implement, shove it back across the aisle, and forget about. The absolute last thing I want is to have to talk to them more.

rockbruno 11 hours ago|||

Realizing? Will be very happy if that is the case, but in my view all big company execs are still balls deep into the notion that you will be able to just ask it for the facebook clone and everything sucks as a result

yoyohello13 10 hours ago|||

Even purely from an information theory perspective it was obvious “make me a Facebook clone” was not going to work. The more you compress the information prompt the more detail you lose.

latexr 13 hours ago|||

Cue that commitstrip comic from 2016.

https://web.archive.org/web/20161211074810/http://www.commit...

startages 12 hours ago|||

Now it's worse: you get a PDF export of a long ChatGPT chat history with one sentence, "Can you give an estimation for this?"

derefr 11 hours ago|||

The annoying thing is that giving an LLM vague instructions like "make a Facebook clone" does work... in certain limited cases. Those being mostly the exact things a not-very-creative "ideas person" would think to try first. Which gave the "ideas people" totally the wrong idea about what these things can do.

These same "ideas people" have been contracting human software developers to "make them a Facebook clone" (and other requests of similar quality) for decades now.

And every so often, the result of one of those requests would end up out there on the internet; most recently on Github. (Which is, once there's enough of them laying about, already enough to allow a coding-agent LLM trained on Github sources to spew out a gestalt reconstruction of these attempts. For better or worse.)

But for the most common of these harebrained ideas (both social-media-feed websites and e-commerce marketplace websites fit here), entire frameworks or "engines" have also been developed to make shipping one of these derivative projects as easy as shipping a Wordpress.org site. You don't rewrite the code; you just use the engine.

And so, if you ask an LLM to build you Facebook, it won't build you Facebook from scratch. It'll just pull in one of those frameworks.

And if you're an "ideas person", you'll think the LLM just did something magical. You won't necessarily understand what a library ecosystem even is; you won't realize the LLM didn't just generate all the code that powers the site itself, spitting out something perfectly functional after just a minute.

Foobar8568 14 hours ago|||

We have arrived at that state today with Codex and Claude Code. I really don't know what people are doing wrong.

paulddraper 12 hours ago||

So the agent needs a “plan” mode where it works with the user and asks questions to define the ask.

ddosmax556 12 hours ago||

This article assumes that AI only has an impact on the development phase, which is certainly not true. It can speed up every step of the process, including ideation, legal, documentation, development, and deployment.

Ideation: Throw ideas back & forth, cross reference with knowledge bases, generate design documents. Documentation: Generate large parts of docs. Development: Clear. Deployment: Generate deployment manifests, tooling around testing, knowledge around cloud platforms.

Every single step can be done better & faster with AI. Not all of them, but a lot.

Even development. Yes, some part of your job involves understanding the problem better than anyone & making solutions. But some parts are also purely chore. If you know you need a button doing X, then designing that button, placing it, figuring out edge cases with hover & press states, connecting to the backend etc - this is a chore that can be skipped. Same principle applies to almost all steps.

RaftPeople 12 hours ago||

I tend to agree with the article.

A typical example of trying to add a new significant capability involves many meetings (days, weeks, months, etc.) with the business to understand how their work flows between systems X, Y and Z as well as all of the significant exceptions (e.g. we handle subset A this way and subset B that way, but for the final step we blend those groups together, except for subset C which requires special process 97).

Then with that understanding comes the system solutioning across multiple systems that can be a blend of internal and vendor systems, each with different levels of ability to customize, which pushes the shape of the final solution in different directions.

There is certainly value in speeding up coding, but it's just one piece of the puzzle and today LLMs can't help with gathering the domain information and defining a solution.

physicsguy 8 hours ago|||

What I've seen in an AI-forward looking environment is that it's much more common for PM/POs to be knocking up at least a UI prototype now, and experimentation is happening often even before writing the tickets. Similarly when devs are proposing something they often are coming with a couple of prototypes already implemented. Both of those mean decisions are coming a lot quicker.

wise0wl 12 hours ago|||

I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM. A continuing architectural diagram is constructed and graph is updated until all cases are covered and then the LLM writes the code, writes the validations, pushes to CI environments, runs tests, schedules prod deploy (by looking at company event schedule), gets CAB approval, deploys code, tests in prod, and fixes regressions.

I'm not saying this is the correct thing, but companies are implementing it and it is "working". I don't think keeping our head in the sand is helping.

RaftPeople 10 hours ago|||

> I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM.

But the LLM is not aware of how the business works and why, so someone needs to work with the business to extract the information. Typically it's not well documented.

ijustlovemath 12 hours ago|||

Is it working, though? The main outcome we've seen at companies that drink the AI Kool-Aid en masse is buggy, unstable systems. Clearly there's a level of rigor that's being sacrificed for ship velocity.

gravity2060 12 hours ago|||

All of the above points align with our organization’s experience. But there is one more thing happening as well: we have more people in more roles able to create software solutions for issues that used to be brute forced via physical processes. (We are a small manufacturing business.) While these aren’t big giant enterprise projects that require deep swe experience, they are simple software tools that are improving process and productivity everywhere. It is pretty amazing what happens when your head of shipping can build a bespoke tool to solve a problem that previously they dealt with through burning through a lot of labor hours.

keithnz 3 hours ago|||

One of my beliefs about AI: for small and medium-sized companies (I'm also in this space) it allows massive speed-ups and generally increases their capability. Existing employees of all types essentially get massive speed boosts, and pathways open up that weren't available before. Big companies are likely to have a bunch of problems due to size, communication pathways, management structures, decision-making structures, etc.

Avicebron 12 hours ago||||

I would be really interested in the details of these kinds of tools that are improving processes and productivity.

Are they reasonably documented/audited/put into any sort of version control, like a lot of internal tooling? Or are they the kind of thing that gets whacked together on the fly, along the lines of "move spreadsheet data from A to B" or "I want a list of people's schedules with custom highlighting"?

Not doubting your productivity increase, I'm just curious how people quantify that when they say it.

xeromal 12 hours ago||||

One of our BAs created a site that tests the effectiveness of copy / layout adjustments. I don't even know exactly what that's called, but he's able to do statistical analysis much faster on what works and what doesn't. It's really cool to watch him thrive, and I feel like some of the thinkers who were not devs are going to find themselves becoming devs, but in their specific domain, in a few years.

bjelkeman-again 12 hours ago||||

Yes. In the same way that spreadsheets are the dev tools for non-devs, LLMs could step into that role, but with a much more powerful end result. With the caveat that, in the same way you can create a powerful foot-gun with a spreadsheet, you can probably create a foot-cannon with an LLM.

yieldcrv 12 hours ago|||

yeah the Coinbase CEO gleefully pointed that out as well and now the market thinks they are totally incompetent every time some UX quirk is found

looks like orgs have to have engineers on for optics. Going without them is like having a legal staff with no lawyers, or a cybersecurity staff with no IT or certified people. Software has famously not needed state licenses or industry certification, but maybe that's a direction to consider to give utility to company optics.

monkeydust 12 hours ago|||

The article pretty much plays out what's happening at our place: heavy use of AI in software development, but we don't see ourselves shipping faster; it's about the same or perhaps slower (for other reasons). It's a weird feeling, as we're waiting for this utopia to kick in but it's not, and we can't fully put our fingers on why.

solenoid0937 4 hours ago||

The article and the AI skepticism crowd on HN read like the blind leading the blind to me.

I'm at a FAANG. My org is moving much more quickly, maybe between 3-10x more quickly than we were pre-AI. We aren't seeing a spike in reliability issues. Things just get done faster. An org as large as mine has no right to move as fast as it does.

echelon 12 hours ago|||

The onus isn't on people using AI effectively to prove it to others.

In fact, these disagreements and disbeliefs create opportunities and salients in the market.

obsidianbases1 12 hours ago|||

Indeed. I suspect most effective AI users are quietly making real progress toward their objectives.

Anecdotally, I see a lot of problems/solutions content about AI that doesn't reflect at all the challenges I face. But trying to tell people that there are other ways of doing things, especially when it conflicts with token-maxxing, is a lost cause

ddosmax556 8 hours ago|||

I know and I agree. It sounds incredibly arrogant, but it's frankly a bit sad to see how much HN is lagging behind on AI adoption. It's been 90% noise over the last 3-6 months about problems that aren't truly problems if you really look hard at what AI is already capable of doing today. It's mostly ppl & process problems. I could post a comment like the one above below almost every article on AI. But it is what it is. It's an opportunity for anyone who doesn't buy into the cynical tone here, for sure.

solenoid0937 4 hours ago||

The HN AI skeptics are just bizarre to me. They are insisting to us that, no, the productivity gains we're experiencing every day, simply don't exist!

It's not that they're using the tool wrong, it's that the tool just isn't capable of what we see before our own eyes! I guess our eyes and ears are simply lying to us?

And then they ask for how we are managing to make things move faster. When you refuse to breach NDA and give up your competitive advantage on HN, this somehow confirms their belief that AI is useless.

pkoird 12 hours ago||

Precisely. People don't realize that it's all numbers. Given that the average IQ of people involved in a project is 140, an AI with an IQ of 150 can replicate each and every such individual in the pipeline. People saying AI can't do this or AI can't do that should come to terms with the fact that this IQ gap is monotonously increasing.

fn-mote 11 hours ago|||

This is bizarre to me on so many fronts.

1: When was the last time you worked on a project where you thought the average IQ was 140? I don’t even think I have worked on a project where the maximum IQ was 140.

2: Who thinks the IQ of people on the project determines its success? There’s so much more to it than just “high capability team members” (to give IQ a generous interpretation).

3: (math joke) A sequence like (AI IQ - Human IQ) can be negative and monotonically increasing and still never reach 0.
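
To spell the joke out with one concrete (entirely made-up) sequence: write d_n for the gap (AI IQ - Human IQ) at step n, and suppose it happens to follow the form below. This is only an illustration of the math, not a claim about any real measurements.

```latex
d_n = -\frac{1}{n}, \qquad n = 1, 2, 3, \dots
\quad\Longrightarrow\quad
d_{n+1} - d_n = \frac{1}{n(n+1)} > 0
\quad\text{and}\quad
d_n < 0 \ \text{for all } n .
```

So d_n increases at every step and gets arbitrarily close to 0, yet stays negative forever. A sequence like d_n = -1 - 1/n is also monotonically increasing and never even gets within 1 of zero.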

davebren 5 hours ago||||

Pattern matching against millions of IQ test questions from a training set in order to score 150 on an IQ test doesn't give you an intelligence equivalent to 150.

OccamsMirror 12 hours ago||||

Funnily enough, though, I think it makes dumb people dumber.

icedchai 12 hours ago||

I agree. Inexperienced people (not necessarily "dumb") are likely to accept everything at face value, not apply critical thinking skills, and not even check the AI generated output.

tovej 12 hours ago||||

An AI does not have an IQ.

tptacek 4 hours ago||

Sure it does. IQ is simply a measure of performance on an IQ test. A simple Python loop around Google search in 2012 had an IQ.

tonyedgecombe 12 hours ago|||

Monotonically, although I do find the discourse on AI rather monotonous.

kj4211cash 15 hours ago||

On the one hand, this is a clean post that explains exactly what a lot of us have been thinking and seeing on the job at large organizations doing tech work. Dear Author, I agree with you 110% and want everybody else to come to understand what you have written.

On the other hand, it feels like we've been over this tens of times recently, on HN specifically and IRL at work. Another blog post isn't going to convince leaders that this is how the world works when they are socially and financially incentivized to pretend like AI really will speed things up. So now I just wait for their AI projects to fail or go as slowly as previous projects and hope they learn something.

yakattak 14 hours ago||

Sadly I think you’re right. I even shy away from sharing these types of posts at work because it feels like anything that doesn’t mesh with the status quo isn’t received well.

prh8 10 hours ago||

Same here. Anyone being even hesitant about AI is viewed negatively by management

solenoid0937 2 hours ago||

This is the one case where management is right, they had to drag my (FAANG) org kicking and screaming into the AI future and it was worth it. We are moving 3-10x faster (team dependent).

Now the engineers I know that had the same skeptical tone as OP are the ones singing its praises and doing cool shit with it.

humanymous 8 hours ago|||

Every time these types of posts are discussed at work, the point is always that there's more risk of falling behind (more like FOMO) at pace if others are able to launch or bring new features faster

utopiah 15 hours ago|||

I disagree, I think the visuals, Gantt charts, are precisely the kind of "PM speak" that can be understood. Sure it won't solve anything as long as C-suite and investors do innovation signaling but that itself can only last so long.

hirako2000 13 hours ago||

I think the point is that clarity has been published many times.

Humanity knows how to solve starvation. Clear routes were laid out long ago. The work is in adoption.

layer8 12 hours ago||

The alternative viewpoint is that if there weren’t people who continue to try to advocate for a better world, the world we’d live in would be even worse.

cmrdporcupine 15 hours ago||

Yep. I have the luxury of having my mortgage paid off and being able to be a bit picky about my work for a little bit.

So I am spending my days gardening and obsessively working on personal coding projects with these agentic tools. Y'know, building a high performance OLTP database from scratch, and a whole new logic relational persistent programming environment, a synthesizer based on some funky math, an FPGA soft processor. Y'know, normal things normal people do.

So I know what these tools are capable of in a single person's hands. They're amazing.

But I hear the stories from my friends employed at companies setting minimum token quotas or having leaderboards of people who are "star AI coders" telling people "not to do code reviews" and "stop doing any coding by hand" and I shake my head.

I dipped my toes into some contract work in the winter and it was fine but it mostly degraded into dueling LLMs on code reviews while the founder vibe coded an entire new project every weekend.

These tools suck for team work or any real team software engineering work.

I'll just let this shake out and sit out until the industry figures it out. The only places that are going to be sane to work at are places with older wiser people on staff who know how to say "slow down!" and get away with it.

In the meantime, quantities of cut rhubarb $5 a bunch in Hamilton, Ontario area for sale. Also asparagus. Lots and lots of asparagus.

jcgrillo 13 hours ago||

Yeah I think moving forward one of the questions I'll be asking companies I interview with is "what does your seniority distribution look like and how do you intend to maintain it?"

somenameforme 14 hours ago||

I think there's an interesting dichotomy. I find that for things I'm already capable at, LLMs are relatively inconsequential. But for things I'm no good at, it's a huge game changer. For a large company, that's going to be able to hire out most needed roles for any given project, this means the overall effect is going to be relatively inconsequential. At best, they may be able to cut down on labor costs by having one guy do a mediocre job at 5 people's jobs in exchange for a worse product. Short-term gains for long-term costs, wcgw?

But for a small studio, or independent developer, LLMs are a big game changer. Being able to do a mediocre job at 5 people's jobs is a huge leap over trying to get by without those jobs - relying on third party assets or other sorts of content, or even worse - doing a really awful job of trying to improv those jobs. See the UI of basically any program ever that was clearly laid out by a programmer and not a designer. Or there's the whole trying to rip off stuff from dribbble, but lacking the skills to do so. Whereas with AI, you can suddenly competently rip off everything and everybody - it's basically their entire MO.

argee 13 hours ago|

> I find that for things I'm already capable at, LLMs are relatively inconsequential. But for things I'm no good at, it's a huge game changer.

What are the chances that this is the Gell-Mann amnesia effect? Sounds like the textbook definition of it.

Personally, I find the exact opposite to be true. LLMs only help me when I already know exactly what I'm doing.

xeromal 12 hours ago|||

I can give an anecdote. I'm a backend engineer for a service that I would consider pretty high horsepower. We get about 30k sign ups and trillions of events a day. I haven't touched the front end with a 10 foot pole since college.

I got the opportunity to rewrite our aging login page just as a fun experiment. I sat down with one of our analysts and we just went to town in a zoom trying out stuff with claude until we made something pretty sweet. Ran it through all our systems for accessibility, performance, etc and it came out clean. Made a PR and fired up a test that day in production. I haven't written a lick of our front end framework ever in my entire life and we were able to build something that has had a marked improvement in our user engagement in a day.

alt227 11 hours ago||

> a marked improvement in our user engagement in a day.

Do you have any idea what has caused this engagement improvement and indeed do you actually have any metrics or is it hearsay?

It is much easier to knock something up in a day as you have done, but often the reason manual things take longer is they are based on actual testing and research which takes longer than a day however you do it. The manual way gives you much more data on the hows and whys, and will inform you much more in the future when you need to change again instead of just 'ai did it last time, lets use it again!'

xeromal 11 hours ago||

No, we did an actual test using our existing testing framework. We have shitloads of metrics to know when a user gets stuck, when they give up, which login path they took, etc.

This wasn't a half assed test but a legitimate effort to improve something that we never prioritized

We had a legitimate 25% reduction in users giving up logging in in a system that has millions of users.

We ran a 50-50 AB test for several weeks to confirm the data and then turned it on completely

edit: If you haven't already read my post, I'd also like to say that the benefit AI gives us is that I worked on something I never get to work on, the analyst got to try a hunch he always had, and we got to see it go live in a day. If it didn't work out, we were out a day of work, which beats the few weeks of effort that, prior to AI, we would spend on something just to find out it didn't work.
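
For anyone wondering what "confirming the data" on a 50-50 AB test can look like, here is a minimal sketch of a two-proportion z-test on give-up rates. It is not our actual testing framework; the function name and all the counts are hypothetical, picked only so the arithmetic lands on a 25% relative reduction.

```python
import math


def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Compare give-up rates of control (A) and variant (B).

    Returns (relative_reduction, z, two_sided_p_value).
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    # Pooled rate under the null hypothesis that both arms behave the same.
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_reduction = (p_a - p_b) / p_a
    return relative_reduction, z, p_value


# Hypothetical counts, NOT real data: 100,000 users per arm,
# 8,000 give up on the old page and 6,000 on the new one.
reduction, z, p = two_proportion_z_test(8_000, 100_000, 6_000, 100_000)
print(f"relative reduction: {reduction:.0%}, z = {z:.1f}, p = {p:.3g}")
```

With arms that large, a difference of this size is far outside what noise would produce, which is the kind of thing running the test for several weeks is meant to establish.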

gwern 8 hours ago||

This seems consistent with OP. You had a feature where most of his Gantt chart is, in effect, already done: you had a clear problem with a clear well thought out design/solution (with associated documentation) in mind, you had a well setup analytics process for deployment and followup... you really had everything except that big fat chunk in the middle labeled 'coding'. So in your anecdote, an agentic coding LLM really could deliver a huge speedup by doing the remaining 10% or whatever of the work.

This is why LLMs are really great at 'knocking off the todo/wishlist' of things you always meant to do. The problem, as far as broader discussions of 'productivity multipliers' or 'total factor productivity' go, is that there are certain perverse diminishing returns to such wishlist items (if each item was all that important, why didn't it get done before?), they generally only apply to a small part of a large complicated whole (what % of your ecosystem/business/community as a whole is the login page, as pleasing and profitable as that fix is relative to the investment? Probably not a big %), and they are also finite (what happens when you have worked through your backlog of low-hanging fruit?).

Eiriksmal 7 hours ago||

I ask myself these same questions every workday. Are you cooking any new articles on this topic, Gwern? Reading your (thoroughly researched) thoughts often helps me clarify my own.

simondotau 13 hours ago|||

Just because one isn’t good at a thing doesn’t preclude one from being a sufficiently passable judge of a thing.

To wit, the answer pre-AI was to hire an expert on that thing, and you would then critically assess their work product, despite being unable to build it yourself.

argee 13 hours ago||

True, but if you hire a generalist and they are consistently under-performing specifically in the subject matter where you are an expert, it may behoove you to take the rest of their work with a grain of salt as well.

siliconc0w 10 hours ago||

People don't really understand that non-trivial software development isn't even 50% coding. The coding step is generally the 'easiest' part and is given to junior developers. In a large org most product changes span multiple systems and human operations. Seniors and even mid-level engineers generally spend most of their time figuring out how to shape the local priorities into a new arrangement of the existing cybernetic entity, and then getting buy-in on that new vision, given that these other teams have their own priorities.

This naturally involves a lot of tradeoffs and politics - senior engineers know to avoid adding 'weight' to their airframes and fight hard to avoid adding scope to the systems they're responsible for or divergence from their intended direction of travel. So compromises have to be struck or escalations to management to choose between priorities have to play out.

Maybe AI solves that as well, but that is a much more difficult lift.

fastball 9 hours ago|

LLMs mostly only being code-writers was true a year ago, but it is not true now. Now they are tool-callers, which means a coding agent can effectively: run lints/typechecks/tests (and fix resulting errors), dig into observability platforms to identify the root cause of issues (e.g. on Sentry or similar), run benchmarks to identify slow code / hot paths, keep systems up to date by reading migration docs (and applying them) for new majors of consumed libs, etc.

So sure, if you have none of these things set up to back-pressure agents and help them better understand the system, then they will just be dumb LLM code writers. But you can definitely go a lot further than that with the improvements that are rapidly happening to models and harnesses.
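
A rough sketch of what I mean by back-pressure, assuming nothing about any particular agent harness: run the project's checks, collect whatever fails, and hand that output back to the agent instead of accepting its change. The `run_checks` helper is hypothetical, and the commands in the usage lines are stand-ins for whatever lint/typecheck/test commands a project really uses.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each check command and return (command, output) for every failure."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep stdout and stderr: this is what gets fed back to the agent.
            failures.append((cmd, result.stdout + result.stderr))
    return failures


# Stand-in checks so the sketch runs anywhere; a real setup would list
# the project's own lint, typecheck, and test commands here instead.
checks = [
    [sys.executable, "-c", "print('lint ok')"],
    [sys.executable, "-c", "import sys; print('test failed'); sys.exit(1)"],
]

for cmd, output in run_checks(checks):
    print("FAILED:", " ".join(cmd))
    print(output)
```

The harness would loop on this until the list of failures is empty (or a retry limit is hit), which is what keeps the agent from being a blind code writer.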

deepsun 26 minutes ago||

Interestingly, there is already a role for that, called "business analyst". Their job is exactly what's written in here. But surprisingly, I've seen them only at SWE vendors (aka head shops), probably because customers don't understand what they really want, and BAs translate their vague requirements into something SWEs can work on. I would say we are all gradually becoming BA/UX, even if we don't want to call ourselves that.

shalmanese 14 hours ago||

This is all substantially correct and gives us hints as to where to focus for AI to make the processes go faster.

Eg: I had a product manager say to me that he envisions a future where any meeting with stakeholders that does not result in an interactive prototype by the end of the meeting would be considered a failure. This feels directionally correct to me.

The other thing I expect to see is Vibecoding being the "Excel 2.0" where it allows significant self-serve of building interactive apps that's engaged in a continual war with IT to turn them into something with better security guarantees, proper access control & logging, scalability, change management etc.

But the larger historical point here is that every revolutionary transition produces, in the early stages, "Steam Horses". The invention of the steam engine had people imagining that the future of transportation would involve horse shaped objects, powered by steam, pulling along conventional carts. It wasn't until later developments that we understood the function of transportation as divorced from the form.

I started talking about Steam Horses originally in the context of MOOCs, which was a classic Steam Horse idea.

skydhash 11 hours ago||

> he envisions a future where any meeting with stakeholders that does not result in an interactive prototype by the end of the meeting would be considered a failure.

Just learn something like balsamiq. You don't need code to build out a prototype. Just like you don't need actors and a camera when a few sketches can capture a scene.

p2detar 15 hours ago||

> Yes, AI can generate code quickly (whether that’s a good thing is open for debate), but that doesn’t mean it’s generating the correct code.

No, the code is actually almost always correct. The way it’s added is probably not what you’re going to like, if you know your code base well enough. You know there’s some ceremony about where things are added, how they are named, how many comments you’d like to add and where exactly. Stuff like that seems to irritate people like me when it's not done right by the agent, and the agent seems to get it wrong even when it’s spelled out in the AGENTS.md.

> If you were to give human developers the same amount of feature/scope documentation you would also see your productivity skyrocket.

Almost 2 decades in IT and I absolutely do not believe this can ever happen. And if it does, it’s so rare, it’s not even worth talking about it.

lmm 3 hours ago||

> Almost 2 decades in IT and I absolutely do not believe this can ever happen. And if it does, it’s so rare, it’s not even worth talking about it.

It happens completely routinely, IME. Just compare how much effort it takes to clone a system that's already written versus making that system from scratch.

nijave 13 hours ago||

>No, the code is actually almost always correct

That's not my experience, especially when the inputs are bugs or performance issues. It frequently hallucinates and misdiagnoses without a guiding hand. However, it can still RCA and analyze well and improve efficiency if you keep an eye on what it's doing and push it in the right direction.

> If you were to give human developers the same amount of feature/scope documentation you would also see your productivity skyrocket.

I think you run into a ceiling on how fast a person can digest and analyze the info compared to a machine.

BoxedEmpathy 8 hours ago||

What tools are you using? What settings? What process? What's your code review like?

I think this varies a lot. I find with a c++ project I'm working on that the LLM needs a lot of guardrails and guidance, and still gets a lot wrong. But with a vite/js project it often one shots complex and intricate changes in large codebases.

usernametaken29 15 hours ago|

Instead of mandatory AI workshops, simply cancel all meetings with more than 3 people and no written agenda. Instead, block the meeting time for productive work. That’ll be $2,000 in advisory fees for the insane productivity gains I just unlocked for you. You’re welcome.

teaearlgraycold 15 hours ago|

If people got paid for telling the truth you’d be rich.

steveBK123 14 hours ago||

Yes, there are MANY in tech/non-tech management that will quietly admit that a lot of this top-down stuff is to create the appearance of motion to appease a higher more tech/AI ignorant authority.
