Posted by straydusk 2 days ago

The Codex app illustrates the shift left of IDEs and coding GUIs (www.benshoemaker.us)
79 points | 199 comments
kace91 2 days ago|
>The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.

I can’t imagine any other example where people voluntarily move to a black box approach.

Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?

eikenberry 2 days ago||
I think many people are missing the overall meaning of these sorts of posts: they are describing a new type of programmer that will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and wouldn't look at the code any more than, say, a PHP developer would look at the underlying assembly. That's not the level of abstraction they're working at. There are many use cases where this type of coding will work fine, and it will let many people who previously couldn't really take advantage of computers do so. This is great, but it will do nothing to replace the need for code that humans understand (which, in turn, requires participation in the writing).
jkhdigital 2 days ago|||
Your analogy to PHP developers not reading assembly got me thinking.

Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.

There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

ytoawwhra92 2 days ago|||
For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Put another way, if you don't know what correct is before you start working then no tradeoff exists.

majormajor 2 days ago||
> Put another way, if you don't know what correct is before you start working then no tradeoff exists.

This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.

"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.

ytoawwhra92 1 day ago||
> This goes out the window the first time you get real users, though.

Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.

Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".

majormajor 1 day ago|||
But that's not what this is about:

> For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Some version of the software exists and now that's your spec. If you don't have a formal copy of that and rigorous testing against that spec, you're gonna get mutations that change unintended things, not just improvements.

Users are generally ok with - or at least understanding of - intentional changes, but now people are talking about no-code-reading workflows, where you just let the agents rewrite stuff on the fly to build new things until all the tests pass again. The in-code tests and the expectations/assumptions about the product that your users have are likely wildly different - they always have been, and there's nothing inherent about LLM-generated code or about code test coverage percentages that changes this.

just6979 1 day ago|||
"Some users will _accept_ "improvements" IFF it doesn't break their existing use cases."

Fixed that for you.

andai 2 days ago||||
> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.

But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!

For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.

(In the case of Windows 11 recently.. not so much ;)

majormajor 2 days ago||
> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

It's certainly hard to find, in consumer tech, an example of a product that was displaced in the market by a slower-moving competitor due to buggy releases. Infamously, "move fast and break things" has been the rule of the land.

In SaaS and B2B, deterministic results become much more important. There are still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.

The world didn't spend the last century turning customer service agents and business-process-workers into script-following human-robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet talked into spending hundreds of millions more on things that were legitimate benefits for their customers, but that management wanted to limit whenever possible on technicalities.)

bandrami 2 days ago||||
OK but, I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent it's something I expect I have to do from time to time and is definitely part of "programming".
drawnwren 2 days ago||||
It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.

Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.

HansHamster 2 days ago||||
> which is faithfully translated by the (hopefully bug-free) compiler.

"Hey Claude, translate this piece of PHP code into Power10 assembly!"

QuadmasterXLII 2 days ago|||
Imagine if high-level coding worked like this: write a first draft, and get assembly. All subsequent high-level code is written in a REPL and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. Only the assembly is checked into version control.
6510 1 day ago||
Or the opposite, all applications are just text files with prompts in them and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.
re-thc 2 days ago||||
> they are describing a new type of programmer that will only use agents and never read the underlying code

> and wouldn't look at the code any more than, say, a PHP developer would look at the underlying assembly

This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.

Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?

6510 1 day ago||
That is true for all languages. Very high quality until you use a lib, a module or an API.
straydusk 2 days ago|||
I'm glad you wrote this comment because I completely agree with it. I'm not saying there's no need for software engineers who deeply consider architecture, who can fully understand the truly critical systems that exist at most software companies, and who can help dream up the harness capabilities to make these agents work better.

I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.

csallen 2 days ago|||
> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

The output of code isn't just the code itself, it's the product. The code is a means to an end.

So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.

kace91 2 days ago|||
>The output of code isn't just the code itself, it's the product. The code is a means to an end.

I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black box tests that he knows are correct because they were manually written?

If not, you’re not reviewing the product either. If yes, it’s less time consuming to actually read and test the damn code

CuriouslyC 2 days ago||
I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation.
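To make the ladder concrete, a couple of rungs might look like this in pytest (the orders module is hypothetical, just for illustration):

    # Two rungs of a "testing ladder": the same behavior checked at
    # progressively larger scope.
    from orders import parse_order, OrderService

    def test_parse_order_unit():
        # Bottom rung: pure-function unit test, no I/O.
        order = parse_order('{"sku": "A1", "qty": 2}')
        assert order.sku == "A1" and order.qty == 2

    def test_order_service_integration(tmp_path):
        # Next rung: integration test against a real (temporary) store.
        svc = OrderService(db_path=tmp_path / "orders.sqlite")
        svc.submit(parse_order('{"sku": "A1", "qty": 2}'))
        assert svc.count() == 1

    # The coverage gate is just pytest-cov in CI:
    #   pytest --cov=orders --cov-fail-under=85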
kace91 2 days ago|||
>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions

So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.

Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.

And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you ship faster, have you considered what that means for the compounding likelihood of incidents?

CuriouslyC 2 days ago||
There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well.
just6979 1 day ago||||
"I push very high test coverage on all my projects (85%+)"

Coverage doesn't matter if the tests aren't good. If you're not verifying the tests are actually doing something useful, talking about high coverage is just wanking.

"have the agent create progressively bigger integration tests, until I hit e2e/manual validation."

Same thing. It doesn't matter how big the tests are if they're not testing the right thing. Also why is e2e slashed with manual? Those are orthogonal. E2E tests can [and should] be fully automated for many [most?] systems. And manual validation doesn't have to wait for full e2e.

straydusk 2 days ago|||
"Testing ladders" is a great framing.

My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
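For a sense of what that can look like (a minimal sketch, assuming ruff, mypy, and pytest-cov are installed; a git hook can be any executable):

    #!/usr/bin/env python3
    # Harness gate as a git pre-commit hook: save as
    # .git/hooks/pre-commit and chmod +x. The commit is rejected
    # unless lint, static analysis, and tests all pass.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],                    # lint
        ["mypy", "."],                             # static analysis
        ["pytest", "-q", "--cov-fail-under=85"],   # tests + coverage gate
    ]

    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print("blocked by harness:", " ".join(cmd), file=sys.stderr)
            sys.exit(1)

The point is that the "review" is encoded once, in the gate, instead of re-done by eyeball on every diff.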

straydusk 2 days ago||||
Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?

I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.

nubg 2 days ago||
People miss this a lot. Coding is just a (small) part of building a product. You get a much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations, even with large changes, because you didn't "write" the code, so you don't care about throwing it away.
add-sub-mul-div 2 days ago||||
A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet.

Specious analogies don't help anything.

alanbernstein 2 days ago||||
Right, it seems the appropriate analogy is the shift from analog-photograph-developers to digital camera photographers.
6510 1 day ago|||
The product is: solving a problem. Requirements vary.
vidarh 1 day ago|||
Your product managers most likely are not reading your code. Your CEO is not. The vast majority of your company is unlikely to ever look at a line of code.

If the process becomes reliable enough, then there is no reason. For now, that still requires developers to pay attention for important projects, but there are also a lot of AI-written tools I rely on day to day that I don't, because the opportunity cost of spending time reading them is higher than the cost of accepting the risk that they do something wrong.

There are also a whole lot of tools I do read thoroughly, because the risk profile is different.

But that category is getting smaller day by day, not just with model improvements, but with improved harnesses.

CharlesW 2 days ago|||
AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.
weikju 2 days ago|||
Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!
thefz 2 days ago||
You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared.
gibsonsmog 2 days ago|||
My understanding is that it's quite easy to poison the models with inaccurate data, and I wouldn't be surprised if this exact thing has happened already. Maybe not by an AI company itself, but creating bad code for this purpose is well within the reach of a hostile actor. I suppose it's kind of already happened via supply chain attacks using AI-generated package names that didn't exist prior to the LLM generating them.
djeastm 2 days ago||||
One mitigation might be to use one company's model to check another company's generated code, and depend on market competition to keep the checks and balances.
just6979 1 day ago|||
Then how many models deep do you go before it's more cost effective to just hire a junior dev, supply them with a list of common backdoors, and have them scan the code?
thefz 1 day ago|||
What about writing the actual code yourself
just6979 1 day ago||
Nah, more fun to burn money.
bandrami 2 days ago|||
Already happening in the wild
andyferris 2 days ago|||
The output is the program behavior. You use it, like a user, and give feedback to the coding agent.

If the app is too bright, you tweak the settings and build it again.

Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.

Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.

Aeolun 2 days ago|||
> What is the logic here?

It is right often enough that your time is better spent testing the functionality than the code.

Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).

kace91 2 days ago||
I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time consuming. This is the very reason why we have unit tests in the first place, and why they far outnumber end-to-end tests.
manmal 2 days ago|||
> I can’t imagine any other example where people voluntarily move to a black box approach.

Anyone overseeing work from multiple people has to? At some point you have to let go and trust people’s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I’m not one of them btw) must rely on higher level reports. Maybe drilling into this or that piece of code occasionally.

kace91 2 days ago|||
>At some point you have to let go and trust people’s judgement.

Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.

>Reading and understanding the whole output of 9 concurrently running agents is impossible.

I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?

I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement, why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.

wtetzner 2 days ago|||
Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?
kuschku 1 day ago||
It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.
re-thc 2 days ago||||
> Anyone overseeing work from multiple people has to?

That's not a black box though. Someone is still reading the code.

> At some point you have to let go and trust people’s judgement

Where's the people in this case?

> People who do that (I’m not one of them btw) must rely on higher level reports.

Does such a thing exist here? Just "done".

manmal 2 days ago||
> Someone is still reading the code.

But you are not. That’s the point?

> Where's the people in this case?

Juniors build worse code than Codex. Their superiors also can’t check everything they do. They need to extend some level of trust, even for doing dumb shit, or they can’t hire juniors.

> Does such a thing exist here? Just "done".

Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs when you read the output, but it won’t be zero info.

just6979 1 day ago|||
You: "Why did you build this?"

LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."

You: "What could be improved?"

LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.

---

It's near zero useful information. Example information might be "it builds" (baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).

manmal 1 day ago|||
If I asked you, about a piece of code that you didn’t build, “What would you improve?”, how would that be fundamentally different?
re-thc 1 day ago|||
> You can definitely ask the agent what it built, why it built it, and what could be improved.

If that was true we’d have what they call AGI.

So no, it doesn’t actually give you those since it can’t reason and logic in such a way.

manmal 1 day ago||
What does that have to do with AGI?
re-thc 1 day ago||
What you asked for is AGI. How else does it think, reason and logic to answer your "why"?

It doesn't do that currently even if you think it does.

ink_13 2 days ago|||
An AI agent cannot be held accountable
manmal 2 days ago||
Neither can employees, in many countries.
Xirdus 2 days ago|||
> I can’t imagine any other example where people voluntarily move for a black box approach.

I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.

ink_13 2 days ago||
So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo.
sroerick 2 days ago||
You can get an AI to listen to that bass solo for you
just6979 1 day ago||
But can you get an AI to zone out on a fluffy couch at the center point of a dank hi-fi setup with the volume cranked to 11, while chillin' on 50mg of THC?

And will you enjoy paying someone else to let the AI do that?

straydusk 2 days ago|||
No pun intended, but my move to this has been more "vibes" than science. It's simply more effective: when I focus my attention on the harness layer (tests, hooks, checks, etc) and the inputs, my overall velocity improves relative to reading and debugging the code directly.

To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.

My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.

laterium 1 day ago|||
You generate 20k LOC in a few hours. How long will it take you to read it? One week? You just keep going instead. I don't think this works great in large-scale production codebases yet, but it's an approach that will have more and more applications going forward. It doesn't have to fit every use case.
AlexCoventry 2 days ago|||
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

It's producing seemingly working code faster than you can closely review it.

kace91 2 days ago||
Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal?
bloomca 2 days ago|||
I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (the tricky part, since you need to verify the suite itself).

I personally think we are not that far from it, but it will need something built on top of current CLI tools.
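As a sketch of what the acceptance-criteria side can look like (assumes Playwright for Python and a hypothetical app on localhost:3000; the selectors are made up):

    # The test asserts the acceptance criterion, not the implementation.
    from playwright.sync_api import sync_playwright

    def test_user_can_submit_order():
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:3000")
            page.fill("#sku", "A1")
            page.click("text=Submit")
            assert page.inner_text("#status") == "Order accepted"
            browser.close()

The implementation underneath can be rewritten wholesale and this still means what it meant.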

raincole 2 days ago|||
> Because if you can read code, I can’t imagine poking the result with black box testing being faster.

I don't know... it depends on the use case. I can't imagine even the best front-end engineer can ever read HTML faster than looking at the rendered webpage to check if the layout is correct.

nubg 2 days ago||
Good analogy.
seanmcdirmid 2 days ago|||
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

The AI also writes the black box tests, what am I missing here?

kace91 2 days ago||
>The AI also writes the black box tests, what am I missing here?

If the AI misinterpreted your intentions and/or missed something in production code, tests are likely to reproduce rather than catch that behavior.

In other words, if “the ai is checking as well” no one is.

seanmcdirmid 2 days ago||
That's true. Never let the AI see the code it wrote when it's writing the tests. Write multiple tests, and have an arbitrator (also AI) figure out whether the implementation or the tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec so you can manually verify whether the scenarios/parts of the implementation make sense when it matters.

etc...etc...
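Roughly, the shape of the loop, with llm() and run_tests() as hypothetical stand-ins for whatever agent and test runner you use:

    # Blind-tests-plus-arbitrator: the test writer sees only the spec,
    # never the implementation; a third pass adjudicates failures.
    def build_with_arbitration(spec: str, llm, run_tests) -> str:
        impl = llm(f"Implement this spec:\n{spec}")
        tests = llm(f"Write black-box tests for this spec only:\n{spec}")
        failures = run_tests(impl, tests)
        if not failures:
            return impl
        # On failure, neither side is trusted by default.
        return llm(
            "Given this spec, implementation, and failing tests, decide "
            "which side is wrong and explain why.\n"
            f"SPEC:\n{spec}\nIMPL:\n{impl}\nTESTS:\n{tests}\n"
            f"FAILURES:\n{failures}"
        )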

> In other words, if “the ai is checking as well” no one is.

"I tried nothing, and nothing at all worked!"

llamajams 1 day ago|||
Same. I stopped reading after that. I get the sense that most of these people think all code is web or mobile or something non-critical. Granted im not a web or mobile guy so I cant presume the complexity, risk, cost of such things. But I assume its in a different category than safety/mission critical things. I do dev tools for ASIL-B systems devs now and even then I cant say im comfortable not reading the generated code. Some of my junior peers are though, and im very frustrated that I feel like I keep having to play AI janitor, dont think the bosses care.
hjoutfbkfd 2 days ago|||
your metaphor is wrong.

code is not the output. functionality is the output, and you do look at that.

kace91 2 days ago||
Explain then how testing the functionality (not the new one; regressions included, this is not a school exercise) is faster than checking the code.

Are you writing black box testing by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.

ForHackernews 2 days ago|||
>Imagine taking a picture on autoshot mode

Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.

There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?

sigseg1v 1 day ago|||
Hey it's me! I shoot with manual focus lenses in RAW and drive a standard. There are dozens of us!
weikju 2 days ago|||
You missed out on the rest of the analogy though, which is the part where the photo is not reviewed before handing it over to the client.
notepad0x90 2 days ago||
People care about results. Better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else.
teecha 2 days ago||
I find so many of these comments and debates fascinating as a lay person. I'm more tech savvy than most I meet, built my own PCs, know my way around some more 'advanced' things like the terminal a bit, and have a deeper understanding of computer systems, software, etc. than most people I know. It has always been more of a hobby for me. People look at me as the 'tech' guy even though I'm actually not.

Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.

For the first time I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code, would have basically no idea how to read it, but I'm working on a niche application for my job that is custom-tailored to my needs, and if it works I'll be thrilled. Even better, the process of building just feels so special and awesome.

This really does feel like it is on the precipice of something entirely different. I think back to computers before a GUI interface. I think back to even just computers before mobile touch interfaces. I am sure there are plenty of people who thought some of these things wouldn't work for different reasons, but I think that is the wrong idea. The focus should be on who this will work for and why, and there, I think, are a ton of possibilities.

For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.

chasd00 2 days ago||
Keep building and keep learning, I think you are the kind of user that stands to benefit the most from this technology.
grigri907 1 day ago|||
After 10+ years of stewing on an idea, I started building an app (for myself) that I've never had the courage or time to start until now.

I really wanted to learn the coding, the design patterns, etc, but truthfully, it was never gonna happen without a Claude. I could never get past the unknown-unknowns (and I didn't even grasp how broad a domain of knowledge it actually requires). Best case I would have started small chunks and abandoned it countless times, piling on defeatism and disappointment each time.

Now in under two weeks of spare time and evenings, I've got a working prototype that's starting to resemble my dream. Does my code smell? Yes. Is it brittle? Almost certainly. Is it a security risk? I hope not. (It's not.)

I want to be intentional about how I use AI; I'm nervous about how it alters how we think and learn. But seeing my little toy out in the real world is flippin incredible.

thepasch 1 day ago||
> Is it a security risk? I hope not. (It's not.)

It very probably is, but if it's a personal project you're not planning on releasing anywhere, it doesn't matter much.

You should still be very cognizant that LLMs will currently fairly reliably implement massive security risks once a project grows beyond a certain size, though.

jason_oster 1 day ago||
They can also identify and fix vulnerabilities when prompted. AI is being used heavily by security researchers for this purpose.

It’s really just a case of knowing how to use the tools. Said another way, the risk is being unaware of what the risks are. And awareness can help one get out of the bad habits that create real world issues.

panny 1 day ago||
My observation is that "AI" makes easy things easier and hard things impossible. You'll get your niche app out of it, you'll be thrilled, then you'll need it to do more. Then you will struggle to do more, because the AI created a pile of technical debt.

Programmers dream of getting a green field project. They want to "start it the right way this time" instead of being stuck unwinding technical debt on legacy projects. AI creates new legacy projects instantly.

GalaxyNova 2 days ago||
> I don’t read code anymore

Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.

subsection1h 2 days ago||
> Never thought this would be something people actually take seriously

The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].

[1] https://www.linkedin.com/in/benshoemaker000/

[2] https://github.com/benjaminshoemaker

[3] https://www.benshoemaker.us/about

zipy124 2 days ago|||
Whilst I won't comment on this specific person, one of the best programmers I've met has a law degree, so I wouldn't use their degree against them. People can have many interests and skills.
straydusk 1 day ago|||
I've written code since 2012, I just didn't put it online. It was a lot harder, so all my code was written internally, at work.

But sure, go with the ad hominem.

sho_hn 2 days ago|||
> Never thought this would be something people actually take seriously.

You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are bootcamp-educated web/app dev types, not John Carmack. Statistically, under pre-AI circumstances, they typically started too late and for the wrong reasons to become very skilled in the craft by middle age (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).

AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.

It's totally obvious why many leap at this, and it's probably even what they should do, individually. But it's a selfish concern, not a care for the practice as-is. It also results in a lot of performative blog posting. But if it were you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.

I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.

jofla_net 2 days ago|||
This is gold.

I have rarely had the words taken right out of my mouth like this.

The percentage of devs in my career that are from the same academic background, show similar interests, and approach the field in the same way is probably less than 10%, sadly.

arjie 2 days ago||||
Well, there are programmers like Karpathy in his original coinage of vibe coding:

> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Notice "don't read the diffs anymore".

In fact, this is practically the anniversary of that tweet: https://x.com/karpathy/status/2019137879310836075?s=20

famouswaffles 1 day ago|||
Ahh Bulverism, with a hint of ad hominem and a dash of No True Scotsman. I think the most damning indictment here is the seeming inability to make actual arguments rather than just cheap shots at people you've never even met.

Please tell me, "Were people excited about high-level languages just programmers who 'couldn't hack it' with assembly? Maybe you are one of those? Were GUI advocates just people who couldn't master the command line?"

sho_hn 1 day ago|||
Thanks for teaching me about Bulverism, I hadn't heard of that fallacy before. I can see how my comment displays those characteristics and will probably try to avoid that pattern more in the future.

Honestly, I still think there's truth to what I wrote, and I don't think your counter-examples prove it wrong per se. The prompt I responded to ("why are people taking this seriously") also led fairly naturally down the road of examining the reasons. That was of course my choice to do, but it's also just what interested me in the moment.

BrouteMinou 1 day ago|||
I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".

And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack thereof, more convincing.

famouswaffles 1 day ago||
>I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".

It's the equivalent of saying anyone excited about being able to microwave Frozen meals is a hack who couldn't make it in the kitchen. I'm sorry, but if you don't see how ridiculous that assertion is then I don't know what to tell you.

>And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack thereof, more convincing.

A series of condescending statements meant to demean with no objective backing whatsoever is not an argument. What do you want me to say? There's nothing worth addressing, other than pointing out how empty it is.

You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?

You and OP have zero actual clue. At any advancement, regardless of how big or consequential, there are always people like that. It's very nice to feel smart and superior and degrade others, but people ought to be better than that.

So I'm sorry but I don't really care how superior a cook you think you are.

sho_hn 1 day ago||
> You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?

I think both things can be true simultaneously.

You're arguing against a straw man.

famouswaffles 13 hours ago||
Pointing out that your argument relies on an unverifiable (and easily countered) generalization isn't a straw man.
sixdimensional 2 days ago|||
Half serious - but is that really so different than many apps written by humans?

I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.

In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.

In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).

I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible that someday, in the nearer rather than farther future, we'll be at a point where generative systems have more predictability and maybe even get certified for the safety/etc. of the generated code, leading to truly not reading the code.

I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.

sixdimensional 2 days ago||
I would enjoy discussion with whoever voted this down - why did you?

What is your opinion and did you vote this down because you think it's silly, dangerous or you don't agree?

joriJordan 2 days ago|||
Remember though, this forum is full of people who consider code to be objects when it's just state in a machine.

We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?

Code isn't a bridge or car. Preservation isn't meaningful. If we aren't shutting the DCs off we're still burning the resources regardless if we save old code or not.

Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.

Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python in my screen "runs" without an interpreter because the model is capable of correctly generating the appropriate output without one.

Code is ethno objects, only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.

Am working on my own "geometric primitives" models that know how to draw GUIs and 3D world primitives, text; think like "boot to blender". Rather store data in strings, will just scaffold out vectors to a running "desktop metaphor".

It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...

geraneum 2 days ago||
Come again?
strken 2 days ago|||
I'm torn between running away to be an electrician or just waiting three years until everyone realises they need engineers who can still read.

Sometimes it feels like pre-AI education is going to be like low-background steel for skilled employees.

Aeolun 2 days ago|||
> 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.

That happens just as often without AI. Maybe the people that like it all have experience with trashing multiple sets of products over the course of their life?

binsquare 2 days ago|||
Reading and understanding code is more important than writing imo
eikenberry 2 days ago||
It’s pretty well established that you cannot understand code without having thought things through while writing it. You need to know why things are written the way they are to understand what is written.
tomjakubowski 2 days ago|||
Yeah, just reading code does little to help me understand how a program works. I have to break it apart and change it and run it. Write some test inputs, run the code under a debugger, and observe the change in behavior when changing inputs.
fragmede 1 day ago|||
If that were true, then only the person who wrote the code could ever understand it enough to fix bugs, which is decidedly not true.
jason_oster 1 day ago||
I’ll grant you that there are many trivial software defects that can be identified by simply reading the code and making minor changes.

But for architectural issues, you need to be able to articulate how you would have written the code in the first place, once you understand the existing behavior and its problems. That is my interpretation of GP’s comment.

Hamuko 2 days ago|||
I've seen software written and architected by Claude and I'd say that they're already ready to be thrown out. Security sucks, performance will probably suck, maintainability definitely sucks, and UX really fucking sucks.
j_bizzle 2 days ago|||
The coincidental timing between the rapid increase in the number of emergency fixes coming out on major software platforms and the proud announcement of the amount of code that's being produced by AI at the same companies is remarkable.

I think 2-3 years is generous.

Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.

straydusk 2 days ago|||
I actually think this is fair to wonder about.

My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.

I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are: are the velocity gains worth it? Will the models be so much better in a year that they can fix those problems themselves, or re-write it?

I feel like time will validate that.

bloomca 2 days ago|||
If the models don't get to the point where they can correct fixes on their own, then yeah, everything will be falling apart. There is just no other way around increasing entropy.

The only way to harness it is to somehow package code-producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, imo it doesn't matter how closely people watch the output, things will keep getting worse.

esperent 2 days ago||
> If the models don't get to the point where they can correct fixes on their own

Depending on what you're working on, they are already at that point. I'm not into any kind of AI maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly extensive web app to manage my business using Astro + React and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.

(+) Except for making the UI beautiful. It's crap at that.

The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.

This extends up to higher level components where the spec also briefly explains the actual goal.

Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.

My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
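Concretely, the docs/specs layout ends up looking something like this (file names from my project; yours will differ):

    docs/specs/
      layout.md           # overall site shell
      ui-components.md    # shared buttons/inputs, so agents don't invent new ones
      auth.md
      database.md         # drizzle + libsql decisions live here
      data.md
      ...plus one file per section of functionality, each stating its goal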

Computer0 2 days ago|||
I have wondered the same but for the projects I am completely "hands off" on, the model improvements have overcome this issue time and time again.
rustyhancock 2 days ago|||
2-3 years from now, if coding AI continues to improve at this pace, I reckon people will rewrite entire projects.

I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.

But if so many people are already there, and they're mostly highly skilled programmers, imagine 2 years from now with people who've never programmed!

nullsanity 2 days ago|||
If I keep getting married at the same pace I have, then in a few years I'll have like 50 husbands.
weakfish 2 days ago||||
Well, Tesla has been nearly at FSD for how long? The analogy you make sorta makes it sound less likely
GalaxyNova 2 days ago|||
Seems dangerous to wager your entire application on such an uncertainty
geraneum 2 days ago||
Some people are not aware that they are one race condition away from a class action lawsuit.
abrookewood 2 days ago|||
The proponents of Spec Driven Development argue that throwing everything out completely and rebuilding from scratch is "totally fine". Personally, I'm not comfortable with the level of churn.
well_ackshually 2 days ago|||
Also take something into account: absolutely _none_ of the vibe coding influencer bros make anything more complicated than a single-feature webapp that's already been implemented 50 times. They've never built anything complicated either, or maintained something for more than a few years with all the warts that entails. Literally, from his bio on his website:

> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.

He's a manager that made graphs on Power BI.

They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.

It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.

And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God are there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.

fragmede 1 day ago||
No true programmer would vibecode an app, eh?
well_ackshually 1 day ago|||
Oh no, they would. I would.

I'd have the decency to know and tell people that it's a steaming pile of shit and that I have no idea how it works, though, and would not have the shamelessness to sell a course on how to put out LLM vomit in public.

Engineering implies respect for your profession. Act like it.

sebastos 1 day ago|||
But invoking No True Scotsman would imply that the focus is on gatekeeping the profession of programming. I don’t think the above poster is really concerned with the prestige aspect of whether vibe bros should be considered true programmers. They’re more saying that if you’re a regular programmer worried about becoming obsolete, you shouldn’t be fooled by the bluster. Vibe bros’ output is not serious enough to endanger your job, so don’t fret.
farnsworth 2 days ago|||
Yes, and you can rebuild them for free
RA_Fisher 2 days ago|||
Claude, Codex and Gemini can read code much faster than we can. I still read snippets, but mostly I have them read the code.
GalaxyNova 2 days ago||
Unfortunately they're still too superficial. 9 times out of 10 they don't have enough context to properly implement something and end up just tacking it on in some random place with no regard for the bigger architecture. Even if you do tell it something in an AGENT.md file or something, it often just doesn't follow it.
RA_Fisher 2 days ago||
I use them to probabilistically program. They’re better than me and I’ve been at it for 16 years now. So I wouldn’t say they’re superficial at all.

What have you tried to use them for?

ekidd 2 days ago|||
I have a wide range of Claude Code based setups, including one with an integrated issue tracker and parallel swarms.

And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.

Conclusion: if you want quality in anything big in February 2026, you still need to read the code.

manmal 2 days ago||
Opus is too superficial for coding (great at bash though, on the flip side), I’d recommend giving Codex a try.
cdfuller 2 days ago||
As rapidly as LLMs advance, I think all the AI slop code written today will be easily digestible by the LLMs a few generations down the line. I think there will be a lot of improvements in making user intent clearer. Even with a bad codebase, larger context windows will mean refactoring won't be a challenge.
gchamonlive 2 days ago||
The skills required to perform as a software engineer in an environment where competent AI agents are a commodity have shifted. Before, it was important for us to be very good at reading documentation and writing code. Now we need to be very good at writing docs, specs and interfaces, and at reading code.

That goes a bit against the article, but it's not reading code in the traditional sense, where you are looking for the common mistakes we humans tend to make. Instead you are looking for clues in the code to determine where you should improve the docs and specs you fed into your agent, so that the next time you run it, chances are it'll produce better code, as the article suggests.

And I think this is good. In time, we are going to be forced to think less technically and more semantically.

Ifkaluva 2 days ago||
Sometimes when I vibe code, I also have a problem with the code, and find myself asking: “What went wrong with the system that produced the code?”

The answer is clear: I didn’t write the code, I didn’t read it, I have no idea what it does, and that’s why it has a bug.

Aeolun 2 days ago|
Be that as it may, I spot bugs a lot faster when I didn’t write the code than when I did.
Ifkaluva 2 days ago||
Well, I’d wager there are quite a few more bugs, so naturally it should be easier to spot a few.
Bishonen88 1 day ago||
When you write code yourself, you're convinced each line is correct as you write it. That assumption is hard to shake, so you spend hours hunting for bugs that turn out to be obvious. When reading AI-generated code fresh, you lack that assumption. Bugs can jump out faster. That's at least my naive explanation for this phenomenon.
oxag3n 2 days ago||
Following this logic, why not move further left?

Become a CTO, CEO or even a venture investor. "Here's $100K worth tokens, analyze market, review various proposals from Agents, invest tokens, maximize profit".

You know why not? Because it will be more obvious it doesn't work as advertised.

krackers 1 day ago||
If one truly believed in LLMs being able to replace knowledge workers, then it would also hold that they could replace managers and execs. In fact, they should be able to do it even better: LLMs could convert every company into a "flat" one, bypassing the management hierarchy and directly consuming meeting notes from every meeting to get the real status as the source of truth, and provide suggestions as needed. If combined with web-search capability, they would also be more plugged into the market, customer sentiment, and competitors than most execs could ever be.
ajam1507 1 day ago|||
We're not at the point where we are replacing all software developers entirely (and will never be without real AGI), but we are definitely at the point where scaling back headcount is possible.

Also, creating software is much more testable and verifiable than what a CEO does. You can usually tell when the code isn't right because it doesn't work or doesn't pass a test. How can you verify that your AI CEO is giving you the right information or planning its business strategy effectively?

It's one of the biggest reasons that software development and art are the two domains in which AI excels. In software you can know when it's right, and in art it doesn't matter if it's right.

krackers 1 day ago||
>You can usually tell when the code isn't right because it doesn't work or doesn't pass a test

Tests (as usually written, in unit-test form) only tell you that it's not completely broken, they're not a good indicator of it working well otherwise "vibecoded slop" wouldn't be a thing. And the tests themselves are usually vibecoded too which doesn't help much in detecting issues off the happy path.

>you verify that your AI CEO is giving you the right information or planning its business strategy effectively

The same could be said for human CEOs. A lot of them don't really have good success rates either.

ajam1507 1 day ago||
> Tests (as usually written, in unit-test form) only tell you that it's not completely broken, they're not a good indicator of it working well otherwise "vibecoded slop" wouldn't be a thing

You can certainly end up with vibecoded slop that passes all the tests, but it won't pass other forms of evaluation (necessarily true, otherwise you could not identify it as vibecoded slop.)

> The same could be said for human CEOs. A lot of them don't really have good success rates either.

This is part of my point. The tight feedback loop that enables us to judge a model's efficacy in software, doesn't exist for the role of CEO.

joquarky 1 day ago|||
> LLMs could convert every company into a "flat" one, bypassing the management hierarchy

It sounds like you're describing Manna by Marshall Brain

machiaweliczny 1 day ago|||
I think this will work, but it requires quality data pipelines and scaffolding, same as coding
ipnon 2 days ago||
You have to move up or down to survive. In 10 years we'll either be managers (either of humans or agents), or we'll be electrical engineers. Programming is done! I for one am glad.
oxag3n 2 days ago||
There are two extremes and spectrum in between:

* AI can replace knowledge workers - most existing software engineers and managers of all levels will lose their jobs and have to re-qualify.

* AI requires human in the loop.

In the first scenario, I see no reason to waste time and should start building plan B now (remaining job markets will be saturated at that point).

In the second scenario, tech-debt and zettabytes of slop will harm the companies that relied on it heavily. In the age of failing giants and crumbling infrastructure, engineers and startups that can replace a gigawatt-burning data center with a few-kilowatt rack, by manually coding a shell script that replaces Hadoop, will flourish.

Most probably it will be a spectrum - some roles can be replaced, some not.

sho_hn 2 days ago||
I still think this is mostly people who never could hack it at coding taking to the new opportunities that these tools afford them without having to seriously invest in the skill, and basking in touting their skilless-ness being accepted as the new temporary cool.

Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.

CuriouslyC 2 days ago||
Just FYI, this is the attitude that causes pro-AI people to start shit-talking anti-AI folks as Luddites who need to learn to use the tools.

Agents are a quality/velocity tradeoff (which is often a good one). If you can't debug stuff without them, that's a problem, as you'll get into holes, but that doesn't mean you have to write the code by hand.

sho_hn 2 days ago|||
I enjoy new technology in general, so I very much keep up with the tools and also like using them for the things they do well at any given moment. I'm not among the Luddites, FWIW. I think there's a lot of legitimately great building going on right now.

Note though we're talking about "not reading code" in context, not the writing of it.

sjsisjsh 2 days ago|||
Author is a former data analytics product manager (already a bit of a tea leaf reading domain) who says he never reads code and is now marketing himself as a new class of developer.

Parent post sounds like a very accurate description.

straydusk 2 days ago||
I completely agree in a sense - the cost of producing software is plummeting, and it's leading to me being able to develop things that I would never have invested months in before.
sho_hn 2 days ago||
This blog post is written by a product manager, not a programmer. Their CV speaks to an Economics background, a stint in market research, writing small scripting-type programs ("Cron+MySQL data warehouse") and then off to the product management races.

What it's trying to express is that the (T)PM job still should still be safe because they can just team-lead a dozen agents instead of software developers.

Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.

straydusk 2 days ago|
That's me! I'm pretty open about that.

I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.

Groxx 2 days ago||
>I think the industry is moving left. Toward specs. The code is becoming an implementation detail. What matters is the system that produces it - the requirements, the constraints, the architecture. Get those right, and the code follows.

So basically a return to waterfall design.

Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).

prewett 2 days ago|
I really wish posts like this explained what sort of development they are doing. Is this for an internal CRUD server? Internal React app? Scala server with three instances? Golang server with complex AWS configuration? 10k lines? 100k lines? 1M+? Externally facing? iOS app? Algorithm-heavy photo processing desktop app? It would give me a much better idea of whether the argument is reasonable, and whether it is applicable for the kind of software I generally write.
bopbopbop7 2 days ago||
The author is a PM with a bachelors in economics who got laid off last year and began building with AI. Zero engineering experience.

You can guess what kind of software he is building.

When you read the 100th blog post about how AI is changing software development, just remember that these are the authors.

TonyStr 1 day ago|||
He makes <10k cloc websites trying to sell you a spec-creation wizard[0]. Considering Claude wrote the site, it could probably be written in 1/10th of the lines.

I think these tools are great for allowing non-technical people like OP to create landing pages and small prototypes, but they're useless for anything serious. That said, I applaud OP for embodying the "when in a gold rush, sell shovels" mentality.

[0] - https://vibescaffold.dev/

straydusk 2 days ago||
You're completely right, and in retrospect I wish I had... I was honestly just talking mostly in broad terms, but people really (maybe rightly) focused on the "not reading code" snippet.

I'm mostly developing my own apps and working with startups.
