Posted by martinald 5 days ago
* People using it as a tool, aware of its limitations and treating it basically as an intern/boring-task executor (whether it's some code boilerplate, or pooping out/shortening some corporate email), or as a tool to give themselves a summary of a topic they can then bite into deeper.
* People outsourcing their thinking and entire skillset to it - they usually have very little clue about the topic, are interested only in results, and are not interested in knowing more about the topic or honing their skills in it
The second group is the one that thinks talking to a chatbot will replace a senior developer.
I don't know if this describes your situation, but I know many people who are dealing with positions where they have no technical mentorship, no real engineering culture to grow in, and a lot of deadlines and work pressure. Coupled with this, they often don't have a large social group within programming/tech, because they've only been in it for a few years and have been heads down grinding to get a good job the whole time. They're experiencing a weird mixture of isolation, directionless-ness, and intense pressure. The work is joyless for them, and they don't see a future.
If I can offer any advice, be selfish for a bit. Outsource as much as you want to LLMs, but use whatever time savings you get out of this to spend time on programming-related things you enjoy. Maybe work the tickets you find mildly interesting without LLMs, even if they aren't mission critical. Find something interesting to tinker with. Learn a niche language. Or slack off in a discord group/make friends in programming circles that aren't strictly about career advancement and networking.
I think it's basically impossible to get better past a certain level if you can't enjoy programming, LLM-assisted or otherwise. There's such a focus on "up-skilling" and grinding through study materials in the culture right now, and that's all well and good if you're trying to pass an interview in 6 weeks, but all of that stuff is pretty useless when you're burned out and overwhelmed.
I also learned that I absolutely hate most programmers. No offense. But most of those I've been talking to have a complete lack of ethics. I really love programming, but I have a massive issue with how industry-scale programming is performed (just offloading infra to AWS, just using random JS libs for everything, buying design templates instead of actually building components yourself, 99% of apps being simple CRUD, and I am so incredibly tired of building HTTP-based apps, web forms and whatnot...)
I love tech, but the industry does not have a soul. The whole joy of learning new things is diminishing the more I learn about the industry.
Could be for good reasons (e.g. they're security features that are important to the business but add friction for the user) or just because management is disconnected from the reality of their employees. Either way, not necessarily the wrong decision by the PM - sometimes you've gotta build features fast because the buyer demands them in a certain timeframe in order to get the contract signed. Even if they never get used, the revenue still pays the bills.
Security features that add friction for the user are usually forced, aren't they?
Contract requirements do make sense, but I get the idea that this user would know that.
What are you imagining that would be actual value but not used for six months?
Nah. It's been at least since 2009 (GFC), if not longer.
It started happening with the advent of applicant tracking systems (making hiring a nightmare, which it still is) and the fact that most companies stopped investing into training of juniors and started focusing more on the short-term bottom line.
If the company is going to make it annoying to get hired and won't invest anything in you as a professional, there's 0 reason for loyalty besides giving your time for the paycheck. And 0 reason to go at 120% and burn yourself out.
The problem, as I see it, is that the changes that bug me [1] seem systemic throughout the economy, "best practices" promulgated by consultants and other influencers. I'm actually under the impression my workplace was a bit behind the curve, and that a lot of other places are worse.
[1] Not sure if they're the "actions" you're talking about. I'm talking about offshoring & AI (IMHO part of the same thrust), and a general increase in pressure/decrease in autonomy.
Devs are hired goons at worst and skilled craftspeople at best, but never professionals.
Because having a job that's somewhat satisfying and not just a grind is great for one's own well-being. It's also not a bad deal for the employer, because an engaged employee delivers better results than one who doesn't give a shit.
And this may be fine in certain cases.
I'm learning German and my listening comprehension is marginal. I took a practice test and one of the exercises was listening to 15-30 seconds of audio followed by questions. I did terribly, but it seemed like a good way to practice. I used Claude Code to create a small app to generate short audio (via ElevenLabs) dialogs and set of questions. I ran the results by my German teacher and he was impressed.
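The generation step is basically one API call per line of dialog. A minimal sketch of what the app does - the endpoint, voice ID, and field names are assumptions from memory of the ElevenLabs REST docs, not the actual code Claude wrote:

```python
import os
import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]      # hypothetical env var
VOICE_ID = "german-voice-id-goes-here"           # placeholder, not a real voice ID

def generate_clip(text: str, out_path: str) -> None:
    """Synthesize one line of German dialog to an MP3 file."""
    # Endpoint shape assumed from the ElevenLabs text-to-speech REST API.
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
    resp = requests.post(
        url,
        headers={"xi-api-key": API_KEY},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio bytes

# One exchange of a generated dialog, plus a comprehension question to pair with it.
generate_clip("Guten Tag, ich hätte gern zwei Brötchen, bitte.", "dialog_01.mp3")
question = "Was möchte die Person kaufen?"
```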
I'm aware of the limitations: Sometimes the audio isn't great (it tends to mess up phone numbers), it can only be a small part of my work learning German, etc.
The key part: I could have coded it, but I have other more important projects. I don't care that I didn't learn about the code. What I care about is I'm improving my German.
> Group 1: intern/boring task executor
Yup, that makes sense I'm in group 1.
> Group 2: "outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results"
Also me (in this case), as I'm outsourcing the software development part and just want the final app.
Soo... I probably have thought too much about the original proposed groups. I'm not sure they are as clear as the original suggests.
It is, though. The app is using AI underneath to generate audio snippets. That's literally its purpose.
The word "thinking" can be a bit nebulous in these conversations, and critical thinking perhaps even more ambiguously defined, so before we discuss that, we need to define it. I go with the Merriam-Webster definition: the act or practice of thinking critically (as by applying reason and questioning assumptions) in order to solve problems, evaluate information, discern biases, etc.
LLMs seem to be able to mimic this, particularly to those who have no clue what it means when we call an LLM a "stochastic parrot" or some equally esoteric term. At first I was baffled that anyone really thought that LLMs could somehow apply reason or discern their own biases, but I had to take a step back and look at how that public perception was shaped to see what these people were seeing. LLMs, generative AI, ML, etc. are all extremely complex things. Couple that with the pervasive notion that thinking is hard and you have a massive pool of consumers who are only too happy to offload some of that thinking onto something they may not fully understand but were promised would do what they wanted, which is make their daily lives a bit easier.
We always get snagged by things that promise us convenience or offer to help us do less work. It's pretty human to desire both of those things, but it's proving to be an Achilles heel for many. How we characterize AI determines our expectations of it; so do you think of it as a bag of tools you can use to complete tasks? Or is it the whole factory assembly line, where you push a few buttons and a pseudo-finished product comes out the other side?
Don't care about code quality; never seen the code. I care if the tools do the things I want them to do, and they verifiably do.
The most fun one is this, which creates listing images for my products: https://theautomatedoperator.substack.com/p/opus-45-codes-ge...
More recently, I'm using Claude Code to handle my inventory management by having it act as an analyst while coding itself tools to access my Amazon Seller accounts to retrieve the necessary info: https://theautomatedoperator.substack.com/p/trading-my-vibe-...
On the verification front, a few examples:
1. I built an app that generates listing images and whitebox photos for my products. Results there are verifiable for obvious reasons.
2. I use Claude Code to do inventory management - it has a bunch of scripts to pull the relevant data from Amazon then a set of instructions on how to project future sales and determine when I should reorder. It prints the data that it pulls from Amazon to the terminal, so that's verifiable. In terms of following the instructions on coming up with reorder dates, if it's way off, I'm going to know because I'm very familiar with the brands that I own. This is pretty standard manager/subordinate stuff - I put some trust in Claude to get it right, but I have enough context to know if the results are clearly bad. And if they're only off by a little, then the result is I incur some small financial penalty (either I reorder too late and temporarily stock out or I reorder too early and pay extra storage fees). But that's fine - I'm choosing to make that tradeoff as one always does when one hands off work.
3. I gave Claude Code a QuickBooks API key and use it to do my books. This one gets people horrified, but again, I have enough context to know if anything's clearly wrong, and if things are only slightly off then I will potentially pay a little too much in taxes. (Though to be fair it's also possible it screws up the other way, I underpay in taxes and in that case the likeliest outcome is I just saved money because audits are so rare.)
Let's say I have a 5 person company and I vibe-engineer an application to manage shifts and equipment. I "verify" it by seeing with my own eyes that everyone has the tools they need and every shift is covered.
Before I either used an expensive SaaS piece of crap for it or did it with Excel. I didn't "verify" the Excel either and couldn't control when the SaaS provider updated their end, sometimes breaking features, sometimes adding or changing them.
This is actually the greatest use case I see, and interact with.
LLMs make me think out loud way better.
Best rubber duck ever.
So I learned that you can definitely glean some insights from it. One insight I have is: I'm a "talk out loud thinker". I don't really value that as an identity thing but it is definitely something I notice that I do. I also think a lot of things in my mind, but I tend to think out loud more than the average person.
So yea, that's how pseudo science can sometimes still lead to useful insights about one particular individual. Same thing with philosophy really, usually also not empirically tested (I do think it has a stronger academic grounding but to call philosophy a science is... a bit... tricky... in many cases. I think the common theme is that it's also usually not empirically grounded but still really useful).
A few weeks ago a critical bug came in on a part of the app I’d never touched. I had Claude research the relevant code while I reproduced the bug locally, then had it check the logs. That confirmed where the error was, but not why. This was code that ran constantly without incident.
So I had Claude look at the Excel doc the support person provided. Turns out there was a hidden worksheet throwing off the indices. You couldn’t even see the sheet inside Excel. I had Claude move it to the end where our indices wouldn’t be affected, ran it locally, and it worked. I handed the fixed document back to the support person and she confirmed it worked on her end too.
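The fix itself is only a few lines with openpyxl - a sketch of the idea, not the actual change Claude made (file names made up):

```python
from openpyxl import load_workbook

wb = load_workbook("support_upload.xlsx")  # hypothetical file name

# Hidden sheets still occupy an index, which is what threw the code off.
for ws in wb.worksheets:
    print(wb.index(ws), ws.title, ws.sheet_state)  # "visible", "hidden", or "veryHidden"

# Move the non-visible sheet to the end so index-based lookups line up again.
hidden = next(ws for ws in wb.worksheets if ws.sheet_state != "visible")
wb.move_sheet(hidden, offset=len(wb.worksheets) - 1 - wb.index(hidden))
wb.save("support_upload_fixed.xlsx")
```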
Total time to resolution: 15 minutes, on a tricky bug in code I’d never seen before. That hidden sheet would have been maddening to find normally. I think we might be strongly overestimating the benefits of knowing a codebase these days.
I’ve been programming professionally for about 20 years. I know this is a period of rapid change and we’re all adjusting. But I think getting overly precious about code in the age of coding agents is a coping mechanism, not a forward-looking stance. Code is cheap now. Write it and delete it.
Make high leverage decisions and let the agent handle the rest. Make sure you’ve got decent tests. Review for security. Make peace with the fact that it’s cheaper to cut three times and measure once than it used to be to measure twice and cut once.
Compared to the mess created by Node.js npm amateur engineers, it really shows who is 10x or 100x.
Outsourcing critical thinking to pattern matching and statistical prediction will make the haystacks even more unmanageable.
From my perspective the distinction is more on the supply side and we have two generations of AI tools. The first generation was simply talking to a chatbot in a web UI and it's still got its uses, you chat and build up a context with it, it's relying heavily on its training data, maybe it's reading one file.
The second generation leans into RAG and agentic capabilities (if you can glob and grep or otherwise run a search, congrats you have v1 of your RAG strategy). This is where Gemini actually scans all the docs in our Google Workspace and produces a proposal similar to ones we've written before. (Do we even need document templates anymore?) Or where you start a new programming project and Claude can write all the boilerplate, deploy and set up a barebones test suite within a couple of minutes. There's no doubt that these types of tools give us new capabilities and in some cases save a lot more time than just babbling into chatgpt.com.
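The "v1 of your RAG strategy" point isn't an exaggeration, either. A minimal sketch of the whole thing - no vector database, just keyword scoring over whatever files you have lying around (paths and cutoffs are arbitrary):

```python
import glob

def retrieve(query: str, pattern: str = "docs/**/*.md", k: int = 5) -> str:
    """Naive retrieval: return the k most keyword-dense files as prompt context."""
    terms = query.lower().split()
    scored = []
    for path in glob.glob(pattern, recursive=True):
        with open(path, encoding="utf-8", errors="ignore") as f:
            text = f.read()
        score = sum(text.lower().count(t) for t in terms)
        if score:
            scored.append((score, path, text[:2000]))  # truncate to keep the prompt small
    scored.sort(reverse=True)
    # Paste this straight into the model's context ahead of the actual question.
    return "\n\n".join(f"# {path}\n{snippet}" for _, path, snippet in scored[:k])
```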
I think this accounts for a lot of differences in terms of reported productivity by the sane users. I was way less enthusiastic about AI productivity gains before I discovered the "gen 2" applications.
Sometimes I just want the thing and really don't care about any details. Sometimes I want a very specific thing built in a very specific way. Sometimes I care about some details and not others.
How I use the tools at my disposal depends on what I want to get out of the effort.
* people who use it instead of search engines.
* people who use it as a doctor/therapist/confidant. Not to research. But as a practitioner.
There are others:
* people who use it instead of man pages or documentation.
* people who use it for short scripts in a language they don't quite understand but "sorta kinda".
And the first group thinks that these tools will enable them to replace a whole team of developers.
It's nice to brainstorm with too, but you have to know what you're doing.
It gets stuck on certain things for sure. But all in all it's a great productivity tool. I treat it like an advanced auto complete. That's basically how people need to treat it. You have to spend a lot of time setting up context and detailing what you want.
So does it save time? Yea, it can. It may not in every task, but it can. It's simply another way of coding. It's a great assistant, but it's not replacing a person.
> People using it as a tool, aware of its limitations
You can't know the limitations of these tools. It is literally unknowable. Depending on the context, the model and the task, it can be brilliant or useless. It might do the task adequately first time, then fail ten times in a row.
> People outsourcing thinking and entire skillset to it
You can't get out of it something that you can't conceive of. There are real life consequences to not knowing what AI produced. What you wrote basically assumes that there is a group who consistently hit themselves on the head with a hammer not knowing what hurt them.
- Peer reviews. Not the only peer review of code, but a "first pass" to point out anything that I might have missed
- Implementing relatively simple changes; ones where the "how" doesn't require a lot of insight into long term planning
- Smart auto-complete (and this one is huge)
- Searching custom knowledge bases (I use Obsidian and have an AI tied into it to search through my decade+ of notes)
- Smart search of the internet; describing the problem I'm trying to solve and then asking it to find places that discuss that type of thing
- I rarely use it to clean up emails, but it does happen sometimes. My emails tend to be very technical, and "cleaning them up" usually requires I spend time figuring out what information not to include
I'm a subject matter expert with 45 years in programming and data, aware of the tools' limitations, but I still use them all day every day to implement non-trivial code, all the while using other tools for voice transcription, internal blog posts about new tools, agents gathering information while I sleep, various classifiers, automated OCR, email scanning, recipe creation, electronics design, and many many other daily tasks.
I can see the day when all of these folks have completely replaced their thinking skills with AI and are unable to find a job because they can no longer troubleshoot anything without it.
I use AI as a replacement for a search engine. I spent 3 nights using ChatGPT to assist me in deploying a Proxmox LXC container running 4 network services, with all traffic routed to Proton VPN via WireGuard. If the VPN goes down, the whole container network stops rather than using my real IP. Everything was done via Ansible, which I use to manage my homelab, and I was able to identify mistakes and fix them myself. Dude, I have learned a ton with LXC and am sort of moving away from VMs.
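One way to sanity-check the kill switch is a quick leak test from inside the container - a sketch, with the "real" IP and the echo service as stand-ins:

```python
import urllib.request

HOME_IP = "203.0.113.7"  # placeholder for my real WAN address

def current_exit_ip() -> str:
    # api.ipify.org returns the caller's public IP as plain text.
    with urllib.request.urlopen("https://api.ipify.org", timeout=5) as resp:
        return resp.read().decode().strip()

try:
    ip = current_exit_ip()
    assert ip != HOME_IP, f"LEAK: traffic is leaving via {ip}"
    print(f"OK: exit IP is {ip} (VPN)")
except OSError:
    # With the WireGuard tunnel down, the kill switch should make this request
    # fail outright rather than fall back to the real connection.
    print("No route out - kill switch is doing its job")
```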
Once they realize that it doesn't replace seniors but can easily replace juniors, junior devs will have a bigger problem, and the industry at large will have a huge problem in 8 years, because the concept of "senior" will have vanished.
I consider 8 years of real experience to be the threshold for a senior dev.
If, from now on, the number of juniors is drastically reduced, this will lead to a lack of seniors in 8 years, because seniors will keep leaving at the same rate as before.
In a situation where they replace juniors with agents, yes, we'll still be senior, but just like people capable of setting up a VHS recorder, our numbers will dwindle.
No one is going to replace senior developers. But senior developer pay WILL decrease relative to its historical values.
Think wider. You, sharperguy, are not and will not be the only person with access to these tools. Therefore, your productivity increase will likely be the same as everyone else's. If you are as good as everyone else, why would YOU get paid more? Have you ever seen a significant number of companies outside FAANG permanently boost everyone's salary just because they did well in a given year?
A company's goal is to serve the shareholders, not you. Your value exists relative to that of others.
If every coal miner could suddenly produce 10x the amount of coal, do people say "well, now we can just hire one coal miner instead of 10"? Or do they say "now thousands of new projects which were not economically viable due to the high price of coal are now viable, meaning we actually need to increase our total output beyond even 10x of what it was previously"?
Plus, look at the job market. Every single tech company out there has been laying off devs in the last 3 years. If maximising productivity above expenses was so valuable, every tech company out there would be hiring like crazy because senior devs are cheap as chips nowadays. But they aren't, devs might be cheap but money itself isn't right now so they are prioritising lower expenses over increased productivity. Because that makes shareholders happy. And that's what every company aims for.
Maximising productivity is only an absolute goal in the minds of devs not in the minds of executives.
Not necessarily, there are many factors at play here which are downplayed. The first one is education: LLMs are going to significantly improve skill training. Arguably, it is already happening. So the gap between you and a mid-level dev will get narrower. At the same time, the number of candidates who can be as good as you will increase.
While you can argue that you possess specialised skills that not many do, you are unlikely to prove that under pressure within a couple of hours, and certainly not to the level where you can have late-2010s levels of negotiating power imo.
At the end of the day, the market can stay irrational longer than you can keep refusing to accept a lower offer imo. I believe there will be winners. But pure technical skill isn't the moat you think it is. Not anymore.
Now AI agents are cheap but they generate a lot of slop, and potential minefields that might be costly to clean. The ROI will show up eventually and people in the second group will find out their jobs might be in danger. Hopefully a third group will come to save them.
Junior devs: those with limited experience or depth of knowledge. They are unable to analyze the output of AI coding agents well enough to determine the long-term viability of the code. I think this is the entirety of who you're speaking of.
Senior devs: those using it as more than a basic task executor. They have a decade+ of experience and can quickly tell whether what the AI coding agent suggests is viable long term. When it's not, they know how to steer it in a more appropriate direction.
I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move as quickly as before.
The interesting bit is going to be whether we see AI being used in maturing those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc along a well-defined set of rails.
When I needed to bash out a quick Hashicorp Packer buildfile without prior experience beyond a bit of Vault and Terraform, local AI was a godsend at getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide offered. The net result was zero prior knowledge to a hardened OS image and repeatable pipeline in under a week.
On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.
LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.
Honestly, the absolute revolution for me would be if someone managed to make an LLM say "sorry, I don't know enough about the topic". One time I made a typo in the name of a project I wanted some info on, and it outright invented commands and usages out of thin air (ones that were also different from the project I was looking for, so it didn't just "correct the typo")...
https://arxiv.org/abs/2509.04664
According to that OpenAI paper, models hallucinate in part because they are optimized on benchmarks that involve guessing. If you make a model that refuses to answer when unsure, you will not get SOTA performance on existing benchmarks and everyone will discount your work. If you create a new benchmark that penalizes guessing, everyone will think you are just creating benchmarks that advantage yourself.
The real reason is that every bench I've seen has Anthropic with lower hallucinations.
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
This is essentially what I'm doing too but I expect in a different country. I'm finding it incredibly difficult to successfully speak to people. How are you making headway? I'm very curious how you're leveraging AI messaging to clients/prospective clients that doesn't just come across as "I farm out work to an AI and yolo".
Edit - if you don't mind sharing, of course.
Mostly it's been an excellent way to translate vocabulary between products or technologies for me. When I'm working on something new (e.g., Hashicorp Packer) and lack the specific vocabulary, I may query Qwen or Ministral with what I want to do ("Build a Windows 11 image that executes scripts after startup but before sysprep"), then use its output as a starting point for what I actually want to accomplish. I've also tinkered with it at home for writing API integrations or parsing JSON with RegEx for Home Assistant uses, and found it very useful in low-risk environments.
Thus far, they don't consistently spit out functional code. I still have to do a back-and-forth to troubleshoot the output and make it secure and functional within my environments, and that's fine - it's how I learn, after all. When it comes to, say, SQL (which I understand conceptually, but not necessarily specifically), it's a slightly bigger crutch until I can start running on my own two feet.
Still cheaper than a proper consultant or SME, though, and for most enterprise workloads that's good (and cheap) enough once I've sanity checked it with a colleague or in a local dev/sandbox environment.
"I could make that in a weekend"
"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"
That is a good point and true to some extent. But IME with AI, both the initial speedup and the eventual slowdown are accelerated vs. a human.
I've been thinking that one reason is that while AI coding generates code far faster (on a greenfield project I estimate about 50x), it also generates tech-debt at a hyperastonishing rate.
It used to be that tech debt started to catch up with teams in a few years, but with AI coded software it's only a few months into it that tech debt is so massive that it is slowing progress down.
I also find that I can keep the tech debt in check by using the bot only as a junior engineer, where I specify precisely the architecture and the design down to object and function definitions and I only let the bot write individual functions at a time.
That is much slower, but also much more sustainable. I'd estimate my productivity gains are "only" 2x to 3x (instead of ~50x) but tech debt accumulates no faster than a purely human-coded project.
This is based on various projects only about one year into it, so time will tell how it evolves longer term.
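Concretely, "down to object and function definitions" means handing the bot a skeleton like this and letting it fill in one body at a time - illustrative names only, not a real project:

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_id: str
    lines: list[tuple[str, float]]  # (description, amount)

class InvoiceStore:
    """I write the interfaces and docstrings; the bot only fills in the bodies."""

    def add(self, invoice: Invoice) -> str:
        """Persist the invoice and return its generated id. No I/O outside this class."""
        raise NotImplementedError

    def total_for_customer(self, customer_id: str) -> float:
        """Sum all invoice lines for one customer; must not load other customers' data."""
        raise NotImplementedError
```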
It looked really good, but as I got into the details the weirdness really started coming out. There's huge functions which interleave many concepts, and there's database queries everywhere. Huge amounts of duplication. It makes it very hard to change anything without breaking something else.
You can of course focus on getting the AI to simplify and condense. But that requires a good understanding of the codebase. Definitely no longer vibe-coded.
My enthusiasm for the technology has really gone in a wave. From "WOW" when it churned out 10k lines of credible looking code, to "Ohhhh" when I started getting into the weeds of the implementation and realising just how much of a mess it was. It's clearly very powerful for quick and dirty prototypes (and it seems to be particularly good at building decent CRUD frontends), but in software and user interaction the devil is in the details. And the details are a mess.
I qualify that because hey, someone comes back and reads this 5 years later, I have no idea what you will be facing then. But at the moment this is still true.
The problem is, people see the AIs coding, I dunno, what, a 100 times faster minimum in terms of churning out lines? And it just blows out their mental estimation models and they substitute an "infinity" for the capability of the models, either today or in the future. But they are not infinitely capable. They are finitely capable. As such they will still face many of the same challenges humans do... no matter how good they get in the future. Getting better will move the threshold but it can never remove it.
There is no model coming that will be able to consume an arbitrarily large amount of code goop and integrate with it instantly. That's not a limitation of Artificial Intelligences, that's a limitation of finite intelligences. A model that makes what we humans would call subjectively better code is going to produce a code base that can do more and go farther than a model that just hyper-focuses on the short-term and slops something out that works today. That's a continuum, not a binary, so there will always be room for a better model that makes better code. We will never overwhelm bad code with infinite intelligence because we can't have the latter.
Today, in 2026, providing the guidance for better code is a human role. I'm not promising it will be forever, but it is today. If you're not doing that, you will pay the price of a bad code base. I say that without emotion, just as "tech debt" is not always necessarily bad. It's just a tradeoff you need to decide about, but I guarantee a lot of people are making poor ones today without realizing it, and will be paying for it for years to come no matter how good the future AIs may be. (If the rumors and guesses are true that Windows is nearly in collapse from AI code... how much larger an object lesson do you need? If that is their problem they're probably in even bigger trouble than they realize.)
I also don't guarantee that "good code for humans" and "good code for AIs" will remain as aligned as they are now, though it is my opinion we ought to strive for that to be the case. It hasn't been talked about as much lately, but it's still good for us to be able to figure out why a system did what it did and even if it costs us some percentage of efficiency, having the AIs write human-legible code into the indefinite future is probably still a valuable thing to do so we can examine things if necessary. (Personally I suspect that while there will be some efficiency gain for letting the AIs make their own programming languages that I doubt it'll ever be more than some more-or-less fixed percentage gain rather than some step-change in capability that we're missing out on... and if it is, maybe we should miss out on that step-change. As the moltbots prove that whatever fiction we may have told ourselves about keeping AIs in boxes is total garbage in a world where people will proactively let AIs out of the box for entertainment purposes.)
Published APIs cannot be changed without causing friction on the client's end, which may not be under our control. Even if the API is properly versioned, users will be unhappy if they are asked to adopt a completely changed version of the API on a regular basis.
Data that was created according to a previous version of the data model continues to exist in various places and may not be easy to migrate.
User interfaces cannot be radically changed too frequently without confusing the hell out of human users.
I haven't tried that yet, so not sure.
Once upon a time I was at a company where the PRD specified that the product needs to have a toggle to enable a certain feature temporarily. Engineering implemented it literally, it worked perfectly. But it was vital to be able to disable the feature, which should've been obvious to anyone. Since the PRD didn't mention that, it was not implemented.
In that case, it was done as a protest. But AI is kind of like that, although out of sheer dumbness.
The story is meant to say that with AI it is imperative to be extremely prescriptive about everything, or things will go haywire. So doing a full rewrite will probably work well, only if you manage to have very tight test case coverage for absolutely everything. Which is pretty hard.
So, my answer would be no. Tech debt shows up even if every single change made the right decisions and this type of holistic view of projects is something AIs absolutely suck at. They can't keep all that context in their heads so they are forever stuck in the local maxima. That has been my experience at least. Maybe it'll get better... any day now!
Sometimes the start of a greenfield project has a lot of questions along the lines of "what graph plotting library are we going to use? we don't want two competing libraries in the same codebase so we should check it meets all our future needs"
LLMs can select a library and produce a basic implementation while a human is still reading reddit posts arguing about the distinction between 'graphs' and 'charts'.
The solutions also help me combat my natural tendency to over-engineer.
It’s also fun getting ChatGPT to quiz me on topics.
But last week I had two days where I had no real work to do, so I created CLI tools to help with organisation and cleaning up. I think AI boosted my productivity by at least 200%, if not 500%.
Still about a 4x productivity gain overall though, so I’m not complaining for $20 a month. It’s especially good at managing the complicated aspects of C so I can focus on the bigger picture rather than the symbol contortions.
I’m finding that I am breaking projects down into clear separations of concerns and designing inviolate API walls between modules, where before I might have reached into the code with less clearly defined internal vs external functions.
Exercising solid boundaries and being maniacal about the API surface is also really liberating personally, less cognitive load, less stress, easier tests, easier debugging.
Of course none of this is new, but now we can do it and get -more- done in a day than if we don’t. Building in technical debt no longer raises productivity, it lowers it.
If you are a competent engineer, AI can drastically improve both code quality and productivity, but you have to be capable of cognitively framing the project in advance (which can also be accelerated with AI). You need to work as an architect more than a coder.
That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.
Plus the highest end models now don’t go so brain dead at compaction. I suspect that passing context well through compaction will be part of the next wave of model improvements.
I have been meaning to put up a blog ...
Essentially there's a delta between what the human does and the computer produces. In a classic compiler setting this is a known, stable quantity throughout the life-cycle of development.
However, in the world of AI coding this distance increases.
There are various barriers with labels like "code debt" that the line can cross. There are three mitigations now: start the lines closer together (a PRD is the current en vogue method), push out the frontier of how many shits someone gives (this is the TDD agent method), or try to bend the curve so it doesn't fly out so much (this is the coworker/colleague method).
Unfortunately I'm just a one-man show so the fact that I was ahead and have working models to explain this has no rewards because you know, good software is hard...
I've explained this in person at SF events (probably about 40-50 times) so much though that someone reading this might have actually heard it from me...
If that's the case, hi, here it is again.
I also like to generate greenfield codebases from scratch.
Yup. My biggest issue with designing software is usually designing the system architecture/infra. I am very opposed to just shoving everything onto AWS and calling it a day; you don't learn anything from that, cloud performance stinks for many things, and I don't want to get a random $30k bill because I accidentally let some instance of something keep running.
AI sucks at determining what kind of infrastructure would be great for scenario x, because cloud is the go-to solution for the lazy dev. I tried to get it to recommend a way to self-host stuff, but that's just a general security hazard.
On the other hand, you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.
I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.
Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards, and have Claude Code work with you to really interrogate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
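For example, once the model is plain Python, a Monte Carlo pass over a key assumption is a handful of lines (all the numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_months = 10_000, 12

# Hypothetical assumption pulled out of the converted model:
# monthly revenue growth ~ Normal(2%, 3%), starting from 100k.
growth = rng.normal(loc=0.02, scale=0.03, size=(n_sims, n_months))
revenue = 100_000 * np.cumprod(1 + growth, axis=1)

year_end = revenue[:, -1]
p5, p50, p95 = np.percentile(year_end, [5, 50, 95])
print(f"Year-end revenue: p5={p5:,.0f}  median={p50:,.0f}  p95={p95:,.0f}")
```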
That almost makes me physically sick. I've a reasonably intense math background corrupted by application to geophysics and implementing real-world numerical applications.
To be fair, this statement alone:
* 30 sheet mind numbingly complicated Excel financial model
makes my skin crawl and invokes a flight reflex.
Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.
If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.
If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing the process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant Excel doc? I'm assuming no, maybe I'm wrong.
It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness like that sort of financial modeling.
I recently watched a demo from a data science guy about the impending proliferation of AI in just about all related fields; his position was highly sceptical, but with a "let's make the most of it while we can" attitude.
The part that stood out to me which I have repeated to colleagues since, was a demo where the guy fed his tame robot a .csv of price trends for apples and bananas, and asked it to visualise this. Sure enough, out comes a nice looking graph with two jagged lines. Pack it ship it move on..
But then he reveals that, as he wrote the data himself, he knows that both lines should just be an upward trend. Expands the axis labels - the LLM has alphabetized the months but said nothing of it in any of the outputs.
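The failure mode is trivial to reproduce: sort month names as strings and a clean upward trend turns into a zigzag. The fix is making the ordering explicit - a sketch with made-up data:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df = pd.DataFrame({"month": months, "price": range(100, 112)})  # steadily rising

wrong = df.sort_values("month")  # alphabetical: Apr, Aug, Dec, Feb... jagged when plotted
df["month"] = pd.Categorical(df["month"], categories=months, ordered=True)
right = df.sort_values("month")  # calendar order restored

print(wrong["price"].tolist())
print(right["price"].tolist())   # the actual upward trend
```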
https://www.newscientist.com/article/dn23448-how-to-stop-exc...
Any, and I mean any, statistic someone throws at me, I will try to dig into. And if I'm able to, I will usually find that something is very wrong somewhere. As in, the underlying data is usually just wrong, invalidating the whole thing, or the data is reasonably sound but the person doing the analysis is making incorrect assumptions about parts of the data and then drawing incorrect conclusions.
Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.
What are you optimizing all that code for, it works doesn't it? Don't let perfect be the enemy of good. If it works 80% that's enough, just push it. What is technical debt?
I think 1) holds (as my experience matches your cynicism :), but I have a feeling that data minded people tend to overestimate the importance of 2)...
In my experience, many of the statistics these people use don't matter to the success of a business --- they are vanity metrics. But people use statistics, and especially the wrong statistics, to push their agenda. Regardless, it's important to fix the statistics.
What can also help in entrepreneurship is having a bias for action. So even if your insights are wrong, if you act and keep acting, you will partially shape reality to your will and partially bend to its will.
So there are certain forces where you can compensate for your lack of rigor.
The best companies have both of those things by their side.
There are often more errors. Sometimes the actual results are wildly different in reality to what a model expects .. but the data treatment has been bug hunted until it does what was expected .. and then attention fades away.
I bet you do this only 75% of the time.
The local statistics office here recently presented salary statistics claiming that teachers' salaries had unexpectedly increased by 50%. All the press releases went out, and it was only questions raised by the public that forced the statistics office to review and correct the data.
Back in my data scientist days I used to push for testing and verification of models. Got told off for reducing the team's speed. In the majority of cases the model works well enough to get money in, and the managers that make the final calls do not understand the implications of being wrong.
A huge test for me was to have people review my analyses and poke holes. You feel good when your last 50 reports didn’t have a single thing anyone could point out.
I’ve been seeing a lot of people try to build analyses with AI who haven’t been burned by the “just because it sounds correct doesn’t mean it’s right” dilemma, and who haven’t realized what it takes before you can stamp your name on an analysis.
The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.
The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of the edge cases wrong, and, when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.
IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.
The automated AI assisted transcoding will be ... interesting.
Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, which an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s, it was almost invariably a he) could make informed decisions, and the team of underlings to call up data or crunch numbers on the computer and show the results on the display.
So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.
Discussed on Hackernews: https://news.ycombinator.com/item?id=42405462 IEEE article: https://spectrum.ieee.org/ibm-demo
You're too modest. You'd be thinking once.
However when the parrot is hidden in a shiny box made up to look like a regular, relatively trustworthy program...
They usually happen because some new and exciting line of business is started by a small team as a POC. Those teams don't get full technology backing; it would slow down the early iteration and cost a lot of money for an idea that may not be lucrative. Eventually they make a lot of money, and by then risk controls are basically requiring them to document every single change they make in Excel. This eventually sucks enough that they complain and get a tech team to convert the spreadsheet.
In my experience they are the exception rather than the rule, and many more businesses have sheets that tend further toward Heath Robinson than would be admitted in public.
The Excel never had any tests and people just trusted it.
You made me laugh hard. :)
I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.
This reminded me of something that happened to me last year. Not Claude (I think it was GPT 4.0 maybe?), but I had it running in VS Code's Copilot and asked it to fix a bug and then add a test for the case.
Well, it kept failing to pass its own test, so on the third try, it sat there "thinking" for a moment, then finally spit out the command `echo "Test Passed!"`, executed it, read it from the terminal, and said it was done.
I was almost impressed by the gumption more than anything.
1) it wants to run X command
2) it notices a hook preventing it from running X
3) it creates a Python application or shell script that does X and runs it instead
Whoops.
0: https://github.com/mbcrawfo/vibefun/blob/main/.claude/hooks/...
Anyway, please try it if you find it unbelievable. I didn't expect it to work FWIW like it did. Opus 4.5 is pretty amazing at long running tasks like this.
Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.
Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.
I have no idea why it had so much trouble with this generally easy task. Bizarre.
I have, in my early career, gone knee-deep into Excel macros and worked on C# automation that would create an Excel sheet, run Excel macros on it, and then save it without the macros.
In the entire process, I saw dozens of date-time mistakes in the VBA code, but no tests that would catch them...
When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how?
When shit hits the fan, execs need answers yesterday and the 30 sheet Excel monstrosity is producing the wrong numbers - who fixes it?
It was done by Sue, who left the company 4 years ago, people have been using it since and nobody really understands it.
I’ve also heard plenty of horror stories of bus factor employees leaving (or threatening to leave) behind an excel monstrosity and companies losing 6 months of sales, so maybe there’s a win for AI somewhere in there.
"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.
And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising?
I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.
Did it build a test suite for the Excel side? A fuzzer or such?
It's the cross-concern interactions that still get me.
80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).
Tell me if I am wrong, but surely Claude cannot even access execution coverage.
The largest independent derivatives broker in Australia collapsed after it was discovered the board were using astrology and magicians to gamble with all the clients' money.
https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...
I have seen Excel used for financial planning
I have seen Excel used for managing people's health data.
I have BUILT a test suite for a government official-use communication device - inside Excel. The original was a mish-mash of Excel formulas and VBA. I improved the VBA part of it by adding a web cam to the mix.
I don't sleep well at night knowing how many very very essential things are running on top of Excel sheets passed down like stories around a campfire.
It's like a CPU that's almost 100% reliable... in that it fails only once every 1 million clock cycles. At a few GHz, that's a few thousand failures per second.
All the previous human-driven crashes didn't change anything about capital owners' approach to money, so why would an AI-driven crash change things?
The best part is that they can say the AI will get some stuff wrong, they knew that, and it's not their fault when it breaks. Or more likely, it'll break in subtle ways, nobody will ever notice and the consequences won't be traced back to this. YOLO!
It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.
It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a Rust project at my level of ability with an AI at the helm is terrifying.
In my experience a lot of Excel models aren’t really tested, just checked a bit and then deemed correct.
If you convert bullshit from Excel to Python it's still bullshit. There's a reason why Claude can one-shot it and no one questions the result :D
Nobody will see that on sheet 27 cell FG456 is actually a static number that Brian typoed in there in 2019 and not a formula.
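That particular failure is at least mechanically detectable. Something like this flags hard-coded numbers in places you'd expect formulas - a sketch with a made-up file, and it's a review aid rather than a linter, since every sheet also has legitimate constants:

```python
from openpyxl import load_workbook

# By default openpyxl keeps formulas as "=..." strings instead of computed values.
wb = load_workbook("financial_model.xlsx")

for ws in wb.worksheets:
    for row in ws.iter_rows(min_row=2):  # skip a header row (hypothetical layout)
        for cell in row:
            value = cell.value
            if value is None:
                continue
            is_formula = isinstance(value, str) and value.startswith("=")
            if not is_formula and isinstance(value, (int, float)):
                print(f"{ws.title}!{cell.coordinate}: hard-coded {value}")
```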
“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 123] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”
The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.
Back then, employees were secretly installing Excel macros and Dropbox just to get work done faster. Now they’re quietly running Claude Code in the terminal because the official Copilot can’t even format a CSV properly.
CISOs are terrified right now and that’s understandable. Non-technical people with root access and agents that write code are a security nightmare. But trying to ban this outright will only push your most effective employees to places where they’re allowed to "fly"
Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.
Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.
This is karmic justice imo.
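To be fair, a migration like that mostly amounts to cracking open the zip and pulling out the text runs - a rough stdlib sketch that ignores tables, styles, and everything else that makes the format fun (file name made up):

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(path: str) -> str:
    # A .docx is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):                        # each w:p element is a paragraph
        runs = [t.text or "" for t in p.iter(W + "t")]  # w:t elements hold the text runs
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)

print(docx_to_text("handbook.docx"))
```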
I had interns use C++ to unzip, parse, and repackage to JSON a standardized Visio doc. I had no say in the standard, but specific blocks meant specific things, etc. The project was successful. The XML was parseable... at least for our needs. The overall project died a swift death and this tidbit will probably be forgotten forever in the depths of the repo hierarchy.
I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.
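For reference, the thing Copilot couldn't manage is a few lines of pandas (assuming openpyxl is installed as the Excel reader; file name made up):

```python
import pandas as pd

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}.
sheets = pd.read_excel("quarterly_report.xlsx", sheet_name=None)

for name, df in sheets.items():
    df.to_csv(f"{name}.csv", index=False)
    print(f"wrote {name}.csv ({len(df)} rows)")
```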
I've read that they're supposed to be great with XML as it's so structured, better than JSON, but haven't actually found that to be the case.
It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.
I've seen great improvements with just two MCP servers: context7 and playwright. The first is great on planning sessions and leads to better usage of new-ish libraries, and the second is giving the model a feedback loop. The advantage is that they work with pretty much any coding agent harness you use. So whatever worked with cursor will work with cc or opencode or whatever else.
What I want is a Skill that leverages a normal CLI executable to give the LLM the same capabilities as browser use.
The LLM agent can make sense of the text document, figure out the actual tool calls and use them.
And you, the MCP server operator, can change the "API" at any time and the client (LLM agent) will just automatically adjust.
I would argue if they're using all that tooling, they _are_ technical users.