It's a shame that AI coding tools have become such a polarizing issue among developers. I understand the reasons, but I wish there had been a smoother path to this future. The early LLMs like GPT-3 could sort of code enough for it to look like there was a lot of potential, and so there was a lot of hype to drum up investment and a lot of promises made that weren't really viable with the tech as it was then. This created a large number of AI skeptics (of whom I was one, for a while) and a whole bunch of cynicism and suspicion and resistance amongst a large swathe of developers. But could it have been different? It seems a lot of transformative new tech is fated to evolve this way. Early aircraft were extremely unreliable and dangerous and not yet worthy of the promises being made about them, but eventually with enough evolution and lessons learned we got the Douglas DC-3, and then in the end the 747.
If you're a developer who still doesn't believe that AI tools are useful, I would recommend you go read Mitchell's post, and give Claude Code a trial run like he did. Try and forget about the annoying hype and the vibe-coding influencers and the noise and just treat it like any new tool you might put through its paces. There are many important conversations about AI to be had, it has plenty of downsides, but a proper discussion begins with close engagement with the tools.
Our tooling just went through a complete refresh in less than three years, and it leaves heads spinning. People are confused, fighting for or against it, torn even between 2025 and 2026. I know I was.
People are still searching for a way to describe it, from 'agentic coding' to 'vibe coding' to 'modern AI-assisted stack'.
We don't call architects 'vibe architects' even though they copy-paste four fifths of your next house and work from a library of standard designs!
We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
When was the last time you reviewed the machine code produced by a compiler? ...
The real issue this industry is facing is the phenomenal speed of change. But what are we really doing? That's right: programming.
- Pull requests
- Merge requests
- Code review
I feel like I’m taking crazy pills. Are SWEs supposed to move away from code review, one of the core activities of the profession? Code review is as fundamental to software engineering as double-entry bookkeeping is to accounting. Yes, we know that functional code can be generated at incredible speeds. Yes, we know that apps and whatnot can be bootstrapped from nothing by “agentic coding”.
We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?
The workflow automation and better (and model-directed) context management are all obvious in retrospect but a lot of people (like myself) were instead focused on IDE integration and such vs `grep` and the like. Maybe multi-agent with task boards is the next thing, but it feels like that might also start to outrun the ability to sensibly design and test new features for non-greenfield/non-port projects. Who knows yet.
I think it's still very valuable for someone to dig into the underlying models periodically (insofar as the APIs even expose the same level of raw stuff anymore) to get a feel for what's reliable to one-shot vs what's easily correctable by a "ran the tests, saw it was wrong, fixed it" loop. If you don't have a good sense of that, it's easy to get overambitious and end up with something you don't like, at least if you're the sort of person who cares at all about what the code looks like.
But that speed makes a pretty significant difference in experience.
If you wait a couple minutes and then give the model a bunch of feedback about what you want done differently, and then have to wait again, it gets annoying fast.
If the feedback loop is much tighter things feel much more engaging. Cursor is also good at this (investigate and plan using slower/pricier models, implement using fast+cheap ones).
This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and it'll do this successfully, but the scope is so small there's not much point in using an LLM. At the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
A lot of successful LLM adoption for code is about finding this sweet spot. Overly specific instructions don't make you any more productive, and with overly broad instructions you end up redoing too much of the work.
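To make that sweet spot concrete: for me a right-sized chunk looks like a contract the agent can fill in and I can verify at a glance, a signature plus a test. (The names below, including `normalize_orders`, are made up purely for illustration.)

```python
# Hypothetical example of a "sweet spot" task: small enough to review in
# seconds, big enough that delegating it actually saves typing and thinking.
def normalize_orders(orders: list[dict]) -> dict[str, int]:
    """Collapse order lines into {sku: total_qty}; this is the part the agent writes."""
    totals: dict[str, int] = {}
    for line in orders:
        totals[line["sku"]] = totals.get(line["sku"], 0) + line["qty"]
    return totals


def test_normalize_orders_merges_duplicates():
    orders = [
        {"sku": "A", "qty": 2},
        {"sku": "A", "qty": 3},
        {"sku": "B", "qty": 1},
    ]
    assert normalize_orders(orders) == {"A": 5, "B": 1}
```

Anything smaller than that I'd just type myself; anything much bigger and I'm back to redoing the agent's assumptions.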
It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk, from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human working with the coding agent is to keep full editorial control of the main trunk and the main sub-modules and delegate as many of the smaller branches as possible.
Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.
I agree. This is how I see it too. It's more like a shortcut to an end result that's very similar to (or much better than) what I would've reached by typing it myself.
The other day I realised that I'm using my experience to steer it away from bad decisions a lot more than I'd noticed. It feels like it does all the real work, but I have to remember that my/our (decades of) experience writing code is playing a part as well.
I'm genuinely confused when people come in at this point and say that it's impossible to do this and produce good output and end results.
Or maybe something quite different, where these early-era agentic tooling strategies become either unneeded or even actively detrimental.
I think anyone who has worked on a serious software project would say that this means it would be polling you constantly.
Even if we posit that an LLM is equivalent to a human, humans constantly clarify requirements/architecture. IMO on both of those fronts the correct path often reveals itself over time, rather than being knowable from the start.
So in this scenario it seems like you'd be dealing with constant pings and need to really make sure your understanding of the project is growing with the LLM's development efforts as well.
To me this seems like the best case for the current technology: the models have been getting better and better at doing what you tell them in small chunks, but you still need to be deciding what they should be doing. These chunks don't feel as though they're getting bigger unless you're willing to accept slop.
What this misses, of course, is that you can just have the agent do this too. Agents are great at making project plans, especially if you give them a template to follow.
Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me up front for a detailed description of the app I was attempting to build.
Taking it piece by piece in Claude Code has been significantly more successful.
Maybe there’s something about not having to context-switch between natural language and code that just makes it _feel_ easier sometimes
The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am to get output that isn't 30% wrong.
But not so good at making (robust) new features out of the blue
The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.
Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.
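To make the "harness" part concrete, here's a minimal sketch of that loop, assuming a git repo whose test suite runs via `pytest`; `apply_agent_diff` is a placeholder for however you actually drive the agent, not a real API:

```python
# Minimal sketch of the "narrow diff, fast verification" loop described above.
# Assumes a git repo whose tests run via `pytest`; apply_agent_diff is a
# placeholder for whatever mechanism asks the agent for a small, focused diff.
import subprocess


def tests_pass() -> bool:
    return subprocess.run(["pytest", "-q"]).returncode == 0


def apply_agent_diff(task: str) -> None:
    """Placeholder: request a small diff for `task` and apply it to the working tree."""
    raise NotImplementedError


def run_task(task: str) -> bool:
    apply_agent_diff(task)
    if tests_pass():
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", f"agent: {task}"], check=True)
        return True
    # Drift gets rejected wholesale: throw the diff away instead of salvaging it.
    subprocess.run(["git", "checkout", "--", "."], check=True)
    subprocess.run(["git", "clean", "-fd"], check=True)
    return False
```

The human review still happens on each committed diff; the harness just makes sure nothing that fails verification lands silently.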
Yeah, I would get patterns where initial prototypes were promising, then we developed something that was 90% of the way to the design goals, and then as we tried to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.
So I would get to 90% and basically start a new project with that as the baseline to add to.
These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.
If the human race were not hackable then society would not exist, we'd be the unchanging crocodiles of the last few hundred million years.
Have you ever found yourself speaking a meme? Had a catchy tune repeating in your head? Started spouting nation-state-level propaganda? Found yourself in a crowd trying to burn a witch at the stake?
Hacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular human's thoughts is harder unless you have a lot of information on them.
With where the models are right now, you still need a human in the loop to make sure you end up with code that you (and your organisation) actually understand. The bottleneck has gone from writing code to reading code.
This has always been the bottleneck. Reviewing code is much harder and gets worse results than writing it, which is why reviewing AI code is not very efficient. The time required to understand code far outstrips the time to type it.
Most devs don’t do thorough reviews. Check that the variable names seem OK, make sure there are no obvious typos, ask for a comment, and call it good. For a trusted teammate this is actually fine, and it’s why they’re so valuable! For an AI, it’s a slot machine, and trusting it is equivalent to letting your coworkers/users do your job so you can personally move faster.
"Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!"
I do run multiple models at once now. On different parts of the code base.
I keep the less boring tasks for myself and outsource all of the slam dunks, then review. I often use another model to validate the previous model's work while doing so myself.
I still `git reset` quite often, but I'm finding more and more ways to avoid getting to that point as I get to know the tools better.
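For the "multiple models on different parts of the code base" part, one pattern that fits (a sketch, not a prescription; the paths and branch names are made up) is giving each agent its own `git worktree`, so a bad run means deleting a worktree rather than a `git reset` in your main checkout:

```python
# Sketch: one git worktree per agent/task, so parallel runs can't stomp on each
# other and discarding a bad run never touches the main checkout.
# Paths and branch names below are illustrative only.
import subprocess


def new_agent_workspace(task_slug: str) -> str:
    path = f"../agents/{task_slug}"
    subprocess.run(
        ["git", "worktree", "add", "-b", f"agent/{task_slug}", path],
        check=True,
    )
    return path  # point the agent/model at this directory


def discard_workspace(task_slug: str) -> None:
    subprocess.run(
        ["git", "worktree", "remove", "--force", f"../agents/{task_slug}"],
        check=True,
    )
    subprocess.run(["git", "branch", "-D", f"agent/{task_slug}"], check=True)
```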
Autocompleting our brains! What a crazy time.