Posted by bigwheels 1 day ago
Somewhere, there are GPUs/NPUs running hot. You send all the necessary data, including information that you would never otherwise share. And you most likely do not pay the actual costs. It might become cheaper or it might not, because reasoning is a sticking plaster on the accuracy problem. You and your business become dependent on this major gatekeeper. It may seem like a good trade-off today. However, the personal, professional, political and societal issues will become increasingly difficult to overlook.
Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!"
If it does not, this is going to be the first technology in the history of mankind that has not become cheaper.
(But anyway, it already costs half of what it did last year)
Check out whether clocks have gotten cheaper in general. The answer is that they have.
There is no economy of scale in repairing a single clock. It's not relevant to bring it up here.
Getting a bespoke flintstone axe is also pretty expensive, and also has absolutely no relevance to modern life.
These discussions must, if they are to be useful, center on population-level experience, not on unique personal moments.
You could not have bought Claude Opus 4.5 at any price one year ago, I'm quite certain. The things that were available then now cost half of what they did, and there are new things available. Both are true.
I'm agreeing with you, to be clear.
There are two pieces I expect to continue: inference for existing models will continue to get cheaper. Models will continue to get better.
Three things, actually.
The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0].
[0]: https://blog.irvingwb.com/blog/2018/09/a-critical-appraisal-...
This is harmless when it comes to tech opinions but causes real damage in politics and activism.
People get really attached to ideals and ideas, and keep sticking to those after they fail to work again and again.
This accounts for the fact that more tokens are used.
[1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
If current LLMs are ever deployed in systems harboring the big red button, they WILL most definitely somehow press that button.
If instead we believe in fantasies of a single all-knowing machine god that is 100% correct at all times, then... we really just have ourselves to blame. Might as well just have spammed that button by hand.
I’ve always said I’m a builder, even though I’ve also enjoyed programming (but for an outcome, never for the sake of the code).
This perfectly sums up what I’ve been observing between people like me (builders) who are ecstatic about this new world and programmers who talk about the craft of programming, sometimes butting heads.
One viewpoint isn’t necessarily more valid, just a difference of wiring.
"I got into programming because I like programming, not whatever this is..."
Yes, I'm building stupid things faster, but I didn't get into programming because I wanted to build tons of things. I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.
If I was intellectually excited about telling something to do this for me, I'd have gotten into management.
> with AI that can happen faster.
well, not exactly that.
So maybe our common ground is that we are direct problem solvers. :-)
And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?
The point is that in the human scenario, you can hold the human agents accountable. You cannot do that with AI. Of course, you as the orchestrator of agents will be accountable to someone, but you won't have the benefit of holding your "subordinates" accountable, which is what you do in a human team. IMO, this renders the whole situation vastly different (whether good or bad I'm not sure).
What I mean by that: you had compiled vs interpreted languages, you had typed vs untyped, testing strategies; all that, at least in some part, was a conversation about the tradeoffs between moving fast/shipping and maintainability.
But it isn't just tech, it is also in methodologies and the words we use, from "move fast and break things" and "yagni" to "design patterns" and "abstractions".
As you say, it is a different viewpoint... but my biggest concern with where we are as an industry is that these are not just "equally valid" viewpoints on how to build software... they are quite literally different stages of software that, AFAICT, pretty much all successful software has to go through.
Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)
This perspective is crucial. Scale is the great equalizer / demoralizer, scale of the org and scale of the systems. Systems become complex quickly, and verifiability of correctness and function becomes harder. For companies that built from day one with AI and have AI influencing them as they scale, where does complexity begin to run up against the limitations of AI and cause regression? Or, if all goes well, amplification?
This distinction to me separates the two primary camps
Managers and project managers are valuable roles and have important skill sets. But there's really very little connection with the role of software development that used to exist.
It's a bit odd to me to include both of these roles under a single label of "builders", as they have so little in common.
EDIT: this goes into more detail about how coding (and soon other kinds of knowledge work) is just a management task now: https://www.oneusefulthing.org/p/management-as-ai-superpower...
I deliberately avoid full vibe coding since I think doing so will rust my skills as a programmer. It also really doesn’t save much time in my experience. Once I have a design in mind, implementation is not the hard part.
I felt like this and still do, but man, at some point the management churn starts to feel real & I just feel like I'm suffering from a new problem.
Suppose I actually end up having services deployed literally from a single prompt and nothing else. Earlier I used to have AI write code, but I was still interested in the deployment and everything around it; now there are services which do that really neatly for you. (I also didn't really give in to the agent hype and mostly used LLMs in the browser.)
Like, on one hand you feel more free to build projects, but the whole joy of the project is completely reduced.
I mean, I guess I am one of the junior devs, so to me AI writing code on topics I didn't know, or prototyping, felt awesome.
I mean, I was still involved in, say, copy-pasting or looking at the code it generates, seeing the errors, and sometimes trying things out myself. If AI is doing all that too, idk.
For some reason, recently I have been uninterested in AI. I have used it quite a lot for prototyping, but this completely out-of-the-loop programming with recent services just feels very off to me.
I also feel like there is this sense that if I pay for some AI thing, I have to maximally extract "value" out of it.
I guess the issue could be that I can give vague terms or a very small text file as input (like "just do an X alternative in Y lang"), and then I am unable to understand the architectural decisions, and I get overwhelmed by it.
Probably gonna take either spec-driven development, where I clearly define the architecture, or something I saw Primeagen do recently, where the AI will only manipulate the code of one particular function (I am imagining it per file as well). Somehow I feel that's something I could enjoy more, because right now it feels like I don't know what I have built at times.
When I prototype single-file projects using, say, the browser for funsies or any idea, I get some sense of what the code uses, its dependencies and function names from start to end, even if I didn't look at the middle.
A bit of a ramble, I guess, but the thing that is making me feel this is that I was talking to somebody and showcasing them some service where AI + a server is there; they asked for something in a prompt and I wrote it. Then I let it do its job, but I was also thinking about how I would architect it (it was some "detect food and then find BMR" thing, and I was thinking first to use some API, but then I thought, meh, that might be hard; why not use AI vision models, okay, what's the best, Gemini seems good/cheap).
And I went to the coding thing to see what it did, and it actually went even beyond by using the free tier of Gemini (which I guess didn't end up working, could be some rate limit on my own key, but honestly it would've been the thing I would've tried too).
So like, I used to pride myself on the architectural decisions I make even if AI could write code faster but now that is taken away as well.
I really don't want to read that much AI code, so honestly at this point I might as well write code myself and learn hands-on, but I have a problem with the build-fast-in-public kind of attitude that I have & just not finding it fun.
I feel like I should do a more active job in my projects & I am really just figuring out what's the perfect way to use AI in such contexts & when to use how much.
Thoughts?
Like there have been multiple times now where I wanted the code to look a certain way, but it kept pulling back to the way it wanted to do things. Like if I had stated certain design goals recently it would adhere to them, but after a few iterations it would forget again and go back to its original approach, or mix the two, or whatever. Eventually it was easier just to quit fighting it and let it do things the way it wanted.
What I've seen is that after the initial dopamine rush of being able to do things that would have taken much longer manually, a few iterations of this kind of interaction has slowly led to a disillusionment of the whole project, as AI keeps pushing it in a direction I didn't want.
I think this is especially true if you're trying to experiment with new approaches to things. LLMs are, by definition, biased by what was in their training data. You can shock them out of it momentarily, which is awesome for a few rounds, but over time the gravitational pull of what's already in their latent space becomes inescapable. (I picture it as working like a giant Sierpinski triangle).
I want to say the end result is very akin to doom scrolling. Doom tabbing? It's like, yeah I could be more creative with just a tad more effort, but the AI is already running and the bar to seeing what the AI will do next is so low, so....
This would be fine if not for one thing: the meta-skill of learning to use the LLM depreciates too. Today's LLM is gonna go away someday, the way you have to use it will change. You will be on a forever treadmill, always learning the vagaries of using the new shiny model (and paying for the privilege!)
I'm not going to make myself dependent, let myself atrophy, run on a treadmill forever, for something I happen to rent and can't keep. If I wanted a cheap high that I didn't mind being dependent on, there's more fun ones out there.
Have to really look out for the crap.
Yea exactly. Like, we are just waiting for it to get completed, and after it gets completed, then what? We ask it to do new things again.
Just like when we are doomscrolling: we watch something for a minute, then scroll down and watch something new again.
The whole notion of progress feels completely fake with this. Somehow I guess I was in a time bubble where I always ended up using AI in web browsers (just as when ChatGPT 3 came out), and my workflow didn't change because it was free, but I recently changed it when some new free services dropped.
"Doom-tabbing" or complete out of the loop AI agentic programming just feels really weird to me sucking the joy & I wouldn't even consider myself a guy particular interested in writing code as I had been using AI to write code for a long time.
I think the problem for me was that I always considered myself a computer tinkerer before a coder. So when AI came for coding, my tinkering skills were given a boost (I could make projects of curiosity I couldn't earlier), but now, with AI agents in this autonomous-esque way, it has come for my tinkering & I do feel replaced, or just feel like my ability to tinker and my interests and my knowledge and my experience are not taken into account if an AI agent will write the whole code in a multi-file structure, run commands, and then deploy it straight to a website.
I mean, my point is tinkering was an active hobby, and now it's becoming a passive hobby. Doom-tinkering? I feel like I caught on to this feeling a bit early, just going on vibes, but is it just me who feels this, or?
What could be a name for what I feel?
This makes it sound like we're back in the days of FrontPage/Dreamweaver WYSIWYG. Goodness.
So I think this tracks with Karpathy's defense of IDEs still being necessary?
Has anyone found it practical to forgo IDEs almost entirely?
This stuff gets a whole lot more interesting when you let it start making changes and testing them by itself.
But what I like about this setup is that I have almost all the context I need to review the work in a single PR. And I can go back and revisit the PR if I ever run into issues down the line. Plus you can run sessions in parallel if needed, although I don't do that too much.
Also note that with Claude models, Copilot might allocate a different number of thinking tokens compared to Claude Code.
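For anyone wondering how that can differ: the thinking budget is set by the harness when it makes the API call, not by the model, so two tools can get noticeably different behavior out of the same Claude model. A minimal sketch using the Anthropic Python SDK (the model id and token numbers are illustrative assumptions, each harness picks its own):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The harness, not the model, chooses the extended-thinking budget per request.
    # The model id and token numbers below are illustrative, not any tool's real config.
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=16_000,
        thinking={"type": "enabled", "budget_tokens": 8_000},
        messages=[{"role": "user", "content": "Refactor this function to remove duplication."}],
    )

    # Print only the text blocks, skipping the thinking blocks.
    for block in response.content:
        if block.type == "text":
            print(block.text)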
Things may have changed now compared to when I tried it out, these tools are in constant flux. In general I've found that harnesses created by the model providers (OpenAI/Codex CLI, Anthropic/Claude Code, Google/Gemini CLI) tend to be better than generalist harnesses (cheaper too, since you're not paying a middleman).
It's not about the model. It's about the harness
Starcraft and Factorio are exactly what it is not. Starcraft has a loooot of micro involved at any level beyond mid level play, despite all the "pro macros and beats gold league with mass queens" meme videos. I guess it could be like Factorio if you're playing it by plugging together blueprint books from other people but I don't think that's how most people play.
At that level of abstraction, it's more like grand strategy if you're to compare it to any video game? You're controlling high level pushes and then the units "do stuff" and then you react to the results.
In Starcraft, you decide if an SCV should mine minerals or build a bunker or repair a depot.
Imagine if instead it were a UI over agents that you could assign to different tasks, build new pipelines.
I don't think stutter-stepping marines to min/max combat was the metaphor he was going for.
I've been working in the mobile space since 2009, though primarily as a designer and then product manager. I work in kinda a hybrid engineering/PM job now, and have never been a particularly strong programmer. I definitely wouldn't have thought I could make something with that polish, let alone in 3 months.
That code base is ~98% Claude code.
Not sure if it's an American pronunciation thing, but I had to stare at that long and hard to see the problem and even after seeing it couldn't think of how you could possibly spell the correct word otherwise.
Trying to incorporate it in existing codebases (esp when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.
That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy to implement what you want the application to do. But once it crosses a certain complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via LLM without a human engineer to guide and review.
It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.
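A minimal sketch of what that loop can look like when scripted, assuming Claude Code's non-interactive -p/--print mode (the paths and feature name below are placeholders, not a real layout, and file edits may still be gated by your permission settings):

    import subprocess

    # Placeholder paths and feature name; substitute whatever the users asked about.
    prompt = (
        "Read the code under src/legacy/ and explain how the export feature "
        "actually behaves, then update docs/export.md to match."
    )

    # `claude -p` runs a single non-interactive turn in the current repo and prints the result.
    subprocess.run(["claude", "-p", prompt], check=True)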
I never paid any attention to different models, because they all felt roughly equal to me. But Opus 4.5 is really and truly different. It's not a qualitative difference, it's more like it just finally hit that quantitative edge that allows me to lean much more heavily on it for routine work.
I highly suggest trying it out, alongside a well-built coding agent like Claude Code, Cursor, or OpenCode. I'm using it on a fairly complex monorepo and my impressions are much the same as Karpathy's.
Which is to say you have to learn to use the tools. I've only just started, and cannot claim to be an expert. I'll keep using them - in part because everyone is demanding I do - but to use them you clearly need to know how to do it yourself.
I also find pointing it to an existing folder full of code that conforms to certain standards can work really well.
If you're using plain vanilla chatgpt, you're woefully, woefully out of touch. Heck, even plain claude code is now outdated
After you tried it, come back.
I tried a website which offered the Opus model in their agentic workflow & I felt something different too I guess.
Currently trying out Kimi Code (using their recent Kimi 2.5), the first time I've bought any AI product, because I got it for like $1.49 per month. It does feel a bit less powerful than Claude Code, but I feel like monetarily it's worth it.
Y'know, you have to, like, bargain with an AI model to reduce its pricing, which I just felt really curious about. The psychology behind it feels fascinating, because I think even as a frugal person, I already felt invested enough in the model, and that became my sunk cost fallacy.
A shame for me personally, because they use it as a hook to get people using their tool and then charge $19 the next month (I mean, really, cheaper than Claude Code for the most part, but still, compared to $1.49).