Posted by bigwheels 1/26/2026

A few random notes from Claude coding quite a bit last few weeks (twitter.com)
https://xcancel.com/karpathy/status/2015883857489522876
911 points | 847 comments
doe88 1/28/2026|
Are there good guides about how to write agents, or good repos with examples? Also, are there big differences between how you would write one for Codex CLI vs Claude Code? Can they be run interchangeably?
noisy_boy 1/28/2026||
> I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media.

2026 is just when it picks up - it'll get exponentially worse.

I think 2026 is the year of Business Analysts who were unable to code. Now CC et al. are good enough that they can realize the vision, as long as one knows the requirements exactly (software design not that important). Programmers who didn't know business could get by so far. Not anymore, because with these tools, the guy who knows business can now code fairly well.

sponaugle 1/28/2026||
"I think 2026 is the year of Business Analysts who were unable to code." This is interesting - I have seen far more BAs losing jobs as a result of the 'work' they did being replaced by tools (both AI and AI-generated). I logically see the connection from AI tools giving BAs far more direct ability to produce something, but I don't see it actually happening. It is possible it is too early in the AI curve for the quality of a BA built product to be sufficient. CC and Opus45 are relatively new.

It could also be BAs being lazy and not jumping ahead of the train that is coming towards them. It feels like in this race the engineer who is willing to learn business will still have an advantage over the business person who learns tech. At least for a little while.

HugoDz 1/28/2026|||
Agree here. The code barrier (creating software) was hiding the real mountain: creating a software business. The two are very different beasts.
kitd 1/28/2026||
> with these tools, the guy who knows business can now code fairly well.

... until CC doesn't get it quite right and the guy who knows business doesn't know code.

rubzah 1/28/2026||
The future of the programmer profession: This AI-generated mess of a codebase does 80% of what I want. Now fix the last 20%, should be easy, right?
AnimalMuppet 1/28/2026||
Apart from the "AI-generated mess" part, that's too often been the past of the programmer profession, too.
jopsen 1/26/2026||
> - How much of society is bottlenecked by digital knowledge work?

Any qualified guesses?

I'm not convinced more traders on Wall Street will allocate capital more effectively, leading to economic growth.

Will more programmers grow the economy? Or should we get real jobs ;)

iwontberude 1/27/2026|
Most of this country's challenges are strictly political. The pittance of work software can contribute is most likely negligible or destructive (e.g. software buttons in cars, or Palantir). In other words, we've picked all the low-hanging fruit and all that's left is to hang ourselves.
js8 1/27/2026|||
I actually disagree. Having software (AI) that can cut through the technological stuff faster will make people more aware of political problems.
rschick 1/26/2026||
Great point about expansion vs speedup. I now have time to build custom tools, implement more features, try out different API designs, get 100% test coverage... I can deliver more quickly, but can also deliver more overall.
twa927 1/27/2026||
I don't see the AI capacity jump in the recent months at all. For me it's more the opposite, CC works worse than a few months ago. Keeps forgetting the rules from CLAUDE.md, hallucinates function calls, generates tons of over-verbose plans, generates overengineered code. Where I find it a clear net-positive is pure frontend code (HTML + Tailwind), it's spaghetti but since it's just visualization, it's OK.
ValentineC 1/27/2026||
> Where I find it a clear net-positive is pure frontend code (HTML + Tailwind), it's spaghetti but since it's just visualization, it's OK.

This makes it sound like we're back in the days of FrontPage/Dreamweaver WYSIWYG. Goodness.

twa927 1/27/2026||
Hmm, your comment gave me the idea that maybe we should invent "What You Describe Is What You Get", to replace HTML+Tailwind spaghetti with prompts generating it.
culi 1/28/2026|||
Sad to hear this attitude towards front-end code. Front-ends are so often already miswritten and full of accessibility pitfalls and I feel like LLMs are gonna dramatically magnify this problem :(
DominikPeters 1/28/2026||
Are you using Opus 4.5? Sounds more like Sonnet.
twa927 1/28/2026||
You're right, I'm using Sonnet 4.5. Thanks for the tip, I'll try Opus 4.5, although costs might become an issue.
TuxSH 1/28/2026||
> although costs might become an issue.

If you have a ChatGPT subscription, try Codex with GPT-5.2-High or 5.2-codex High? In my experience, while being much slower, it produces far better results than Opus and seems even more aggressively subsidized (more generous rate limits).

jermberj 1/28/2026||
> The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic.

Does this not undercut everything going on here? Like, what?

awsanswers 1/28/2026|
It's predictable, so you run defense around it with prompting, validation and model tuning. It generates volumes of working code in seconds from natural language prompts, so it's extremely business-efficient. We're talking about tools that generate correct code for 95% of a solution; the follow-up human and automated test review, and the second coding pass to fix the remaining 5%, are a non-issue.
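
A minimal sketch of that defense loop, assuming pytest as the test runner; generate_code() here is a placeholder for whatever model API you actually call:

  import subprocess

  def generate_code(prompt: str) -> str:
      # stand-in for the real model API call (SDK or HTTP request)
      return "# model output would go here\n"

  def tests_pass() -> bool:
      # run the project's test suite; a non-zero exit code means failure
      return subprocess.run(["pytest", "-q"]).returncode == 0

  def generate_with_defense(prompt: str, max_passes: int = 3) -> bool:
      # first pass generates; later passes feed the failure back for a fix
      for _ in range(max_passes):
          with open("generated.py", "w") as f:
              f.write(generate_code(prompt))
          if tests_pass():
              return True
          prompt += "\nThe tests failed; fix the generated code."
      return False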
epolanski 1/27/2026||
> What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.

No doubt that good engineers will know when and how to leverage the tool, both for coding and for improving processes (design-to-code, requirements collection, task tracking, basic code review, etc.), improving their own productivity and that of those around them.

Motivated individuals will also leverage these tools to learn more and faster.

And yes, of course it's not the only tool one should use, and of course there's still value in talking with proper human experts to learn from, but 90% of the time you're looking for info, the LLM will dig it up by reading the source code of e.g. Postgres and its tests, rather than you asking on chats/Stack Overflow.

This is a transformative technology that will make great engineers even stronger, but it will weed out those who were merely valued for their very basic capability of churning something out but never cared about either engineering or coding, which is 90% of our industry.

TheGRS 1/27/2026||
I do feel a big mood shift after late November. I switched to using Cursor and Gemini primarily, and it was a big change in my ability to get my ideas into code effectively. The Cursor interface, for one, got to a place that I really like and enjoy using, but it's probably more that the results from the agents themselves are less frustrating. I can deal with the output more now.

I'm still a little iffy on the agent swarm idea. I think I will need to see it in action in an interface that works for me. To me it feels like we are anthropomorphizing agents too much, and that results in this idea that we can put agents into roles and then combine them into useful teams. I can't help seeing all agents as the same automatons, and I have trouble understanding why giving an agent different guidelines to follow, and then having it work alongside another agent, would give me better results than just fixing the context in the first place. Either that or just working more on the code pipeline to spot issues early on - all the stuff we already test for.

daxfohl 1/28/2026||
Now that it's real, is there a minimum bar of non-AI-generated code that should be required in any production product? Like if 100% of the code is AI-generated (or even doom-tabbed) and something goes wrong in prod (crash, record corruption, data leak, whatever), then what? 99%? 50%? What's the bar where the risk starts outweighing the reward? When do we look around and say "maybe we should start slowing down before we do something that destroys our company"?

Granted it's not a one-size-fits-all problem, but I'm curious if any teams have started setting up additional concrete safeguards or processes to mitigate that specific threat. It feels like a ticking time bomb.

It almost begs the question, what even is the reward? A degradation of your engineering team's engineering fundamentals, in return for...are we actually shipping faster?

cagenut 1/28/2026|
obviously you're not a devops eng; I think you're wildly underestimating how much business-critical code pre-ai is completely orphaned anyway.

the people who wrote it were contractors long gone, or employees who have moved companies/departments/roles, or were on projects that long since wrapped up, or got laid off, or they simply barely understood it in the first place and certainly don't remember now what they were thinking back then.

basically "what moron wrote this insane mess... oh me" is the default state of production code anyway. there's really no quality bar already.

daxfohl 1/28/2026||
I am a devops engineer and understand your point. But there's a huge difference: legacy code doesn't change. Yeah occasionally something weird will happen and you've got to dig into it, but it's pretty rare, and usually something like an expired certificate, not a logic bug.

What we're entering, if this comes to fruition, is a whole new era where massive amounts of code changes that engineers are vaguely familiar with are going to be deployed at a much faster pace than anything we've ever seen before. That's a whole different ballgame than the management of a few legacy services.

cagenut 1/28/2026||
after a decade of follow-the-sun deployments by php contractors from vietnam to costa rica where our only qa was keeping an eye on the 500s graph, ai can't scare me.
daxfohl 1/28/2026||
That's actually a good comparison. Though even then, I imagine you at least have the ability to get on the phone and ask what they just did. Whereas the LLM would just be like, "IDK, that was my twin brother. I'd ask him directly, but unfortunately he has been garbage collected. It was very sad. Would you like a cookie?"

I wonder if there's any value in some system that preserves the chat context of a coding agent and tags the commits with a reference to it, until the feature has been sufficiently battle tested. That way you can bring them back from the dead and interrogate them for insight if something goes wrong. Probably no more useful than just having a fresh agent look at the diff in most cases, but I can certainly imagine scenarios where it's like "Oh, duh, I meant to do X but looks like I accidentally did Y instead! Here's a fix." way faster than figuring it out from scratch. Especially if that whole process can be automated and fast, worst case you just waste a few tokens.
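
As a rough sketch (the .agent-transcripts/ layout and the Agent-Transcript trailer name are made up here), it could be as simple as:

  import hashlib, os, subprocess

  def commit_with_transcript(message: str, transcript: str) -> None:
      # archive the agent's chat log under a content-addressed name
      os.makedirs(".agent-transcripts", exist_ok=True)
      digest = hashlib.sha256(transcript.encode()).hexdigest()[:12]
      path = f".agent-transcripts/{digest}.txt"
      with open(path, "w") as f:
          f.write(transcript)
      subprocess.run(["git", "add", path], check=True)
      # a git trailer is just a trailing "Key: value" line in the message
      subprocess.run(["git", "commit", "-m",
                      f"{message}\n\nAgent-Transcript: {path}"], check=True)

Then "bringing the agent back from the dead" is just re-feeding that file as context.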

I'm genuinely curious though if there's anything you learned from those experiences that could be applied to agent driven dev processes too.

cagenut 1/29/2026||
it was basically a mindless loop, very prime for being agent driven:

  - observe error rate uptick
  - maybe dig in with apm tooling
  - read actual error messages
  - compare what apm and logs said to last commit/deploy
  - if they look even tangentially related deploy the previous commit (aka revert)
  - if it's still not fixed do a "debug push", basically stuff a bunch of print statements (or you can do better) around the problem to get more info
I won't say that solves every case but definitely 90% of them.
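
sketched as an agent-drivable watchdog, it's something like this (every helper here is a stand-in for whatever apm/deploy tooling you actually have):

  import time

  # stand-ins for real apm / deploy tooling
  def error_rate() -> float:
      return 0.0  # e.g. 5xx responses / total requests, from the apm
  def revert_last_deploy() -> None:
      print("redeploying previous commit")
  def push_debug_build() -> None:
      print("pushing build with extra print statements")

  def watchdog(threshold: float = 0.01, poll_secs: int = 60) -> None:
      # the mindless loop: on an error-rate uptick, revert first, debug second
      while True:
          if error_rate() > threshold:
              revert_last_deploy()          # cheapest guess: blame the last deploy
              time.sleep(poll_secs)
              if error_rate() > threshold:  # the revert didn't help
                  push_debug_build()        # gather more info around the problem
          time.sleep(poll_secs)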

I think your point about preserving some amount of intent/context is good, but also, what are most of us doing with agents if not "loop on error message until it goes away"?

lofaszvanitt 1/28/2026|
The whole thing is about getting rid of experts and letting the entry-level idiots do all the work. The coders become expendable. And people do not see the chasm staring back at them :D. LLMs in their current form redistribute "intelligence" and expertise to the average joes for mere pennies. It should be much, much more expensive, or it will disrupt the whole ecosystem. If it becomes even more intelligent it must be bludgeoned to death, a.k.a. regulated like hell, otherwise the ensuing disruption will kill the job market and, in the long term, human values.

As an added plus: those who already have wealth will benefit the most, instead of the masses, since the distribution and dissemination of new projects is at the same level as before, meaning you would still need a lot of money. So no matter how clever you are with an LLM, if you don't have the means to distribute it you will be left in the dirt.
