
Posted by bigwheels 1/26/2026

A few random notes from Claude coding quite a bit last few weeks (twitter.com)
https://xcancel.com/karpathy/status/2015883857489522876
911 points | 847 comments | page 6
siliconc0w 1/27/2026|
Not sure how he is measuring; I'm still closer to about a 60% success rate. It's more like 20% of tasks are acceptable one-shot, rising to 60% acceptable with some iteration, while the other 40% either need manual intervention to succeed or such significant iteration that doing it manually is likely faster.

I can supervise maybe three agents in parallel; once a task requires significant hand-holding, I'm likely blocking another agent.

And the time an agent spends 'restlessly working' on something is usually inversely correlated with its likelihood of success. Usually if it's going down a rabbit hole, the correct thing to do is to intervene and reorient it.

jliptzin 1/28/2026||
The tenacity part is definitely true. I told it to keep trying when it kept getting stuck trying to spin up an Amazon Fargate service. I could feel its pain, and wanted to help, but I wanted to see whether the LLM could free itself from the thorny and treacherous AWS documentation forest. After a few dozen attempts and probably 50 kWh of energy it finally got it working; I was impressed. I could have done it faster myself, but the tradeoff would have been much higher blood pressure. Instead I relaxed and watched YouTube while the LLM did its work.
daxfohl 1/27/2026||
I'm curious to see what effect this change has on leadership. For the last two years it's been "put everything you can into AI coding, or else!" with quotas and firings and whatever else. Now that AI is at the stage where it can actually output whole features with minimal handholding, is there going to be a Frankenstein moment where leadership realizes they now have a product whose codebase is running away from their engineering team's ability to support it? Does it change the calculus of what it means to be underinvested vs overinvested in AI, and what are the implications?
poszlem 1/28/2026||
I keep thinking about the TechnoCore from Dan Simmons' Hyperion, where the AIs appeared to serve humans but the relationship was actually parasitic: they were secretly using human brains as distributed processing nodes, essentially harvesting humanity's neural activity for their own computational needs without anyone's knowledge.

I know this is SF, but to me working with those LLMs feels more and more like that, and the atrophy part is real. Not that the model is literally using our brains as compute, but the relationship can become lopsided.

longhaul 1/28/2026||
Am working on an iPhone app and impressed with how well Claude is able to generate decent, working code from prompts in plain English. I don’t have previous experience building apps or writing Swift, but I have a C++ background. Working in smaller chunks and incrementally adding features, rather than writing one large prompt for the whole app, seems more practical: it's easier to review and builds confidence.

Adding/prompting features one by one, reviewing code and then testing the resulting binary feels like the new programming workflow

Prompt/Review/Test - PRET.

axus 1/28/2026||
Finally, literate programming!

https://en.wikipedia.org/wiki/Literate_programming

cmrdporcupine 1/28/2026||
Right on, especially on two things: 1) the tools doing a disservice by not interviewing and seeking input, and 2) the 2026 "Slopocalypse".

I'm hopeful that 2026 will be the year the biggest adopters are forced to deal with the mass of product they've created that they don't fully understand, and that a push for better tooling is the result.

Today's agentic tools are crude from a UX point of view compared to where I'm hoping they will end up.

all2well 1/27/2026||
What particular setups are getting folks these sorts of results? If there’s a way I could avoid all the babysitting I have to do with AI tools, that would be welcome.
geraneum 1/27/2026||
> If there’s a way I could avoid all the babysitting I have to do with AI tools that would be welcome

OP mentions that they are actually doing the “babysitting”

spongebobstoes 1/27/2026||
I use the Codex CLI. Work on giving it useful skills, work on the other instruction files, and take the Karpathy tips around testing and declarativeness.

Use many simultaneously, and bounce between them to unblock them as needed.

Build good tools and tests. You will soon learn all the things you did manually -- script them all.
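
A minimal sketch of what "script them all" can look like in practice, assuming a Python project that happens to use pytest and ruff (the command list is a placeholder for whatever you currently do by hand); it bundles the manual checks into one command that you or an agent can run:

    # check.py -- hypothetical "run all the manual checks" script.
    # Assumes pytest and ruff are installed; swap in your own build,
    # test, and lint commands.
    import subprocess
    import sys

    CHECKS = [
        ("lint", ["ruff", "check", "."]),
        ("tests", ["pytest", "-q"]),
    ]

    def main() -> int:
        failed = []
        for name, cmd in CHECKS:
            print(f"==> {name}: {' '.join(cmd)}")
            if subprocess.run(cmd).returncode != 0:
                failed.append(name)
        if failed:
            print("FAILED:", ", ".join(failed))
            return 1
        print("All checks passed.")
        return 0

    if __name__ == "__main__":
        sys.exit(main())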

Jean-Papoulos 1/29/2026||
>Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

If this is how all juniors are learning nowadays, seniors are going to shoot up in value in the next decade.

jedisct1 1/28/2026|
Claude is good at writing code, not so good at reasoning, and I would never trust or deploy to production something solely written by Claude.

GPT-5.2 is not as good for coding, but much better at thinking and finding bugs, inconsistencies and edge cases.

The only decent way I found to use AI agents is by doing multiple steps between Claude and GPT, asking GPT to review every step of every plan and every single code change from Claude, and manually reviewing and tweaking questions and responses both ways, until all the parties, including myself, agree. I also sometimes introduce other models like Qwen and K2 into the mix, for a different perspective.

And gosh, by doing so you immediately realize how dumb, unreliable and dangerous code generated by Claude alone is.

It's a slow and expensive process and at the end of the day, it doesn't save me time at all. But, perhaps counterintuitively, it gives me more confidence in the end result. The code is guaranteed to have tons of tests and assurance for edge cases that I may not have thought about.
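
A rough sketch of that cross-review loop, using the official anthropic and openai Python SDKs; the model names, prompts, and single review round are illustrative assumptions, not the commenter's actual pipeline:

    # cross_review.py -- hypothetical Claude-writes / GPT-reviews loop.
    # Requires ANTHROPIC_API_KEY and OPENAI_API_KEY in the environment.
    import anthropic
    from openai import OpenAI

    claude = anthropic.Anthropic()
    gpt = OpenAI()

    TASK = "Write a Python function that parses RFC 3339 timestamps."  # placeholder task

    # Step 1: Claude drafts the code.
    draft = claude.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2000,
        messages=[{"role": "user", "content": TASK}],
    ).content[0].text

    # Step 2: GPT reviews the draft for bugs, inconsistencies, and edge cases.
    review = gpt.chat.completions.create(
        model="gpt-5.2",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Review this code for bugs, inconsistencies and edge cases:\n\n" + draft,
        }],
    ).choices[0].message.content

    # Step 3: Claude revises against the review; a human still reads both
    # the review and the revision before anything ships.
    revised = claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[
            {"role": "user", "content": TASK},
            {"role": "assistant", "content": draft},
            {"role": "user", "content": "A reviewer raised these points; address them:\n\n" + review},
        ],
    ).content[0].text

    print(revised)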

More comments...