Posted by bigwheels 1/26/2026
I can supervise maybe three agents in parallel before a task that needs significant hand-holding means I'm likely blocking one of them.
And the time an agent spends 'restlessly working' on something is usually inversely correlated with its likelihood of success. If it's going down a rabbit hole, the right move is usually to intervene and reorient it.
I know this is SF, but to me working with those LLMs feels more and more like that, and the atrophy part is real. Not that the model is literally using our brains as compute, but the relationship can become lopsided.
Adding/prompting features one by one, reviewing the code, and then testing the resulting binary feels like the new programming workflow.
Prompt/Review/Test - PRET.
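Roughly, the loop looks like this. The callables here are hypothetical placeholders, just to show the shape of it; plug in whatever agent CLI/API, review step, and test runner you actually use:

    # Sketch of the Prompt/Review/Test (PRET) loop described above.
    # All of the callables are placeholders for your own tooling.
    from typing import Callable

    def pret_loop(
        features: list[str],
        prompt_agent: Callable[[str], str],  # Prompt: ask the agent for one feature, get a diff back
        review: Callable[[str], bool],       # Review: human reads the diff, approves or rejects
        tests_pass: Callable[[], bool],      # Test: run the suite / the resulting binary
    ) -> None:
        for feature in features:
            accepted = False
            while not accepted:
                diff = prompt_agent(feature)
                if not review(diff):
                    # Rabbit hole: intervene, reorient, and re-prompt rather
                    # than letting the agent keep "restlessly working".
                    continue
                accepted = tests_pass()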
I'm hopeful that 2026 will be the year that the biggest adopters are forced to deal with the mass of product they've created that they don't fully understand, and a push for better tooling is the result.
Today's agentic tools are crude from a UX POV compared to where I hope they will end up.
OP mentions that they are actually doing the “babysitting”
use many simultaneously, and bounce between them to unblock them as needed
build good tools and tests. you will soon learn which things you keep doing manually -- script them all
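For example, a tiny gate script like the one below turns most "unblocking" into pasting one summary back to the agent. The specific commands are placeholders; swap in whatever formatter, linter, and test runner you actually keep running by hand:

    #!/usr/bin/env python3
    # check.py -- run the checks you keep repeating manually and print one
    # compact summary you can paste straight back to an agent.
    import subprocess
    import sys

    CHECKS = [
        ("format", ["ruff", "format", "--check", "."]),
        ("lint",   ["ruff", "check", "."]),
        ("tests",  ["pytest", "-q"]),
    ]

    def main() -> int:
        failed = False
        for name, cmd in CHECKS:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            ok = proc.returncode == 0
            print(f"[{'ok' if ok else 'FAIL'}] {name}")
            if not ok:
                failed = True
                # The last few lines are usually enough context for the agent.
                print("\n".join((proc.stdout + proc.stderr).splitlines()[-15:]))
        return 1 if failed else 0

    if __name__ == "__main__":
        sys.exit(main())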
If this is how all juniors are learning nowadays, seniors are going to shoot up in value in the next decade.
GPT-5.2 is not as good for coding, but much better at thinking and finding bugs, inconsistencies and edge cases.
The only decent way I've found to use AI agents is by doing multiple passes between Claude and GPT: asking GPT to review every step of every plan and every single code change from Claude, and manually reviewing and tweaking the questions and responses both ways, until all parties, including myself, agree. I also sometimes bring other models like Qwen and K2 into the mix, for a different perspective.
And gosh, by doing so you immediately realize how dumb, unreliable and dangerous code generated by Claude alone is.
It's a slow and expensive process, and at the end of the day it doesn't save me time at all. But, perhaps counterintuitively, it gives me more confidence in the end result. The code ends up with tons of tests and coverage for edge cases that I might not have thought of myself.
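A minimal sketch of one round of that handoff, assuming the standard anthropic and openai Python SDKs. The model ids are placeholders, and the manual step (reading both sides, tweaking the prompts, repeating) is deliberately left outside the code:

    # One round of the cross-review loop: Claude proposes a change, GPT
    # critiques it, a human decides what to feed back. Model ids are placeholders.
    import anthropic
    from openai import OpenAI

    claude = anthropic.Anthropic()
    gpt = OpenAI()

    def propose_change(task: str) -> str:
        msg = claude.messages.create(
            model="claude-latest",  # placeholder model id
            max_tokens=4096,
            messages=[{"role": "user",
                       "content": f"Implement this, as a unified diff:\n{task}"}],
        )
        return msg.content[0].text

    def review_change(task: str, diff: str) -> str:
        resp = gpt.chat.completions.create(
            model="gpt-latest",  # placeholder model id
            messages=[{
                "role": "user",
                "content": f"Task:\n{task}\n\nProposed diff:\n{diff}\n\n"
                           "List bugs, inconsistencies, and missed edge cases.",
            }],
        )
        return resp.choices[0].message.content

    # One round; in practice you read both sides, tweak, and repeat until
    # you (and both models) are satisfied.
    task = "Add retry with backoff to the HTTP client"  # example task
    diff = propose_change(task)
    print(review_change(task, diff))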