I'm not willing to outsource the "understanding how things work" part of myself. That part of myself is what got me into computing in the first place.
If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.
It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.
If the point of the software is to benefit people, should I still care about how the code looks?
Right now, I still think that the answer is yes, but in 3 years? in 10 years?
The answer is yes you should, as long as you want to keep software benefiting people.
The idea of setting up an agentic loop to review code and propose and implement refactorings still seems pretty awful to me, though, yeah. Maybe cut that off at the first green-bar revision, and then apply some actual taste and judgement.
> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.
It takes a little human help in the first iterations but after a while it will start to iterate and improve unsupervised.
“There are already impressive examples of large automatic porting efforts, including the reported work around moving parts of Bun from Zig to Rust.” (Emphasis added.)
It will be impressive if/when the Bun team is able to pick up and continue extending and supporting Bun. For us, MS-DOC remains read-only and probably perpetually buggy until we reimplement with a better understanding. Until then, it’s definitely not “impressive”. Functional? Maybe. Impressive, no.
Using layers like the loops described here to abdicate your work decouples you from the joint market/engineering value you originally provided.
A lot of tasks aren't amenable to that, and the ones that are still need a lot of care to be set up correctly. The default vibe-coded codebase won't be.
I've come to think of the activity of choosing the right technology, the right architecture, the right testing setup, the right context, and the right /goals to use as programming the agent.
https://code.claude.com/docs/en/goal#how-evaluation-works
> /goal is a wrapper around a session-scoped prompt-based Stop hook. Each time Claude finishes a turn, the condition and the conversation so far are sent to your configured small fast model, which defaults to Haiku. The model returns a yes-or-no decision and a short reason. A “no” tells Claude to keep working and includes the reason as guidance for the next turn. A “yes” clears the goal and records an achieved entry in the transcript.
> The evaluator runs on whichever provider your session is configured for. It does not call tools, so it can only judge what Claude has already surfaced in the conversation.
Apparently, it uses Haiku (by default) to evaluate every turn to determine if the goal has been achieved. However, it only relies on the transcript itself (including the reasoning of the main model). It can't independently verify if the goal has been achieved. So, if the main model thinks the goal is or isn't done, how often does Haiku disagree (in a productive way)? That's not clear to me.
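To make the concern concrete, here is a minimal Python sketch of that kind of transcript-only check. It is not Claude Code's implementation: `run_until_goal`, `make_fake_worker`, and `keyword_evaluator` are hypothetical names, and the keyword match is only a stand-in for the small model's yes-or-no call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    done: bool
    reason: str


def run_until_goal(
    condition: str,
    work_turn: Callable[[str, str], str],
    evaluate: Callable[[str, List[str]], Verdict],
    max_turns: int = 10,
) -> List[str]:
    """Alternate worker turns and evaluator checks until the evaluator says yes."""
    transcript: List[str] = []
    guidance = ""
    for _ in range(max_turns):
        transcript.append(work_turn(condition, guidance))
        verdict = evaluate(condition, transcript)  # sees text only, runs no tools
        if verdict.done:
            transcript.append(f"[goal achieved] {verdict.reason}")
            return transcript
        guidance = verdict.reason  # a "no" feeds its reason into the next turn
    transcript.append("[stopped] turn limit reached")
    return transcript


def make_fake_worker() -> Callable[[str, str], str]:
    """Hypothetical worker that claims success on its third turn."""
    turns = {"n": 0}

    def work_turn(condition: str, guidance: str) -> str:
        turns["n"] += 1
        if turns["n"] < 3:
            return f"turn {turns['n']}: still working"
        return f"turn {turns['n']}: all tests pass"

    return work_turn


def keyword_evaluator(condition: str, transcript: List[str]) -> Verdict:
    """Hypothetical stand-in for the small model: trusts what the transcript says."""
    if "all tests pass" in transcript[-1]:
        return Verdict(True, "transcript reports passing tests")
    return Verdict(False, "no passing test run shown yet")


result = run_until_goal("all tests pass", make_fake_worker(), keyword_evaluator)
```

In this sketch the evaluator says yes on turn 3 purely because the worker wrote that the tests pass; nothing was executed to confirm it.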
I don't know how well it works in Claude Code, but I wouldn't be worried about Haiku getting it wrong and don't see a problem with it relying on the transcript. I always set these things up to maintain a checklist of subtasks in a file and check them off, and to always implement with red/green testing methodology: it writes and commits failing tests, then writes the feature or fixes the bug, and commits with passing tests and an updated checklist file.
So the model should always know from the transcript whether the current task is done, by whether it shows the tests passing, and it should always know whether there are more tasks left, because the checklist file is updated before the commit.
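A rough Python sketch of that decision rule, assuming the checklist uses Markdown-style `- [ ]` / `- [x]` checkboxes (my assumption; the file format isn't specified above). `remaining_tasks` and `next_action` are hypothetical helper names.

```python
import re
from typing import List

# Assumed checklist line format: "- [ ] task" (open) or "- [x] task" (done).
CHECKBOX = re.compile(r"^\s*[-*]\s*\[( |x|X)\]\s*(.+)$")


def remaining_tasks(checklist_text: str) -> List[str]:
    """Return the descriptions of all unchecked items, in file order."""
    remaining = []
    for line in checklist_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            remaining.append(match.group(2).strip())
    return remaining


def next_action(tests_passed: bool, checklist_text: str) -> str:
    """Red/green rule: failing tests block everything; then work the checklist."""
    if not tests_passed:
        return "fix failing tests"
    todo = remaining_tasks(checklist_text)
    if todo:
        return f"start next task: {todo[0]}"
    return "done"


checklist = """\
- [x] parse config
- [ ] add retry logic
- [ ] write docs
"""

print(next_action(True, checklist))   # start next task: add retry logic
print(next_action(False, checklist))  # fix failing tests
```

Both inputs (the test result and the checklist) are things the agent writes down itself, so this only works if the test run is actually shown in the transcript.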
The game is to find ways to automate that. Not fully, but enough to reduce what's required from humans. Seems like you're questioning the entire premise rather than pondering how far it can be taken and how.
So when I use an agent to write code, it's in languages I'm less familiar with, and often using libraries I know nothing about.
All to say, my part of the process often ends up being:
1. "Here's what I'm looking for, in detail"
2. "That's not right. Here's one way it's not right, and a specific example. Please fix that."
3. Sometimes I give suggestions about what might be going wrong, or conceptually how to work around the issue.
4. And iterate on 2-3 until the result is close enough.
That's a loop I'd love to automate.
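A minimal Python sketch of what automating steps 2 through 4 could look like, assuming the "specific example" can be written down as input/expected-output pairs. `first_failure`, `iterate`, and `propose` are hypothetical names, and `propose` is a canned stand-in for the agent, not a real API.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[int, int]  # (input, expected output)
Candidate = Callable[[int], int]


def first_failure(candidate: Candidate, examples: List[Example]) -> Optional[str]:
    """Step 2: find one concrete way the candidate is wrong, or None if it passes."""
    for arg, expected in examples:
        got = candidate(arg)
        if got != expected:
            return f"f({arg}) returned {got}, expected {expected}"
    return None


def iterate(
    propose: Callable[[List[str]], Candidate],
    examples: List[Example],
    max_rounds: int = 5,
) -> Tuple[Candidate, List[str]]:
    """Step 4: repeat propose/critique until the examples pass or rounds run out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        problem = first_failure(candidate, examples)
        if problem is None:
            return candidate, feedback
        feedback.append(problem)  # the specific counterexample goes back to the agent
    raise RuntimeError("no acceptable candidate within max_rounds")


# Stand-in for the agent: a fixed list of attempts, one per round of feedback.
attempts: List[Candidate] = [lambda x: x + x, lambda x: x * x]


def propose(feedback: List[str]) -> Candidate:
    return attempts[min(len(feedback), len(attempts) - 1)]


square, notes = iterate(propose, [(2, 4), (3, 9)])
print(notes)  # ['f(3) returned 6, expected 9']
```

Step 3 (suggesting why it went wrong) is the part this sketch leaves out, and it is probably the hardest to automate.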
I am so over this. I cannot take seriously anyone who claims their ideas are inevitable and that you must adopt them or "be left behind". If these tools are so good and so capable, the results should be able to speak for themselves, without this FOMO-inducing, emotional language.
> and in recent weeks it has started to dominate the Twitter discourse.
As a general rule, I don't waste my time with the advice of people who still think Twitter is a source of wisdom.
EDIT: there's a version of this that could be positive--everything could get a hell of a lot more secure. But that doesn't seem to be what's happening.
It is true that the author is incorrect: you can certainly opt out, but you won't be opting out of AI, you'll be opting out of the industry.
That said, the idea of a loop has always been there (iteration, V-cycle, etc.), but I'd be glad to find people with more theory and fewer agents swinging blindly, so to speak.