That's one very short step removed from Simon Willison's lethal trifecta.
This is what's so annoying about it. It's like a child that makes the same mistakes again and again.
But couldn't it adjust itself to reduce its errors bit by bit? Wouldn't this lead to the ultimate agent, one that can read your mind? That would be awesome.
A mind-reading ultimate agent sounds more like a deity, and there are more than enough fables warning us not to create gods, because things tend to go badly. Pumping out ASI too quickly would cause massive destabilization and horrific war. Not sure who against, really. Could be us humans against the ASI, could be the rich humans with ASI against the rest of us. Any way you cut it, it would represent a massive change in the world order.
Your improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.
Anyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet & the RLHF labelers' preferences and the reward model.
Putting some words in AGENTS.md hardly seems like the most annoying thing.
tip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver.
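For anyone who wants to try this, here's a rough sketch of what such a /fix command prompt could look like, assuming a Claude Code-style custom command file. The path and the $ARGUMENTS/$1 placeholder syntax depend on your agent, and the wording is just a starting point, not anyone's exact prompt:

```markdown
<!-- e.g. .claude/commands/fix.md — path and placeholder syntax assumed; adjust for your agent -->
Fix the following issue: $ARGUMENTS

After the fix is verified:
1. Identify the underlying mistake or missed convention that caused it.
2. Append a short, general rule to AGENTS.md that would have prevented it,
   written as an instruction for future sessions (not a description of this one bug).
3. Keep the rule to one or two lines and don't duplicate rules already in AGENTS.md.
```

Run it like `/fix the date parser drops timezones` and, over time, AGENTS.md accumulates the rules you'd otherwise keep repeating by hand.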
I also love using it to research upcoming features. Research + pick a solution + implement. It happens so fast.
I wonder how much all this costs on a monthly basis?
You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?