Posted by kachapopopow 12 hours ago
Over a year ago I had a lot of issues, and adding a description and example made the difference between a 30-50% failure rate and 1%!
So I'm a bit surprised by the point. Maybe I'm missing it.
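For the curious, here's a minimal sketch of the kind of thing I mean, assuming the harness exposes tools via an OpenAI-style function-calling schema (the tool name and fields here are hypothetical). The description text, with its inline example, is what moved the failure rate:

```python
# Hypothetical edit tool declared in the OpenAI function-calling format.
# The long "description" with a worked example is the part that cut the
# failure rate; with a bare one-liner the model misused the tool far more.
edit_tool = {
    "type": "function",
    "function": {
        "name": "str_replace",
        "description": (
            "Replace an exact, unique snippet of a file with new text. "
            "old_str must match the file contents exactly, including "
            "whitespace, and must occur exactly once.\n"
            "Example: to bump a constant, pass\n"
            "  old_str: 'retries = 3'\n"
            "  new_str: 'retries = 5'"
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File to edit."},
                "old_str": {"type": "string", "description": "Exact text to replace."},
                "new_str": {"type": "string", "description": "Replacement text."},
            },
            "required": ["path", "old_str", "new_str"],
        },
    },
}
```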
If the smaller labs (Zai, Moonshot, DeepSeek, Mistral...) got together as a consortium and embraced a shared harness, opencode for example, then just by the power of "evolution across different environments" they might hit the jackpot earlier than the bigger labs.
Someone has to do the baseline training, development, and innovation. It can't be clones all the way down.
I see a lot of evidence to the contrary, though. Does anyone know what the underlying issue here is?
Like a good programming language, a good harness offers a better affordance for getting stuff done.
Even if we put correctness aside, tooling that saves time and tokens is going to be very valuable.
It's completely understandable that prompting in a better/more efficient way would produce different results.
I'd love to use a different harness-- ideally an OSS one-- and hook it up to whichever LLM provides the best bang for the buck rather than being tied to Claude.
Edit: Checking ohmypi, the model has access to str_replace too, so this is just an edit tool.
If you run this out, you realize that the Worse is Better paradox has inverted: it's an arbitrage, and the race is on.
With search-replace you can work on separate parts of a file independently with the LLM. Not to mention that with line-number edits, each edit shifts all the lines below it, so you then need to provide the LLM with the whole updated content again.
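A rough sketch of what a content-anchored edit looks like (the names and the example file are mine, not any particular harness's API):

```python
def str_replace(path: str, old_str: str, new_str: str) -> None:
    """Content-anchored edit: find old_str exactly once, swap it for new_str."""
    with open(path) as f:
        text = f.read()
    count = text.count(old_str)
    if count == 0:
        raise ValueError("old_str not found; re-read the file and retry")
    if count > 1:
        raise ValueError(f"old_str matches {count} places; add surrounding context")
    with open(path, "w") as f:
        f.write(text.replace(old_str, new_str, 1))

# Two edits to different parts of the same file compose without either one
# needing to know about the other. With line-number edits, the first edit
# would shift the lines the second one was computed against.
str_replace("app.py", "retries = 3", "retries = 5")    # hypothetical file/content
str_replace("app.py", "timeout = 10", "timeout = 30")
```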
Have you tested followup edits on the same files?
You probably don't want to use the line number, though, unless you need to disambiguate.
But your write tool implementation can take care of that.
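One way such an implementation could handle it (a sketch under my own assumptions, not any specific harness's behavior): treat the line number as an optional hint, consulted only when the target string matches more than once:

```python
def str_replace(path: str, old_str: str, new_str: str,
                near_line: int | None = None) -> None:
    """Replace old_str; use near_line only to disambiguate multiple matches."""
    with open(path) as f:
        text = f.read()

    # Collect the character offset of every occurrence of old_str.
    starts, i = [], text.find(old_str)
    while i != -1:
        starts.append(i)
        i = text.find(old_str, i + 1)

    if not starts:
        raise ValueError("old_str not found")
    if len(starts) == 1:
        chosen = starts[0]           # common case: purely content-anchored
    elif near_line is None:
        raise ValueError(f"{len(starts)} matches; pass near_line to pick one")
    else:
        # Pick the occurrence whose line number is closest to the hint, so a
        # slightly stale hint (lines having shifted) still resolves correctly.
        chosen = min(starts,
                     key=lambda s: abs(text.count("\n", 0, s) + 1 - near_line))

    with open(path, "w") as f:
        f.write(text[:chosen] + new_str + text[chosen + len(old_str):])
```

The nice property is that the line number never has to be exact, so follow-up edits on the same file keep working even after earlier edits shift things around.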