Also, nice clever optimization here. Lots of low hanging fruit in harness land.
Are they portable bit-for-bit back to pi, or are there enough differences that they can't be? How about normal pi extensions, can they be used in omp?
Some of the stuff definitely looks interesting.
For them I think it would be optimal to provide a tag per function and trust the LLM to rewrite the whole function. As the article notes, full reproduction is generally more reliable than targeted edits for short code.
The token and attention overhead from a per-line hash, I suspect, limits this approach for smaller models.
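To make that overhead concrete, here is a rough sketch of per-line hash addressing; the hash length and the hash|line layout are my assumptions, not necessarily the article's exact scheme:

    import hashlib

    def with_line_hashes(source: str, hash_len: int = 6) -> str:
        # Prefix every line with a short content hash so edits can reference
        # lines by hash instead of by number or by echoing the text.
        tagged = []
        for line in source.splitlines():
            digest = hashlib.sha1(line.encode("utf-8")).hexdigest()[:hash_len]
            tagged.append(f"{digest}|{line}")
        return "\n".join(tagged)

    print(with_line_hashes("def add(a, b):\n    return a + b"))
    # Every line now carries roughly 7 extra characters of prefix, which is
    # the per-line token/attention cost being discussed.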
Back when I was maintaining a coding harness, around the time of Claude 3.5, we tried hash prefixes, we tried line-number prefixes, we tried a lot of different approaches to making the model better at selecting edit blocks, and ultimately, at least back then, fuzzy string matching won out.
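For context, the fuzzy-matching approach amounts to scanning the file for the line window that best matches the (possibly slightly wrong) text the model asked to replace. A minimal sketch, with the windowing and threshold as assumptions rather than any particular harness's actual code:

    import difflib

    def locate_block(file_text: str, search_text: str, min_ratio: float = 0.8):
        # Return the (start, end) line range in file_text that best
        # fuzzy-matches search_text, or None if nothing is close enough.
        file_lines = file_text.splitlines()
        search_lines = search_text.splitlines()
        window = len(search_lines)
        best_ratio, best_span = 0.0, None
        for start in range(len(file_lines) - window + 1):
            candidate = "\n".join(file_lines[start:start + window])
            ratio = difflib.SequenceMatcher(None, candidate, search_text).ratio()
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (start, start + window)
        return best_span if best_ratio >= min_ratio else None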
We got lines-with-anchors working fine as a replacement strategy; the problem was that when you don't make the model echo what it's replacing, it's literally dumber at writing the replacement. We lost more in test failures and retries than we gained in faster outputs.
Makes sense when you think about how powerful the "think before answering" principle is for LLMs, but it's still frustrating.
Over a year ago this had a lot of issues, and adding a description and an example was the difference between a 30-50% failure rate and 1%!
So I'm a bit surprised by that point. Maybe I'm missing it.
The problem is, replace has been around for so long that most LLMs are tuned for it now.
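Assuming "replace" here means the familiar search/replace edit-block format, where the model echoes the exact existing text and the text to put in its place, applying one such edit is roughly:

    def apply_search_replace(source: str, search: str, replace: str) -> str:
        # The model supplies the exact existing text (search) and the new
        # text (replace); the harness rejects missing or ambiguous matches.
        # Real harnesses layer fuzzy matching and retries on top of this.
        if source.count(search) != 1:
            raise ValueError("search text must appear exactly once")
        return source.replace(search, replace, 1)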
So the challenge is actually to find a map from "problem" to "author", then from "author" to "related code", and from there to a solution.
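One cheap way to sketch the "author" to "related code" leg of that map is git blame; a rough illustration only, since going from the problem report to the right author is the hard part and isn't shown:

    import subprocess
    from collections import Counter

    def authors_of(path: str, repo: str = ".") -> Counter:
        # Count how many current lines of `path` each author last touched,
        # as a crude proxy for who "owns" the related code.
        blame = subprocess.run(
            ["git", "-C", repo, "blame", "--line-porcelain", path],
            capture_output=True, text=True, check=True,
        ).stdout
        return Counter(
            line[len("author "):]
            for line in blame.splitlines()
            if line.startswith("author ")
        )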
* Subscriptions are oversubscribed. They know how much an “average” Claude Code user actually consumes to perform common tasks and price accordingly. This is how almost all subscription products work.
* There is some speculation that there is cooperative optimization between the harness and the backend (cache-related, etc.).
* Subscriptions are subsidized to build market share; to some extent the harnesses are “loss leader” halo products which drive the sales of tokens, which are much more profitable.
I don’t believe it’s unique or new for companies to revoke access if you are using an unpublished API that their apps use. I don’t see anything wrong with it myself. If you want, pay for normal token use on the published APIs. There is no expectation that you can use an application’s APIs that are not explicitly published for outside use, even if you are a paying user.
It's truly disgusting.
So then it's better to start obeying robots.txt as a ladder pull, gaining a "nicely behaved" image as an advantage.
The alternative is to say that bugs shouldn’t be fixed because it’s a ladder pull or something. But that’s crazy. What’s the point of complaining if not to get people to fix things?
It’s because they want to study you.
They want the data!
Underscores the importance of sovereign models you can run on the edge, finetune yourself, and run offline. At State of Utopia, we're working on it!
I keep asking myself “could my friends and family be handed these tools and be expected to build what I’m building with them?” and the answer is an immediate “absolutely not”. Could a non-technical manager use these tools to build what I’m building? Absolutely not. And when I think about it, it’s for the exact same reason it’s always been: they just aren’t a developer. They just don’t “think” in the way required to effectively control a computer.
LLMs are just another way to talk to a machine. They aren’t magic. All the same fundamental principles that apply to properly telling a machine what to do still apply. It’s just a wildly different mechanism.
That all being said, I think these things will dramatically speed up the pace at which software eats the world. Put LLMs into a good harness and holy shit, it’s like a superpower… but to get those superpowers unlocked you still have to know the basics, same as before. I think this applies to all other trades too. If you are a designer you still have to know what good design is and how to articulate it. Data scientists still need to understand the basics of their trade… these tools just give them superpowers.
Whether or not this assertion remains true in two or three years is yet to be seen, but look at the most popular tools. Claude Code is a command-line tool! Their GUI version is pretty terrible in comparison. Cursor is an IDE fork of VS Code.
These are highly technical tools requiring somebody that knows file systems, command lines, and basic development concepts like compilers. They require you to know a lot of stuff most people simply don’t. The direction I think these tools will head is far closer to highly sophisticated dev tooling than general-purpose “magic box” stuff that your parents can use to… I dunno… vibe code the next hit todo app.
It’s disheartening that programmers are using this advanced, cutting-edge technology with such a backwards, old-fashioned approach.[1]
Code generation isn’t a higher level abstraction. It’s the same level but with automation.
See [1]. I’m open to LLMs or humans+LLMs creating new abstractions. Real abstractions that hide implementation details and don’t “leak”. Why isn’t this happening?
Truly “vibe coding” might also get the same job done. In the sense that you only have to look at the generated code for the same reasons a C++ programmer looks at the assembly: not to check whether it is even correct, but because there are concerns beyond correctness, like codegen size. (Do you care about compiler output size? Sometimes. So sometimes you have to look.)
I will still opt for a scriptable shell. A few scripts, and I have a custom interface that can be easily composed. And it could run on a $100 used laptop from eBay.