Posted by kachapopopow 20 hours ago
The VC economics are creating a reality distortion field where Anthropic is incentivized to burn more tokens so they can rent more GPUs so they can get more investment, and where I am incentivized to pipe the LLM inputs into `claude -p` and blast 50KB of useless proompt onto it so they don't ban me from their 95% discounted API endpoint.
read_toc tool:

    ...
    {
        "name": "mcp",
        "qualified_name": "mcp",
        "type": "constant",
        "docstring": null,
        "content_point": "src\\mcps\\code_help\\server.py::17::18::python::mcp",
        "is_nested": false
    },
    {
        "name": "handler",
        "qualified_name": "handler",
        "type": "constant",
        "docstring": null,
        "content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
        "is_nested": false
    },
    ...

update_content tool:

    {
        "content": "...",
        "content_point": "src\\mcps\\code_help\\server.py::18::19::python::handler",
        "project_root": ...
    }

I just wonder how unique these hashes will be if only 2 characters. It seems like the collision rate would be really high.
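Back-of-the-envelope check (my own sketch, not from the tool's docs): with a 2-character hash over, say, a 26-symbol alphabet there are only 26 × 26 = 676 buckets, and the birthday bound says collisions become likely long before you have 676 items. Assuming uniform random assignment:

```python
def collision_prob(n_items: int, n_buckets: int) -> float:
    """Probability that at least two of n_items land in the same bucket
    under uniform random hashing (the birthday bound)."""
    # P(no collision) = prod_{i=0}^{n-1} (1 - i / n_buckets)
    p_unique = 1.0
    for i in range(n_items):
        p_unique *= 1 - i / n_buckets
    return 1 - p_unique

# Two characters from a 26-letter alphabet -> 676 buckets.
buckets = 26 * 26
for n in (10, 30, 100):
    print(n, round(collision_prob(n, buckets), 3))
```

Roughly: at ~30 hashed items the collision probability is already near 50%, and by 100 items a collision is almost certain, so 2 characters only works if the namespace per file is small.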
one mechanism we establish is that each model has a fidelity window, i.e., r tokens of content per s tag tokens; each tag token adds extra GUID-like marker capacity via its embedding vector. since 1-, 2-, and 3-digit numbers are each only one token in top models, a single hash token lacks enough capacity & separation in latent space
we also show the hash should be properly prefix-free, with unique symbols per digit position, e.g., if hashing with A-K for the first digit and L-Z for the second, then "AR" is a legal hash whereas "MC" is not
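A toy sketch of the scheme that comment describes (the A-K/L-Z split is just the example's alphabets, not anything from the paper): giving each digit position its own disjoint alphabet makes every hash self-delimiting, so no valid hash can be a prefix of another valid hash.

```python
# Position 0 draws from A-K, position 1 from L-Z (disjoint alphabets),
# so fixed-length codes over these alphabets are trivially prefix-free.
ALPHABETS = ("ABCDEFGHIJK", "LMNOPQRSTUVWXYZ")

def is_valid_hash(h: str) -> bool:
    """Check that each character comes from its position's alphabet."""
    return len(h) == len(ALPHABETS) and all(
        ch in ALPHABETS[i] for i, ch in enumerate(h)
    )

print(is_valid_hash("AR"))  # each character matches its position's alphabet
print(is_valid_hash("MC"))  # 'M' is not allowed in position 0
```

The side benefit is error detection for free: a model that emits a character from the wrong alphabet produces a string that is syntactically invalid, not a silent collision with another hash.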
we can do all this & more rather precisely, as we show in our arXiv paper on the same; the next update goes deeper into group theory, info theory, etc. on boosting model recall, reasoning, tool calls, etc. by way of robust hashing
There may be many lines that are duplicates, e.g. "{"
> Why bother, you ask? Opus may be a great model, but Claude Code to this day leaks raw JSONL from sub-agent outputs, wasting hundreds of thousands of tokens. I get to say, “fuck it, subagents output structured data now”.
This is why I find the ban on using Claude subscriptions in other harnesses so heinous. The harness they're forcing onto everyone has tons of big issues, including wasting massive numbers of tokens. Very much in line with intentionally refusing to adhere to standards in the most IE6 way possible.
I feel like I want to write my own, and that in the future a lot of developers will have highly customized harnesses, since each user of these models wants to use them in a way that's unique to their brain. Much like how emacs is great for its customization, but one person's emacs config is often not what another wants; they'd rather take a subset and then write their own features.
As an aside, what's the feeling on all the various AI coding tools? Does aider suck, are aider-ce/cecli better, or are the bespoke tools for each model, like Claude Code and such, better?