Caveman: Why use many token when few token do trick

Posted by tosh 2 days ago

Caveman: Why use many token when few token do trick (github.com)

874 points | 359 comments | page 6

HarHarVeryFunny 2 days ago|

More like Pidgin English than caveman, perhaps, although caveman does make for a better name.

RomanPushkin 2 days ago||

Why should the skill have three nearly identical SKILL.md files? Just curious.

ArekDymalski 2 days ago||

While really useful now, I'm afraid that in the long run it might accelerate the language atrophy that is already happening. I still remember that people used to enter full questions in Google and write SMS with capital letters, commas and periods.

vova_hn2 2 days ago||

> I still remember that people used to enter full questions in Google

I think that, in the early days of internet search, entering full questions actually produced worse results than just a bunch of keywords or short phrases.

So it was a sign of a "noob", rather than a mark of sophistication and literacy.

jagged-chisel 2 days ago||

“Sophistication and literacy” are orthogonal to the peculiarities of a black box search engine.

Those literate sophisticates would still be noobs at getting something useful from Google.

dahart 1 day ago||

My kids made fun of me yesterday when they saw me using a question mark in a search query.

inzlab 18 hours ago||

A little here, a little there; it's tokens in the end.

arrty88 2 days ago||

Feels like there should be a way to compile skills, READMEs, and even code files into concise maps and descriptions optimized for LLMs. They would only recompile if timestamps are modified.
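
A minimal sketch of the timestamp check described above, assuming a make-style rule: the compiled summary is rebuilt only when it is missing or older than its source. The names `needs_recompile`, `compile_if_stale`, and the `summarize` callable are hypothetical; in practice `summarize` might be an LLM call, but here it is just any function from text to text.

```python
from pathlib import Path
from typing import Callable


def needs_recompile(source: Path, compiled: Path) -> bool:
    """Return True when the compiled summary is missing or older than its source."""
    if not compiled.exists():
        return True
    return source.stat().st_mtime > compiled.stat().st_mtime


def compile_if_stale(source: Path, compiled: Path,
                     summarize: Callable[[str], str]) -> bool:
    """Rebuild the summary only when stale. Returns True if a rebuild happened.

    `summarize` is a hypothetical stand-in for whatever produces the concise map.
    """
    if not needs_recompile(source, compiled):
        return False
    compiled.write_text(summarize(source.read_text()))
    return True
```

Modification times are a cheap heuristic; a content hash would be more robust when files are copied or checked out without preserving timestamps.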

cadamsdotcom 2 days ago||

Caveman need invent chalk and chart make argument backed by more than good feel.

K0IN 1 day ago||

So you are telling me I prompted llms the right way all along

amelius 2 days ago||

By the way why don't these LLM interfaces come with a pause button?

amelius 2 days ago||

And a "prune here" button.

It often happens that the interesting information is in the first paragraph or so, and the remainder is all just the LLM not knowing when to stop. This is super annoying as a conversation then ends up being 90% noise.

postalcoder 2 days ago||

Pruning an assistant's response like that would break prompt caching.

Prompt caching is probably the single most important thing that people building harnesses think about, and yet its mind share among end users is virtually zero. If you had to think of all the weirdest, most seemingly baffling design decisions in an AI product, the answer to "why" is probably "to not break prompt caching".

zozbot234 1 day ago|||

Grug says prompt caching just store KV-cache which is sequenced by token. Easy cut it back to just before edit. Then regenerate after is just like prefill but tiny.
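
The idea in the comment above can be sketched as a prefix comparison, assuming the cache is keyed by token position: everything before the first changed token can be kept, and only the tokens after it need to be prefilled again. The function names here are hypothetical and no real inference engine's API is implied.

```python
from typing import List, Tuple


def common_prefix_len(cached_tokens: List[int], new_tokens: List[int]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


def plan_reuse(cached_tokens: List[int],
               new_tokens: List[int]) -> Tuple[int, List[int]]:
    """Return (cache entries to keep, tokens that must be prefilled again)."""
    keep = common_prefix_len(cached_tokens, new_tokens)
    return keep, new_tokens[keep:]
```

For example, pruning a six-token response down to its first three tokens and appending one new token gives `plan_reuse([1, 2, 3, 4, 5, 6], [1, 2, 3, 9])`, which keeps three cache entries and leaves only `[9]` to recompute. Whether a hosted API exposes this kind of truncation is a separate question.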

amelius 1 day ago|||

Maybe so, but pruning is still a useful feature.

If it hurts performance that much, maybe pruning could just hide the text leaving the cache intact?

stainablesteel 2 days ago||

i imagine they're doing superman level distributed compute across multiple clouds somewhere and cared more about delivering the final result of that than having the ability to pause. which is probably possible, but would require way more work than would be worthwhile. they probably thought the ability to stop and resubmit would be an adequate substitute.

amelius 2 days ago||

These models are autoregressive so I doubt they are running them across multiple clouds. And besides, a pause button is useful from a user's pov.

stainablesteel 2 days ago||

i'm not sure it is, what's so useful about it?

amelius 2 days ago||

Like I said in another comment:

yakattak 1 day ago||

I was wondering just yesterday if a model of “why waste time say lot word when few word do trick” would be easier on the tokens. I’ll have to give this a try lol

DonHopkins 2 days ago|

Deep digging cave man code reviews are Tha Shiznit:

https://www.youtube.com/watch?v=KYqovHffGE8
