Posted by tosh 2 days ago
I think that, in the early days of internet search, entering full questions actually produced worse results than just a bunch of keywords or short phrases.
So it was a sign of a "noob", rather than a mark of sophistication and literacy.
Those literate sophisticates would still be noobs at getting something useful from Google.
It often happens that the interesting information is in the first paragraph or so, and the remainder is all just the LLM not knowing when to stop. This is super annoying as a conversation then ends up being 90% noise.
Prompt caching is probably the single most important thing that people building harnesses think about, and yet its mind share among end users is virtually zero. If you had to think of all the weirdest, most seemingly baffling design decisions in an AI product, the answer to "why" is probably "to not break prompt caching".
If it hurts performance that much, maybe pruning could just hide the text, leaving the cache intact?
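A rough sketch of the difference, assuming a cache that reuses only the longest unchanged token prefix of the previous request. Everything here is hypothetical for illustration: the `Message` type, the whitespace "tokenizer", and the `hidden` flag are made up, not any real product's API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    hidden: bool = False  # display-only flag (assumption); never changes what the model is sent


def model_tokens(messages):
    """Toy whitespace tokenizer standing in for a real one."""
    tokens = []
    for m in messages:
        tokens.append(f"<{m.role}>")
        tokens.extend(m.text.split())
    return tokens


def cached_prefix_len(old_tokens, new_tokens):
    """Number of leading tokens shared with the previous request, i.e. the reusable part."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


def visible_text(messages):
    """What the UI would show: hidden messages are skipped."""
    return [m.text for m in messages if not m.hidden]


history = [
    Message("user", "How do I undo a commit?"),
    Message("assistant", "Use git revert. Here is a long tangent nobody asked for."),
]
follow_up = Message("user", "And for a merge commit?")
before = model_tokens(history)

# Option A: prune by rewriting the earlier message in the prompt itself.
pruned = [history[0], Message("assistant", "Use git revert."), follow_up]

# Option B: leave the prompt untouched and only hide the message in the UI.
hidden = [history[0], Message("assistant", history[1].text, hidden=True), follow_up]

reuse_pruned = cached_prefix_len(before, model_tokens(pruned))
reuse_hidden = cached_prefix_len(before, model_tokens(hidden))

print(f"pruned: {reuse_pruned}/{len(before)} cached tokens reusable")
print(f"hidden: {reuse_hidden}/{len(before)} cached tokens reusable")
```

In this toy model, rewriting the earlier message cuts the reusable prefix at the point of the edit, while hiding keeps all of it. The catch is that the hidden text still sits in the context the model reads, so it only reduces noise for the person, not for the model.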