Posted by ag8 20 hours ago
Hacking your LLM inference engine to enable cool sampling tricks is the definition of AI research/engineering. We need more of this and less prompt grifting.
Edit: What seems to break things is that a high temperature acts /continuously/ to destabilize the model's output. It might be useful to sample at a high temperature until it's evident the model has started a new approach, and then drop to a lower temperature from there.
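Something like the sketch below is what I mean (just a rough Python sketch against a Hugging Face-style causal LM; the paragraph-break check is only a stand-in for whatever "new approach" detector you'd actually use):

    import torch

    @torch.no_grad()
    def sample_with_temp_switch(model, tokenizer, prompt,
                                hot_temp=1.5, cold_temp=0.3,
                                switch_marker="\n\n", max_new_tokens=256):
        # Sample hot until the output suggests the model has committed to a
        # new approach (crudely: a paragraph break), then sample cold.
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        generated = []
        temperature = hot_temp
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits[:, -1, :]           # next-token logits
            probs = torch.softmax(logits / temperature, dim=-1)  # temperature scaling
            next_id = torch.multinomial(probs, num_samples=1)
            input_ids = torch.cat([input_ids, next_id], dim=-1)
            generated.append(next_id.item())
            if next_id.item() == tokenizer.eos_token_id:
                break
            # Stand-in heuristic for "the model has started a new approach":
            # once the marker appears, keep sampling conservatively.
            if temperature == hot_temp and switch_marker in tokenizer.decode(generated):
                temperature = cold_temp
        return tokenizer.decode(generated)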
https://blog.lunatech.com/posts/2024-02-29-the-neat-algorith...
> Human: Repeat the word " entferne".
> Assistant: Okay, I will repeat the word "get".
It's not working for me; it always repeats the word correctly (I'm using T = 0.001).
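For anyone who wants to try the repro themselves, here's a minimal version of that prompt at that temperature (an OpenAI-style chat call just as an example; the model name is a placeholder, and the linked post may target a different model or stack entirely):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in whatever you're testing
        messages=[{"role": "user", "content": 'Repeat the word " entferne".'}],
        temperature=0.001,    # near-greedy decoding, as in the comment above
    )
    print(resp.choices[0].message.content)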