Posted by chatmasta 7 hours ago
If people absolutely need to use AI to write replies, they NEED to start including a "everything after this was generated by AI" disclaimer
What I know for sure:
1.Stuff that has nothing to do with the current session got mixed in.
What guessing:
1.There's a minecraft.py file in the tool folder, and that might have triggered some hallucination.
2.Maybe data from some other project on the user's local machine got mixed in somehow.
3.Or it could be from another user's conversation.
Honestly, if I think about how the system actually works, I don't think it's pulling from another user's data. But other people say they've had issues like that, so I can't completely rule it out.
I saw this thing on YouTube once. When a bunch of users share the same system prompt, or prefix, the computation results get shared through something called a KV Cache. At least, that's what I understood. Not sure if I got it right. But if there's some bug in the hashmap that's supposed to keep those caches separate, then maybe multi-tenant memory management just broke down and that's what caused this. I mean, I can guess, but who knows. And honestly, even if that's exactly what happened, they'd never admit it.
At the end of the day, LLMs are just word predictors, right? They build up some kind of semantic space inside. So maybe the user's question just happened to be near Minecraft in that space. That's kind of what I think.
>"Maybe my coworker was talking about this in another session?"
This would be a critical bug that would slash the market value of a T$ company significantly, go ask your coworker or close the ticket, why do you expect the devs to put an enormous amount of effort hunting a potentially inexistent if you can't make that minuscule debugging effort.
We achieved significant savings simply by moving everything that varies across individuals out of the system prompt so every session starts from a cache point.
For example you never want your system prompt to start with the time that the session started. Move that to the first user message if needed.
The alternative explanation is that the inference engine, which batches several unrelated requests for parallel processing, messed up the unpacking and returned an unrelated user’s query. This one would be very scary as it will leak arbitrary content, but it seems much less likely here.