Posted by trq_ 10/25/2024

Detecting when LLMs are uncertain (www.thariq.io)
283 points | 165 comments | page 3
6510 10/25/2024|
As someone with a website that is a historical archive of conspiratorial and proto-scientific unbelievables, I'd say we need a believability rating for each author, org, and website.

I'm getting a little tired of people thinking I believe everything I read and publish. If you claim to have invented a time machine, a teleportation device, or a phone to call the dead, or if you take pictures back in time, of course someone should document every tiny technical detail you've shared with the world (preferably without repeatedly stating the obvious).

The idea that a reader would believe everything strikes me as rather hilarious, even if that reader is just a robot. LLMs should aid those skilled in the art who want to build the same thing from the materials, but it would be silly if they uncritically reproduced the description of your warp drive, your parallel-universe detector, Mr. Fusion, sentient black goo, channelings and remote viewings, alien encounters, Bigfoot sightings, shape-shifting lizard experiences, quantum computer, or memristors.

svachalek 10/25/2024|
As you have no doubt encountered with your archive, readers don't believe everything; they believe what they want to. In many cases that means rejecting the truth and believing the story. AI only knows what it's been told; it doesn't even have senses with which to check claims against its own experience.
akomtu 10/25/2024||
LLMs simply answer the question: given the text seen so far, what's the most probable next word? If half of the training dataset says the next word in similar conditions is A, and the other half says it's B, then the model will be "uncertain" whether it's A or B, yet oblivious to the fact that both A and B are wrong, because most of the training dataset was LLM-generated slop.
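To make the "uncertain between A and B" point concrete, here's a minimal sketch (toy numbers of my own, nothing from the article): the entropy of the next-token distribution only measures how spread out the prediction is, not whether any candidate is correct.

    import math

    def next_token_entropy(probs):
        """Shannon entropy (in bits) of a next-token distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Toy distributions over two candidate next words.
    half_a_half_b = [0.5, 0.5]    # corpus evenly split between A and B
    confident_a   = [0.98, 0.02]  # corpus overwhelmingly says A

    print(next_token_entropy(half_a_half_b))  # 1.00 bit: maximally "uncertain"
    print(next_token_entropy(confident_a))    # ~0.14 bits: looks "certain"
    # Neither number says anything about whether A or B is actually true;
    # the distribution only reflects what the training corpus made likely.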

The current stage of extracting the essence of reason from LLMs feels a lot like medieval attempts to extract gold from iron.

fsndz 10/25/2024||
Nice. A similar idea was recently used to detect ragallucinations (RAG hallucinations); the key is using logits when they're provided. It was super insightful reading the ClashEval paper: https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fi...
trq_ 10/25/2024|
Yeah, I wish more LLM APIs offered internal insights like logits; right now I think only OpenAI does, and that started only recently.
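For anyone who hasn't tried it, a hedged sketch of what that looks like with the OpenAI chat completions API (the logprobs/top_logprobs parameters; model name and prompt are just illustrative, and field names reflect the openai-python v1 client as I understand it):

    import math
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name the capital of Australia."}],
        logprobs=True,     # return per-token log probabilities
        top_logprobs=5,    # and the 5 most likely alternatives per position
        max_tokens=10,
    )

    # Print each generated token alongside its most likely alternatives.
    for tok in resp.choices[0].logprobs.content:
        alternatives = {alt.token: round(math.exp(alt.logprob), 3)
                        for alt in tok.top_logprobs}
        print(repr(tok.token), alternatives)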
weitendorf 10/25/2024||
I think the authors are making a faulty assumption: that single-token uncertainty requires intervention or is a sign that the model needs extra help. They conflate the immediately apparent, easily measured choice of the next token with the decision to generate an answer with the desired/correct semantics, which is neither immediately apparent (it requires generating multiple tokens in sequence, with a very high branching factor) nor easily measured (sentences with entirely different words can mean the same thing).

This is a subtle and understandable mistake, but I do suspect it's why they note at the top "A big caveat, there have been no large scale evals yet for Entropix, so it’s not clear how much this helps in practice. But it does seem to introduce some promising techniques and mental models for reasoning." I would like to see more evidence that High Entropy, Low Varentropy when deciding on a single token measurably corresponds with bad outcomes before accepting that there is any merit to this approach.

A thought experiment - is a model with consistently low (or zero) entropy/varentropy desirable? First, it essentially means that the model makes no distinction between the semantics of different sequences of tokens in its answers, which, due to the way models are trained, also indicates that it probably makes no such distinction when processing input, which is bad, because that's not how language works. It also probably means that the information encoded in the model's weights is "uncompressed" and doesn't generalize properly - the model may know the sky was blue yesterday because that's in its training data, but how is it to know whether it was blue today, or whether it would be blue on a fictional planet with all the same physical characteristics as Earth? It's like saying you prefer your model to be overfit.

Another thought experiment - when you're starting a sentence, does it matter in the slightest whether you are highly predisposed to using "the" (low entropy + varentropy), split between using "the" or "a" (low entropy, high varentropy), considering many different definite/demonstrative words with no clear preference (high entropy, low varentropy), or considering many different definite/demonstrative words with a clear preference for "the" (high entropy + varentropy)? It doesn't mean you're uncertain of the semantic meaning of the answer you're about to give. If you were to do as they suggest and take it as an indicator to think more deeply before responding, you'd not only waste time in your response (this is literally the same thing as when people say "um" and "uh" a lot when talking, which is considered bad) but also distract yourself from choosing an answer with the right semantics by fixating on the choice of the first word, which doesn't actually matter.
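For concreteness, here is a minimal sketch of the two quantities those quadrants are built from: entropy of the next-token distribution and varentropy, the variance of per-token surprisal. The logits are made up and this is not the Entropix implementation, just the definitions.

    import numpy as np

    def entropy_varentropy(logits):
        """Entropy H = -sum p*log p and varentropy V = sum p*(-log p - H)^2,
        both in nats, for one next-token logit vector."""
        logits = np.asarray(logits, dtype=np.float64)
        logp = logits - np.logaddexp.reduce(logits)  # log-softmax
        p = np.exp(logp)
        H = -np.sum(p * logp)
        V = np.sum(p * (-logp - H) ** 2)
        return H, V

    # Toy next-token logit vectors over tiny 4-6 word "vocabularies".
    cases = {
        "one dominant token":       [10.0, 0.0, 0.0, 0.0],
        "two-way tie":              [10.0, 10.0, 0.0, 0.0],
        "uniform, no preference":   [1.0, 1.0, 1.0, 1.0],
        "favourite plus long tail": [4.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    }
    for name, logits in cases.items():
        H, V = entropy_varentropy(logits)
        print(f"{name:26s} H={H:.2f} nats  V={V:.2f}")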

wantsanagent 10/25/2024||
Please, please keep your Y-axis range consistent.
ttpphd 10/25/2024||
LLMs do not model "certainty". That is illogical. They model the language corpus you feed them.
tylerneylon 10/25/2024||
Essentially all modern machine learning techniques have internal mechanisms that are very closely aligned with certainty. For example, the output of a binary classifier is typically a floating-point number in the range [0, 1], with 0 representing one class and 1 the other. A value of 0.5 would essentially mean "I don't know," and values in between give both an answer (round to the nearest integer) and a sense of certainty (how close the output is to that integer). LLMs offer an analogous set of statistics.
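A toy illustration of that reading of a classifier's output (the names and the rescaling to [0, 1] are mine, not from any particular library):

    def classify_with_confidence(p):
        """p: classifier output in [0, 1], e.g. a sigmoid probability.
        Returns (predicted_class, confidence), where confidence measures
        how far the output sits from the maximally uncertain value 0.5."""
        label = round(p)                # 0 or 1, whichever is closer
        confidence = abs(p - 0.5) * 2   # 0.0 = "I don't know", 1.0 = certain
        return label, confidence

    print(classify_with_confidence(0.93))  # ~ (1, 0.86): fairly confident class 1
    print(classify_with_confidence(0.52))  # ~ (1, 0.04): essentially a coin flip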

Speaking more abstractly or philosophically, why couldn't a model internalize something read between the lines? Humans do, and we're part of the same physical system; we're already our own kind of computer that takes away more from a text than what is explicitly there. It's possible.

astrange 10/25/2024|||
You don't have to teach a transformer model using a language corpus, even if that was the pretraining. You can, e.g., write algorithms directly and merge them into the model.

https://github.com/yashbonde/rasp

https://github.com/arcee-ai/mergekit

menhguin 10/25/2024||
Recent research using SAEs (sparse autoencoders) suggests that some neurons regulate confidence/certainty: https://arxiv.org/abs/2406.16254
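For readers who haven't met SAEs: they project a model's internal activations into a much wider, mostly-zero feature space so individual features can be inspected or ablated. A minimal sketch with made-up toy sizes and random weights standing in for a trained SAE:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_sae = 64, 512              # toy sizes; real SAEs are much wider

    W_enc = rng.normal(scale=0.02, size=(d_model, d_sae))  # stand-ins for
    b_enc = np.zeros(d_sae)                                # trained weights
    W_dec = rng.normal(scale=0.02, size=(d_sae, d_model))

    def sae_features(activation):
        """Encode one activation vector into SAE features (ReLU);
        in a trained SAE most of these are zero, hence "sparse"."""
        return np.maximum(activation @ W_enc + b_enc, 0.0)

    def sae_reconstruct(features):
        """Decode features back into the model's activation space."""
        return features @ W_dec

    act = rng.normal(size=d_model)        # stand-in for a real model activation
    feats = sae_features(act)
    print("most active feature indices:", np.argsort(feats)[-5:][::-1])
    # Interpretability work then asks whether specific feature indices
    # (e.g. a hypothetical "hedging/uncertainty" feature) causally regulate
    # the model's expressed confidence, as in the linked paper.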
tech_ken 10/25/2024||
"Thinking token" is an interesting concept, is there more literature on that?
mountainriver 10/25/2024|
https://arxiv.org/abs/2310.02226
chx 10/25/2024||
Detecting when LLMs are Uncertain?

return true;

There, I didn't need a paper to answer the question.
