Posted by 0o_MrPatrick_o0 9 hours ago
In further reflection it is such a great indignity & such a collosal barrier to working with the machine that it insists on being a black box. The disingenuity of the American models (small print: except AI2 & some other labs; you all are so great) is a massive disadvantage to their use,... and a massive slap in the face.
It's a threat to human intelligence that it is not co-participative. Walking further into my own judgement and feelings: the insistence on being an opaque black box, the Seals Chinese Room, is such a vicious harm to society! This is civilizationally an unsafe form of AI that probably should be outlawed as anti-social. It's an impermissible asymmetry, a crippling dependent relationship to be forced into. I'm working myself up, but here: this.. imo, this is not just indignity, is harmful, it is evil.
This "6 month behind" trend we've seen for open models feels like at some point will be less important than simply the models unwillingness to speak for itself & to be observable.
I suspect that in some decades, as other architectures are found and used, that the inability of an LLM to "think" without also emitting a token will be seen as one of their fundamental limitations.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.
This evades an easy yes or no, so:
1. Many consumers believe reasoning-models allow that kind of question to be truthfully-answered, and their belief it reasonable given the marketing going on.
2. Implementers probably do not have the same belief when it comes to the terms mean or what capabilities they imply.
3. Yes, it doesn't actually do what the customer wanted it to do, which is a kind of retrospective introspection of internal thoughts and ideas.
____________
I advocate looking at everything from a document-generation perspective to cut down on traps and cognitive illusions. The "reasoning" models are a change in the style of document being iteratively-grown by the LLM, as opposed to something more anthropomorphized.
* Default: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* "Reasoning": There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
If your Human Customer character ask "Why did you say that", the LLM does not engage in a different process than "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate, it's the same process of finding more stuff that fits at the end of the document. Any benefits from a "reasoning model" is the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said the reasoning output is still a summary of the Kvcache)
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the the “thinking” stage of response generation.
Just as with, say, a human onner monolog in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
Having access to the reasoning text and output would help with performance measurement.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.