Posted by 0o_MrPatrick_o0 9 hours ago

The text in Claude Code’s “Extended Thinking” output (patrickmccanna.net)
240 points | 176 comments
jauntywundrkind 6 hours ago
There was a little spontaneous outbreak of joy in the GLM vs Opus thread about GLM's willingness/ability to say what it's seeing. https://news.ycombinator.com/item?id=48628464

On further reflection, it is such a great indignity & such a colossal barrier to working with the machine that it insists on being a black box. The disingenuousness of the American models (small print: except AI2 & some other labs; you all are so great) is a massive disadvantage to their use... and a massive slap in the face.

It's a threat to human intelligence that it is not co-participative. Walking further into my own judgement and feelings: the insistence on being an opaque black box, Searle's Chinese Room, is such a vicious harm to society! This is civilizationally an unsafe form of AI that probably should be outlawed as anti-social. It's an impermissible asymmetry, a crippling dependent relationship to be forced into. I'm working myself up, but here: this... imo, this is not just an indignity, it is harmful, it is evil.

This "6 months behind" trend we've seen for open models feels like it will at some point be less important than simply the models' unwillingness to speak for themselves & to be observable.

_fzslm 6 hours ago
Cat and mouse measures like this rarely work forever.
simianwords 8 hours ago
Wait, I think there are two levels of summary. Anthropic is definitely not showing its models' real thinking, even with enterprise agreements. For example, in Claude.ai the thinking traces are not the real ones; they are themselves summaries.
isodev 7 hours ago
I hope it doesn't come as a surprise to anyone - LLMs don't really "think".
nlarew 7 hours ago
Your basic analysis is not the point of the article
jerf 8 hours ago
AIUI it's fairly well established that the models can be saying one thing and "really" thinking another anyhow. The examples I recall seeing traced how simple one-digit arithmetic was done in the chat versus the actual activations under the hood. Tracing a real, non-trivial task that way would be challenging. I'd expect it is unlikely that the reasoning would say one thing while some utterly unrelated thought process is happening below, but I would expect there to be a lot of places where the text of the reasoning diverges from what is "actually" being done. I'm not sure the full reasoning readout would produce much real insight anyhow.

I suspect that in some decades, as other architectures are found and used, the inability of an LLM to "think" without also emitting a token will be seen as one of its fundamental limitations.

micromacrofoot 7 hours ago
well yeah I wouldn't want anyone to read my unsummarized thinking either
philipwhiuk 8 hours ago
To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation. I know at one point this was correct.

Humans somewhat do the same - something that's been demonstrated in split-brain experiments.

stingraycharles 8 hours ago
No, not at all; you've got it backwards. This was originally called “chain-of-thought prompting”, and it basically instructed the model to reason through a problem before providing an answer.

Because LLMs are text prediction engines, putting the explicit reasoning steps first improves the likelihood that the final answer (which is predicted with the entire reasoning chain as input) is correct.
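A minimal sketch of the prompting side of this, under loose assumptions: the template wording and the "Answer:" marker are hypothetical, not any vendor's format, and the completion string below is hand-written rather than real model output. The point it shows is only the ordering: everything after "Answer:" is generated with the reasoning already in the context.

```python
from typing import Tuple


def build_cot_prompt(question: str) -> str:
    """Ask for step-by-step reasoning *before* the final answer.

    Hypothetical template: the wording and the 'Answer:' marker are
    illustrative assumptions, not any particular model's required format.
    """
    return (
        f"Question: {question}\n"
        "Reason through the problem step by step. "
        "Then give the result on a final line starting with 'Answer:'.\n"
    )


def split_completion(completion: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer) on the last 'Answer:' marker."""
    reasoning, marker, answer = completion.rpartition("Answer:")
    if not marker:
        # No marker found: treat the whole completion as reasoning, no answer.
        return completion.strip(), ""
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    print(build_cot_prompt("What is 17 + 25?"))
    # Hand-written stand-in for a model completion.
    reasoning, answer = split_completion("17 + 20 = 37, and 37 + 5 = 42.\nAnswer: 42")
    print(answer)  # 42
```

Splitting on the last marker keeps any earlier mention of "Answer:" inside the reasoning intact.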

Terr_ 7 hours ago
> To be honest I thought the 'thinking' was the model being asked 'how did you come up with that' and then it generating a plausible explanation.

This evades an easy yes or no, so:

1. Many consumers believe reasoning models allow that kind of question to be truthfully answered, and their belief is reasonable given the marketing going on.

2. Implementers probably do not share that belief when it comes to what the terms mean or what capabilities they imply.

3. Yes, it doesn't actually do what the customer wanted it to do, which is a kind of retrospective introspection of internal thoughts and ideas.

____________

I advocate looking at everything from a document-generation perspective to cut down on traps and cognitive illusions. The "reasoning" models are a change in the style of document being iteratively grown by the LLM, as opposed to something more anthropomorphized.

* Default: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.

* "Reasoning": There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.

If your Human Customer character asks "Why did you say that?", the LLM does not engage in a different process than it would for "I have eaten an apple."

The LLM has no memories to consult or hidden goals to contemplate; it's the same process of finding more stuff that fits at the end of the document. Any benefit from a "reasoning model" comes from the LLM generating much better-looking additions because there's more (hidden) stuff for it to confabulate against.
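A toy illustration of the document-generation view, not how any real model is built: `toy_continuation` is a hypothetical stand-in for the LLM that just reports the document's line count. Speech, internal monologue, and the follow-up "Why did you say that?" all end up as lines in one text document, and the only operation is appending whatever the continuation function produces.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str, str]          # (speaker, kind, text); kind is "speech" or "thinking"
Continuation = Callable[[str], str]  # hypothetical stand-in for the model


def render(turns: List[Turn]) -> str:
    """Flatten a dialogue, including any internal monologue, into one plain-text document."""
    out = []
    for speaker, kind, text in turns:
        if kind == "thinking":
            out.append(f"({speaker}, thinking) {text}")
        else:
            out.append(f"{speaker}: {text}")
    return "\n".join(out) + "\n"


def extend(document: str, continue_fn: Continuation) -> str:
    """The only operation: append whatever text 'fits' at the end of the document."""
    return document + continue_fn(document)


def toy_continuation(document: str) -> str:
    # Deterministic toy "model": it only sees the document, never any hidden state.
    n = document.count("\n")
    return f"Chatbot: (continuation after {n} lines)\n"


if __name__ == "__main__":
    doc = render([
        ("Customer", "speech", "Is 7 prime?"),
        ("Chatbot", "thinking", "7 has no divisors other than 1 and 7."),
        ("Chatbot", "speech", "Yes."),
    ])
    # Asking "why" is just more document; the same extend() call answers it.
    doc = extend(doc + "Customer: Why did you say that?\n", toy_continuation)
    print(doc)
```

Nothing in `extend` distinguishes a "why" question from any other line, which is the point being made above.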

InsideOutSanta 8 hours ago
If you ask an LLM afterward how it arrived at an answer, it might produce a plausible but incorrect explanation. But that's not what the thinking stream is; that's actually part of how it generates the answer.
devmor 8 hours ago
That's not really how LLMs work at all. I would really recommend checking out something like [1] to get a rough understanding and avoid attributing too much to them.

1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...

tsunamifury 8 hours ago
It’s not surprising that the SOTA model makers' core goal is to make users dependent while giving them less and less understanding of how it works, creating a deeply unhealthy dependency.

Tell me this: if you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec, what would you do?

(That being said, the reasoning output is still a summary of the KV cache.)

orangecat 7 hours ago
*If you hired a junior engineer or designer who refused to explain their thinking on their code*

Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.

tsunamifury 6 hours ago
Did you not even bother to read to the end of the comment before jumping to 'correct' someone?
bpodgursky 8 hours ago
The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer. Nobody really understands how LLMs think. Thinking logs seem to be accurate, and summary thinking logs seem to be a good summary of the full thinking logs.

If it's useful, it's useful; enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output; just learn to be comfortable with that, or opt out and be a goat farmer.

dragonwriter 8 hours ago
> The full thinking logs are also a summary of a thinking process presumably consistent with one necessary to generate the provided answer.

No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the “thinking” stage of response generation.

Just as with, say, a human's inner monologue in words vs. actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation into language of the internal process that precedes the output, either as a full result or as a summary.

0o_MrPatrick_o0 8 hours ago
I want to measure performance drift over time.

Having access to the reasoning text and output would help with performance measurement.
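A minimal sketch of the output-only half of that measurement, assuming you rerun a fixed eval set and record a (date, pass/fail) pair per task. The function names, the weekly ISO-calendar buckets, the two-week baseline, and the 10-point tolerance are all hypothetical choices; access to the reasoning text would add signal this does not capture.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def pass_rate_by_week(results: Iterable[Tuple[date, bool]]) -> Dict[Week, float]:
    """Bucket pass/fail results by ISO week and return the pass rate per week, oldest first."""
    buckets: Dict[Week, List[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for day, passed in results:
        year, week, _ = day.isocalendar()
        bucket = buckets[(year, week)]
        bucket[0] += int(passed)
        bucket[1] += 1
    return {k: p / n for k, (p, n) in sorted(buckets.items())}


def flag_drift(rates: Dict[Week, float], baseline_weeks: int = 2,
               tolerance: float = 0.1) -> List[Week]:
    """Flag weeks whose pass rate falls more than `tolerance` below the early-week baseline."""
    weeks = list(rates)
    if len(weeks) <= baseline_weeks:
        return []
    baseline = sum(rates[w] for w in weeks[:baseline_weeks]) / baseline_weeks
    return [w for w in weeks[baseline_weeks:] if rates[w] < baseline - tolerance]


if __name__ == "__main__":
    # Made-up example data, not real measurements.
    results = [
        (date(2024, 1, 1), True), (date(2024, 1, 2), True),
        (date(2024, 1, 8), True), (date(2024, 1, 9), True),
        (date(2024, 1, 15), True), (date(2024, 1, 16), False),
    ]
    print(flag_drift(pass_rate_by_week(results)))  # [(2024, 3)]
```

A fixed baseline is the simplest choice; a rolling window or a proper significance test would be less noisy with small weekly sample sizes.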

solarkraft 8 hours ago
Yeah. The output is magic either way, with or without reasoning.

For daily use I actually like the reasoning summary to be brief/quick to scan.

That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
