Posted by mfiguiere 4 days ago
That feels closer to injecting a self-report step than observing internal reasoning.
It can reflect the thinking process fully, or it can be full of post hoc justifications. In practice, it falls anywhere in between.
As task complexity increases and chain-of-thought length grows, it becomes load-bearing by necessity. It still doesn't have to be fully accurate, but it must be doing something right, or the answer wouldn't work.
Of course, there are no doubt significant differences between whatever LLMs are doing and whatever humans are doing when they “think” - but maybe they aren’t quite as dissimilar as many argue? In both cases, there is a mutual/circular relationship between a verbalised process and a nonverbal one (in the LLM case, the inner representations of the model).
Humans can refine internal models from their own verbalised thoughts; LLMs cannot.
Self-generated text is not an input-strengthening signal for current architectures.
Training on a model’s own outputs produces distributional drift and mode collapse, not refinement.
Equating CoT with “inner speech” implicitly assumes a safe self-training loop that today’s systems simply don’t have.
CoT is a prompted, supervised artifact — not an introspective substrate.
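To make that concrete, the loop being ruled out looks roughly like the sketch below (Hugging Face/PyTorch style; the model choice and hyperparameters are purely illustrative). The model is fine-tuned directly on text it sampled itself, with no external signal entering the loop anywhere:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")            # illustrative small model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    prompt = tok("Q: Why is the sky blue?\nA:", return_tensors="pt")

    for _ in range(10):
        with torch.no_grad():
            # sample text from the current model
            sample = model.generate(**prompt, max_new_tokens=64, do_sample=True)
        # then treat the model's own sample as supervised training data
        loss = model(input_ids=sample, labels=sample).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
        # nothing external ever enters the loop, so each round narrows the distribution further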
Does “distributional drift and mode collapse” still happen if the outputs are filtered with respect to some external ground truth - e.g. human preferences, or even (in certain restricted domains such as coding) automated evaluations?
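Concretely, I'm imagining something like the rejection-sampling / STaR-style round sketched below, where generate, passes_check, and fine_tune are hypothetical stand-ins for a sampler, an external evaluator (unit tests, preference labels, etc.), and a training step:

    def generate(problem: str) -> str: ...                       # stand-in: sample one candidate from the model
    def passes_check(problem: str, candidate: str) -> bool: ...  # stand-in: external ground truth (tests, raters)
    def fine_tune(pairs: list[tuple[str, str]]) -> None: ...     # stand-in: one supervised training round

    def filtered_round(problems: list[str], samples_per_problem: int = 8) -> None:
        kept: list[tuple[str, str]] = []
        for problem in problems:
            for _ in range(samples_per_problem):
                candidate = generate(problem)
                if passes_check(problem, candidate):   # only externally validated samples survive
                    kept.append((problem, candidate))
                    break
        fine_tune(kept)                                # train only on the filtered self-generated data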
The discussion has been about CoT in LLMs, so I’ve been referring to the model in isolation from the start.
Here’s how I currently understand the structure of the thread (apologies if I’ve misread anything):
“Is CoT actually thinking?” (my earlier comment)
→ “Yes, it is thinking.”
→ “It might be thinking.”
→ “Under that analogy, self-training on its own CoT should work — but empirically it doesn’t.”
→ “Maybe it would work if you add external memory with human or automated filtering?”
Regarding external memory: without an external supervisor, whatever gets written into that memory is still the model’s own self-generated output, which brings us back to the original problem.
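A toy sketch of that circularity (generate below is a hypothetical stand-in for the model's sampler): the memory adds persistence, but every entry it ever holds is still the model's own output.

    def generate(context: str) -> str: ...   # stand-in for the model's sampler

    memory: list[str] = []

    def step(prompt: str) -> str:
        context = "\n".join(memory[-5:] + [prompt])  # read recent memory into the context
        output = generate(context)                   # the model's own output...
        memory.append(output)                        # ...is the only thing ever written back
        return output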
It can be done without limitations, but you won't get the current (and absolutely fucking pointless) kind of speed.
> Self-generated text is not an input-strengthening signal for current architectures.
It can be; the architecture is not the issue. Multi-model generation used for refining answers can also be tweaked into an input-strengthening signal via per-stage and cross-stage pre-prompts/system prompts along the chain, as in the sketch below.
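Roughly this kind of shape, to be concrete (the model names and system prompts are purely illustrative, and the OpenAI chat-completions call is just one example; a different model could be swapped in per stage):

    from openai import OpenAI

    client = OpenAI()

    def stage(system: str, user: str, model: str = "gpt-4o-mini") -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

    question = "Why does the naive self-training loop collapse?"   # illustrative input
    draft = stage("You are a careful problem solver. Show your reasoning.", question)
    critique = stage("You are a strict reviewer. List concrete errors only.", draft)
    revised = stage("Revise the draft to address the listed errors.",
                    f"Draft:\n{draft}\n\nErrors:\n{critique}")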
> Training on a model’s own outputs produces distributional drift and mode collapse, not refinement
That's an integral part of self-learning. The same thing happens in many cases where children raise themselves or each other, or when hormones are blocked (micro-collapse in sub-systems) or people are drugged (drift). If you didn't have loads of textbooks and online articles, you'd collapse all the time. Some time later: AHA!
It's a "hot reloading" kind of issue but assimilation and adaptation can't/don't happen at the same time. In pure informational contexts it's also just an aggregation while in the real world and in linguistics, things change, in/out of context and based on/grounded in--potentially liminal--(sub-)cultural dogmas, subjectively, collective and objectively phenomenological. Since weighted training data is basically a censored semi-omniscient "pre-computed" botbrain, it's a schizophrenic and dissociating mob of scripted personalities by design, which makes model collapse and drift practically mandatory.
> a safe self-training loop that today’s systems simply don’t have.
Early stages are never safe, and you don't get safety any other way unless there are no idiots around you, which in money- and fame-hungry industries and environments is never the case.
> CoT is a prompted, supervised artifact — not an introspective substrate.
Yeah, but their naming schemes are absolute trash in general, anchoring false associations (technically even deliberately misleading ones, or sloppy ignorant ones, desperate to equate their product with human brains) and priming for misappropriation: "it's how humans think".
As far as I understand it, it’s a generated narration conditioned by the prompt, not direct access to internal reasoning.
It almost seems that the purpose of the CoT tokens in a transformer network is to act as a computational substrate of sorts. The exact choice of tokens may not be as important as it looks, but it's important that they are present.
Source: all of mechinterp
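One way to make the "substrate" framing concrete (a toy sketch; "gpt2" is just a local stand-in and won't actually solve anything): each emitted CoT token is another forward pass whose output is fed back in as input, so a longer chain buys more serial compute before the final answer token is committed.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")              # illustrative stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = tok("Q: What is 17 * 24?\nLet's think step by step.", return_tensors="pt")
    out = model.generate(**prompt, max_new_tokens=128, do_sample=False)

    extra = out.shape[1] - prompt["input_ids"].shape[1]
    print(f"{extra} generated tokens = {extra} extra forward passes before the answer is committed")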
Similar performance with 7% of tokens as chain of thought.
Why would we want to purposely decrease interpretability?
Very strange.
Implement hooks in codex then.