Posted by 0o_MrPatrick_o0 5 hours ago
It’s much harder to understand _why_ a model chose a particular approach in Claude Code, especially because Claude will happily give you hallucinated reasons if you ask it after the fact.
Recent anecdote:
I was reviewing a colleague’s PR, and Opus 4.8 had decided to write the new feature in a completely new module. It was unnecessarily complex. We had a hard time understanding why it chose that, and when we asked, it told us it was so we could eventually deploy it as a separate microservice and test it independently. What?
Only after we got a lot more specific about the implementation, and spent a lot more tokens, did it flat out refuse to simplify the code and give the actual reason. It turns out a line recently added to CLAUDE.md was making it incorrectly think that the module it was originally supposed to modify was legacy code that it was forbidden to extend.
This would have been caught immediately if we could inspect its thinking process.
FYI, OpenAI does the same; it's not really surprising or particularly evil.
- "Read `description` and create a specification, implementation guide, and checklist."
- "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in `autorecommendations.md`."
- "Have codex and antigravity review each of these and work to consensus."
These are the core of the ~61 lines of prompting I spread across 3 prompts, and I feel like the resulting artifacts capture some of the thinking. The back-and-forth between the models also gives some insight into the models' "thinking".
I will say this: I used Fable heavily when it was available, and Opus + loops + codex and/or antigravity review is better than Fable at building things.
Mind sharing your prompts?
The LLM providers will clearly become more and more opaque as their services get more capable. The frontier models may even be offered only as internal advisors, or async-only, so the providers can monitor your CoT and final answers for cyber risks, etc.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning, given that it reinforces all the steps, including missteps, that got the model to a reward. Providing a summary could be seen as a form of sane-washing, making the model look more purposeful and directed than it really is!
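To make the credit-assignment point concrete, here is a toy Python sketch, assuming an outcome-only reward (one scalar for the final answer) and a plain REINFORCE-style update with a baseline. The `Step` class, the `is_misstep` label, and `reinforce_weights` are hypothetical illustrations, not any provider's actual training code. With a single terminal reward, every step in a rewarded trajectory gets the same positive weight, including the detour:

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str
    # Hypothetical label for illustration only; the RL update below never looks at it.
    is_misstep: bool


def reinforce_weights(trajectory, final_reward, baseline=0.0, gamma=1.0):
    """Per-step weights (return minus baseline) for a REINFORCE-style update.

    Assumes the only reward arrives at the end, so the return at step t is
    gamma**(T - 1 - t) * final_reward. With gamma = 1.0 every step gets the
    same weight, whether or not it actually helped reach the answer.
    """
    T = len(trajectory)
    return [gamma ** (T - 1 - t) * final_reward - baseline for t in range(T)]


traj = [
    Step("restate the task", False),
    Step("try an approach that doesn't work", True),
    Step("backtrack and fix it", False),
    Step("give the correct answer", False),
]

weights = reinforce_weights(traj, final_reward=1.0, baseline=0.5)
for step, w in zip(traj, weights):
    # A positive weight means the update makes this step more likely next time.
    print(f"{w:+.2f}  misstep={step.is_misstep}  {step.text}")
```

With `gamma=1.0`, every step prints `+0.50`, the misstep included; the update has no way to tell the detour from the useful steps. Process-level rewards, which score individual steps, are one attempt to address exactly this.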
If that's the case, the thinking isn't visible to us as users because it isn't done in text.
Ideas somewhat similar to what you describe exist, but they make steering, post-training, and interpretation much harder.
EDIT:
They link to a Meta paper from 2024/2025 though: https://arxiv.org/pdf/2412.06769/.
I don't know about Claude, but the latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, it looks simplified: fewer words, extremely compact. They optimize for tokens, but it remains readable.