
Posted by shloked 2 days ago

Claude’s memory architecture is the opposite of ChatGPT’s (www.shloked.com)
437 points | 230 comments
kiitos 2 days ago|
> Anthropic's more technical users inherently understand how LLMs work.

good (if superficial) post in general, but on this point specifically, emphatically: no, they do not -- no shade, nobody does, at least not in any meaningful sense

omnicognate 2 days ago||
Understanding how they work in the sense that permits people to invent and implement them, that provides the exact steps to compute every weight and output, is not "meaningful"?

There is a lot left to learn about the behaviour of LLMs, higher-level conceptual models to be formed to help us predict specific outcomes and design improved systems, but this meme that "nobody knows how LLMs work" is out of control.

recursive 1 day ago||
None of that is inherent, and vanishingly few of Anthropic's users invented LLMs.
omnicognate 1 day ago||
What is "inherent" supposed to mean here?

LLMs are understood to the extent that they can be built from the ground up. Literally every single aspect of their operation is understood so thoroughly that we can capture it in code.
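
To make that concrete, here is a minimal sketch of the autoregressive loop at the heart of every LLM. The `forward` function stands in for the actual trained network (the only part that takes real effort to build); everything around it is fully specified, ordinary code:

    import math, random

    def softmax(logits):
        # numerically stable softmax: raw scores -> probability distribution
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def generate(forward, prompt_ids, max_new_tokens, eos_id):
        # the whole "mystery": predict one token, append it, repeat
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            logits = forward(ids)      # one pass of the trained network
            probs = softmax(logits)
            next_id = random.choices(range(len(probs)), weights=probs)[0]
            if next_id == eos_id:
                break
            ids.append(next_id)
        return ids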

If you achieved an understanding of how the human brain works at that level of detail, completeness and certainty, a Nobel prize wouldn't be anywhere near enough. They'd have to invent some sort of Giganobel prize and erect a giant golden statue of you in every neuroscience department in the world.

But if you feel happier treating LLMs as fairy magic, I've better things to do than argue.

recursive 1 day ago||
Inherent means implicit or automatic as far as I understand it. I have an inherent understanding of my own need for oxygen and food.

I don't have an inherent understanding of English, although I use it regularly.

Treating LLMs as fairy magic doesn't make me feel any happier, for whatever it's worth. But I'm not interested in arguing either.

I never intended to make any claims about how well the principles of LLMs can be understood. Just that none of that understanding is inherent. I don't know why they used that word, as it seems to weaken the post.

lukev 2 days ago|||
If we are going to create a binary of "understand LLMs" vs "do not understand LLMs", then one way to do it is as you describe: fully comprehending the latent space of the model so you know "why" it's giving a specific output.

This is likely (certainly?) impossible. So not a useful definition.

Meanwhile, I have observed a very clear binary among people I know who use LLMs: those who treat it like a magic AI oracle, vs. those who understand the autoregressive model, the need for context engineering, the fact that outputs are somewhat random (hallucinations exist), setting the temperature correctly...
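
For anyone unsure what "setting the temperature" means mechanically, here is a small plain-Python illustration (not any particular vendor's API): the logits are divided by T before the softmax, so low T sharpens the output distribution and high T flattens it.

    import math

    def output_distribution(logits, temperature=1.0):
        # divide logits by T, then apply a numerically stable softmax
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.1]
    print(output_distribution(logits, 0.5))  # peaked: the top token dominates
    print(output_distribution(logits, 1.5))  # flatter: more randomness in sampling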

kiitos 2 days ago||
> If we are going to create a binary of "understand LLMs" vs "do not understand LLMs",

"we" are not, what i quoted and replied-to did! i'm not inventing strawmen to yell at, i'm responding to claims by others!

shloked 1 day ago|||
I should've been clearer, but what I meant was language models 101. Normal people don't understand even basics like the fact that LLMs are stateless by default and need to be given external information to "remember" things about you, or what a system prompt is.
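
To illustrate the stateless point, here is a toy sketch (the `call_llm` function and the memory strings are hypothetical) of how a chat product has to re-send everything, including any "memory", on every single request:

    # hypothetical illustration -- call_llm and the memory strings are invented
    MEMORY = ["User's name is Alex.", "Prefers concise answers."]

    def build_request(history, user_message):
        system_prompt = ("You are a helpful assistant.\n"
                         "Known facts about the user:\n" + "\n".join(MEMORY))
        messages = [{"role": "system", "content": system_prompt}]
        messages += history                  # the full conversation, resent every turn
        messages.append({"role": "user", "content": user_message})
        return messages

    # nothing persists inside the model between calls:
    # reply = call_llm(build_request(history, "What's my name?"))
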
kingkawn 2 days ago||
Thanks for this generalization, but of course there is a broad range of understanding across the meat populace of how to improve usefulness and tweak models.
richwater 2 days ago||
ChatGPT is quickly approaching (perhaps surpassing?) the same concerns that parents, teachers, and psychologists had with traditional social media. It's only going to get worse, but trying to stop technological progress will never work. I'm not sure what the answer is. That they're clearly optimizing for people's attention is even more worrisome.
visarga 2 days ago||
> That they're clearly optimizing for people's attention is more worrisome.

Running LLMs is expensive and we can swap models easily. The fight for attention is on, and it acts like an evolutionary pressure on LLMs. We've already seen the sycophancy trend as a result of it.

WJW 2 days ago||
Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating, sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention-sucking knowledge machine. The ecosystem shifted; adapt or be outcompeted.
aleph_minus_one 2 days ago||
> Seems like either a huge evolutionary advantage for the people who can exploit the (sometimes hallucinating, sometimes not) knowledge machine, or else a huge advantage for the people who are predisposed to avoid the attention-sucking knowledge machine. The ecosystem shifted; adapt or be outcompeted.

Rather: use your time to learn serious, deep knowledge instead of wasting it reading (and especially spreading) the science-fiction stories the AI bros tell all the time. These AI bros are insanely biased, since they will likely lose a lot of money if these stories turn out to be false, or likely even if people merely stop believing in these science-fiction fairy tales.

auggierose 1 day ago||
Switched off memory (in Claude) immediately, not even tempted to try.
LeicaLatte 2 days ago||
Curious about the interaction between this memory behavior and fine-tuning. If the base model has these emergent memory patterns, how do they transfer or adapt when we fine-tune for specific domains?

Has anyone experimented with deliberately structuring prompts to take advantage of these memory patterns?

perryizgr8 1 day ago||
Why is the scroll so unnatural on this page?
amannm 1 day ago|
> Anthropic's more technical users inherently understand how LLMs work.

Yes, I too imagine these "more technical users" spamming rocket-ship and confetti emojis, absolutely _celebrating_ the most toxic code contributions imaginable to some of the most important software in the world. Claude is (by default) exactly the kind of engineer you don't want in your company. Whatever little reinforcement-learning system/simulation they used to fine-tune their model is a mockery of what real software engineering is.