How LLMs work - Hacker News

Posted by 0xkato 2 days ago

How LLMs work (www.0xkato.xyz)

717 points | 198 comments | page 3

aabdi 13 hours ago|

this is hard to read...

it goes all over the place.

i'm not actually sure who your target audience is.

there's too many side tangents.

just like, structure it plz.

1. customer feels bad cuz they don't understand how llms work

2. provide high level abstracted explanation (don't dive into concepts yet)

3. provide breakdown guide of overall set of components.

4. walk through each component. don't sidetrack. no need to explain RoPE, GQA, etc... it just distracts.

i.e. customers don't know how llms work, leading them to feel bad about their own intelligence.

at a high level llms take in words, do some math on them, and then produce words, one by one.

inside llms have these different components. we walk through them step by step.

1. tokenizer

2. embedding

3. attention

4. heads

5. ffn

6. sampling
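
A minimal sketch of the "take in words, do some math, produce words one by one" loop, assuming a made-up five-word vocabulary and a stand-in scoring rule. `VOCAB`, `toy_logits`, and `generate` are hypothetical names for illustration only; the real embedding, attention, heads, and FFN steps are collapsed into one fake function here.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer works on subword pieces.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}


def tokenize(text):
    """Map whitespace-separated words to integer ids (toy tokenizer)."""
    return [TOKEN_ID[word] for word in text.split()]


def toy_logits(token_ids):
    """Stand-in for embedding + attention + FFN: score every possible next token.

    This made-up rule favors the token that follows the last one in VOCAB
    order, wrapping around to <eos>. It is not a real model.
    """
    favored = (token_ids[-1] + 1) % len(VOCAB)
    return [5.0 if i == favored else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=10, greedy=True, seed=0):
    """Produce tokens one at a time, feeding each new token back in as input."""
    rng = random.Random(seed)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids))
        if greedy:
            # Sampling step, simplest form: always take the most likely token.
            next_id = max(range(len(VOCAB)), key=probs.__getitem__)
        else:
            # Sampling step, random form: draw in proportion to probability.
            next_id = rng.choices(range(len(VOCAB)), weights=probs)[0]
        if next_id == TOKEN_ID["<eos>"]:
            break
        ids.append(next_id)
    return " ".join(VOCAB[i] for i in ids)


print(generate("the cat"))
```

With `greedy=False` the same loop draws the next token at random from the probabilities instead of always taking the top one.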

## tokenizer

barrenko 12 hours ago|

It's just slop.

mathisdev7 6 hours ago||

very interesting and useful!

lhd1 15 hours ago||

I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?

blackoil 15 hours ago||

Hopefully someone has asked the right questions and removed confusing answers/hallucinations.

dialsMavis 15 hours ago||

Is this text generated by AI? I couldn't tell but I'd believe it if it was.

I imagine that if resources were already spent writing this text, then one benefit of reading it is that you don't use more resources, or cause more pollution, by asking a chatbot.

zemo 14 hours ago|||

normal people talk and write with some notion of meter, the cadence of communicating where pauses are inserted at places that naturally suit the speaker (and listener) to pause for thought. LLMs don't really do that, they just write a bunch of sentences.

> Researchers have found that some neurons inside the FFN are strongly associated with specific concepts or facts. One neuron might activate strongly on Eiffel-Tower-related text. Another on programming languages. Another on past-tense verbs.

People don't really write like this and they don't really talk like this (and no, people don't necessarily write exactly how they talk because they don't read exactly how they listen; the written word can be backtracked while the heard cannot, and speakers/writers know this, either consciously or unconsciously). A person would probably structure this more like:

> Researchers have found that some neurons inside the FFN are strongly associated with specific concepts or facts. For example, there could be one neuron that activates strongly on Eiffel-Tower-related text, another that activates strongly on programming languages, a third neuron activating on past-tense verbs, and so on.

Usually people wouldn't write "Another on programming languages." as a standalone sentence like that because the periods introduce an unnatural pause like they're giving a TED talk, unless of course they were punctuating that way for effect, but you'd essentially never communicate with that effect full time.

mattnewton 14 hours ago||

I don’t disagree with your conclusion that this is likely ai rewritten, but I do find it strange that you say “normal people don’t write like this” when it is mimicking how people write, and using patterns I have seen people write. I think models are at the point where style is not really reliable as an indicator anymore.

Izkata 55 minutes ago|||

A lot of the common patterns people ping as AI (like "it's not X, it's Y"*) are marketing-speak, of which there's a lot on the internet. It's applying existing patterns in unusual locations, ignoring the original context.

The one they're pointing out (the short punchy sentences) also applies to things like politicians and news articles. Blog posts are a bit odd.

* And here I mean those literal exact words. People are also extrapolating to similar patterns that use different or more words than "it's not" and "it's", but those flow better and aren't what I'm referring to here.

AgentMatt 13 hours ago||||

I'm sure there's plenty of writing in the above style to be found on the Internet, and hence in the LLM's training data. I'm also not a fan of this style, and in particular I'd say it's rarely or never found in scientific / technical writing meant to convey understanding rather than sell or hype. So here it's IMO more of a style mismatch.

zemo 8 hours ago||||

It’s not a model of an author, it’s a model of documents. That’s not the same thing.

wizzwizz4 1 hour ago||

No, but sufficiently-advanced overfitting would lead to the model keeping track of an author's stylistic profile, in the same way it keeps track of the plot of a story it's writing (i.e., badly, but well enough that you have to pay attention to notice that something is wrong).

thin_carapace 13 hours ago||||

people sure do write like that, in novels. nobody writes scientific articles like novels, because scientific articles don't need to maximally capture audience attention. the purpose of a scientific article is to convey information - this pursuit is not assisted by punchy prose.

MagicMoonlight 8 hours ago|||

It is trained on its own slop. They haven’t trained these models on books for three years at this point. Only on generated slop. (And RL slop upvotes/downvotes from users)

rippeltippel 15 hours ago|||

The voice of several passages resembles ChatGPT very closely.

cubefox 12 hours ago||

We are living in a crazy science fiction world where on the top of the HN frontpage there is an article on how LLMs work which is likely itself LLM generated, and the only way to tell is its writing style rather than its factual accuracy.

singpolyma3 17 hours ago||

Next do "why LLMs work"

inkysigma 10 hours ago||

This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think there's a relatively optimistic paper that was posted a while back here but I would also take it with a grain of salt.

https://arxiv.org/abs/2604.21691

There are of course empirical results and relatively weak theoretical results like the UAT (universal approximation theorem), but I also don't think that answers your question fully, especially since it seems impossible to definitively answer questions that the industry seems to be betting on, like whether or not there is a lower bound to their error rate or whether hallucination as a problem can be solved. We have much stronger ideas of what linear regression is doing relative to what LLMs are doing.

sheeshkebab 16 hours ago|||

considering they work with any architecture/configuration given enough compute, just more or less efficiently - then maybe it's fundamental, in the same sense as why electricity works...

krackers 15 hours ago|||

See Tegmark's "Why does deep and cheap learning work so well?" (well not so cheap anymore...)

https://www.youtube.com/watch?v=5MdSE-N0bxs is remarkably prescient given that it was written before LLMs

soupspaces 16 hours ago|||

Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.

qsera 9 hours ago|||

Because there are patterns everywhere!

skydhash 16 hours ago||

Why does linear regression work? Why does a computer work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very apt at finding that relation even if it's noisy.
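
A rough sketch of "encode words as numbers, then encode their order as a relation", assuming a tiny made-up corpus. It uses a plain bigram count table rather than a neural network, so it only illustrates the encoding idea; `build_vocab`, `count_bigrams`, and `most_likely_next` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def build_vocab(words):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab


def count_bigrams(ids):
    """Record order as a relation: how often id b directly follows id a."""
    follows = defaultdict(Counter)
    for a, b in zip(ids, ids[1:]):
        follows[a][b] += 1
    return follows


def most_likely_next(follows, token_id):
    """Return the id seen most often after token_id, or None if never seen."""
    if token_id not in follows:
        return None
    return follows[token_id].most_common(1)[0][0]


# Made-up corpus, for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
vocab = build_vocab(corpus)
ids = [vocab[w] for w in corpus]
follows = count_bigrams(ids)
id_to_word = {i: w for w, i in vocab.items()}

prediction = id_to_word[most_likely_next(follows, vocab["the"])]
print(prediction)
```

A count table like this falls over on noisy or unseen sequences, which is roughly where a learned model is supposed to do better.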

lateral_cloud 12 hours ago||

I don't understand how these AI written articles get so many votes.

alansaber 6 hours ago|

There is a very high volume of them being posted every day, and they are a significant % of the total. Also, writing is hard: LLM articles can be slop whilst also being better written than average.

codeakki 14 hours ago||

What's the point of this? I'm not here to engage with AI bots.

whateveracct 14 hours ago||

accidentally quadratic
