it goes all over the place.
i'm not actually sure who your target audience is.
there are too many side tangents.
just like, structure it plz.
1. customer feels bad cuz they don't understand how llms work
2. provide high level abstracted explanation (don't dive into concepts yet)
3. provide breakdown guide of overall set of components.
4. walk through each component. don't sidetrack. no need to explain RoPE, GQA, etc. it just distracts.
i.e. customers don't know how llms work, leading them to feel bad about their own intelligence.
at a high level, llms take in words, do some math on them, and then produce words, one by one.
inside, llms have these different components. we walk through them step by step.
1. tokenizer
2. embedding
3. attention
4. heads
5. ffn
6. sampling
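to make the outline above concrete, here's a rough toy sketch of how those six pieces could chain together. everything in it is made up for illustration: the six-word vocabulary, the random weights, the sizes, and all the function names are my own assumptions, not anything from the article or a real model. it also skips things a real llm has (layer norm, positional info, stacked layers, learned weights), so the output is gibberish. the point is only the shape of the loop: words in, math, one word out, repeat.

```python
import math
import random

# Toy vocabulary and sizes (hypothetical, chosen only for illustration).
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
DIM, N_HEADS = 4, 2
HEAD_DIM = DIM // N_HEADS
rng = random.Random(0)  # fixed seed so runs are repeatable


def rand_matrix(rows, cols):
    # Random weights stand in for trained ones.
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def vecmat(v, m):
    # Row vector times matrix.
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# 1. tokenizer: text -> list of integer ids (real ones split into subwords)
def tokenize(text):
    return [VOCAB.index(word) for word in text.split()]


# 2. embedding: each id -> a vector
EMBED = rand_matrix(len(VOCAB), DIM)

def embed(token_ids):
    return [EMBED[t][:] for t in token_ids]


# 3 + 4. attention, split across heads: each position mixes in information
# from itself and earlier positions; each head does this independently.
WQ = [rand_matrix(DIM, HEAD_DIM) for _ in range(N_HEADS)]
WK = [rand_matrix(DIM, HEAD_DIM) for _ in range(N_HEADS)]
WV = [rand_matrix(DIM, HEAD_DIM) for _ in range(N_HEADS)]

def attention(xs):
    out = []
    for pos, x in enumerate(xs):
        mixed = []
        for h in range(N_HEADS):
            q = vecmat(x, WQ[h])
            keys = [vecmat(p, WK[h]) for p in xs[: pos + 1]]
            vals = [vecmat(p, WV[h]) for p in xs[: pos + 1]]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(HEAD_DIM) for k in keys]
            weights = softmax(scores)
            mixed += [sum(w * v[j] for w, v in zip(weights, vals)) for j in range(HEAD_DIM)]
        out.append([a + b for a, b in zip(x, mixed)])  # residual connection
    return out


# 5. ffn: a small two-layer network applied to each position separately
W1 = rand_matrix(DIM, 4 * DIM)
W2 = rand_matrix(4 * DIM, DIM)

def ffn(xs):
    out = []
    for x in xs:
        hidden = [max(0.0, h) for h in vecmat(x, W1)]  # ReLU
        out.append([a + b for a, b in zip(x, vecmat(hidden, W2))])
    return out


# 6. sampling: turn the last position's vector into probabilities, pick one id
UNEMBED = rand_matrix(DIM, len(VOCAB))

def sample_next(xs):
    probs = softmax(vecmat(xs[-1], UNEMBED))
    return rng.choices(range(len(VOCAB)), weights=probs)[0]


def generate(prompt, max_new_tokens=5):
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):  # produce words one by one
        next_id = sample_next(ffn(attention(embed(ids))))
        if VOCAB[next_id] == "<eos>":
            break
        ids.append(next_id)
    return " ".join(VOCAB[i] for i in ids)


print(generate("the cat"))
```

each numbered comment in the code lines up with one item in the list, which is roughly the structure i'd want the article to follow.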
## tokenizer
I imagine that if resources were already spent writing this text, then one benefit of using it is not spending more resources, or causing more pollution, by asking a chatbot.
> Researchers have found that some neurons inside the FFN are strongly associated with specific concepts or facts. One neuron might activate strongly on Eiffel-Tower-related text. Another on programming languages. Another on past-tense verbs.
People don't really write like this and they don't really talk like this (and no, people don't necessarily write exactly how they talk because they don't read exactly how they listen; the written word can be backtracked while the heard cannot, and speakers/writers know this, either consciously or unconsciously). A person would probably structure this more like:
> Researchers have found that some neurons inside the FFN are strongly associated with specific concepts or facts. For example, there could be one neuron that activates strongly on Eiffel-Tower-related text, another that activates strongly on programming languages, a third neuron activating on past-tense verbs, and so on.
Usually people wouldn't write "Another on programming languages." as a standalone sentence like that, because the periods introduce an unnatural pause, like they're giving a TED talk. Of course they might punctuate that way for effect, but you'd essentially never communicate with that effect full time.
The one they're pointing out (the short punchy sentences) also applies to things like politicians and news articles. Blog posts are a bit odd.
* And here I mean those literal exact words. People are also extrapolating to similar patterns that use different or more words than "it's not" and "it's", but those flow better and aren't what I'm referring to here.
https://arxiv.org/abs/2604.21691
There are of course empirical results and relatively weak theoretical results like the UAT (universal approximation theorem), but I also don't think that answers your question fully, especially since it seems impossible to definitively answer questions that the industry seems to be betting on, like whether or not there is a lower bound to their error rate or whether hallucination as a problem can be solved. We have much stronger ideas of what linear regression is doing relative to what LLMs are doing.
https://www.youtube.com/watch?v=5MdSE-N0bxs is remarkably prescient given that it was written before LLMs