
Posted by gpjt 9/2/2025

The maths you need to start understanding LLMs(www.gilesthomas.com)
612 points | 120 comments | page 2
kingkongjaffa 7 days ago|
The steps in this article are also the same process for doing RAG as well.

You compute an embedding vector for your documents (or chunks of documents), then compute the vector for your user's prompt, and use the cosine distance to find the most semantically relevant documents. There are other tricks, like reranking the documents once you find the top N relating to the query, but that's basically it.

Here’s a good explanation

http://wordvec.colorado.edu/website_how_to.html
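The retrieval step described above can be sketched in a few lines of NumPy (the function names are hypothetical, and the embedding vectors are assumed to come from whatever embedding model you use):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def top_n_documents(query_vec, doc_vecs, n=3):
    """Return indices of the n document vectors most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:n]
```

A reranker would then reorder just those top-n candidates with a more expensive model.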

oulipo2 7 days ago||
Additions and multiplications. People are making it sound like it's complicated, but NNs have the most basic and simple maths behind them.

The only thing is that nobody understands why they work so well. There are a few function-approximation theorems that apply, but nobody really knows how to make them behave as we would like.

So basically AI research is 5% "maths", 20% data sourcing and engineering, 50% compute power, and 25% trial and error

amelius 7 days ago||
Gradient descent is like pounding on a black box until it gives you the answers you were looking for. There is little more we know about it. We're basically doing Alchemy 2.0.

The hard technology that makes this all possible is in semiconductor fabrication. Outside of that, math has comparatively little to do with our recent successes.

p1dda 7 days ago||
> The only thing is that nobody understand why they work so well.

This is exactly what I have ascertained from several different experts in this field. It's interesting that a machine has been constructed that performs better than expected and/or performs more advanced tasks than its inventors expected.

skydhash 6 days ago||
The linear regression model "ax + b" is the simplest one and is still quite useful. It can be interesting to discover some phenomenon that fits the model, but that's not something people have control over. But imagine spending years (expensively) training stuff with millions of weights, only to discover it was as simple as "e = mc^2" (and c^2 is basically a constant, so the equation is technically linear).
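For comparison, fitting "ax + b" to data is a one-liner in NumPy (the data here is synthetic, purely for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # synthetic data that follows "ax + b" exactly

# A degree-1 polynomial fit is ordinary least-squares linear regression.
a, b = np.polyfit(x, y, deg=1)
```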
zahlman 7 days ago||
It appears that the "softmax" is computed (as I hypothesized from the results, before clicking the link) by exponentiating each value and normalizing to a sum of 1. It would be worthwhile to make this explicit. The exponential function is also "high-school maths", and an explanation like that is much easier to follow than the Wikipedia article (since not a lot of rigour is required here).
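That description translates directly to code; a standard NumPy sketch (not the article's own implementation) looks like this:

```python
import numpy as np

def softmax(x):
    # Subtracting the max is a common numerical-stability trick;
    # it cancels out and does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()
```

Every output is positive and the outputs sum to 1, so they can be read as probabilities.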
tsunamifury 7 days ago||
I’m sure no one will read this but I was on the team that invented a lot of this early pre-LLM math at Google.

It was a really exciting time for me, as I had pushed the team to begin looking at vectors beyond language (actions and other predictable parameters we could extract from linguistic vectors).

We had originally invented a lot of this because we were trying to make chat and email easier and faster, and ultimately I had morphed it into predicting UI decisions based on conversation vectors. Back then we could only do pretty simple predictions (continue vector strictly, reverse vector strictly, or N vector options on an axis), but we shipped it, and you saw it when we made Hangouts, Gmail, and Allo predict your next sentence. Our first incarnation was interesting enough that Eric Schmidt recognized it and took my work to the board as part of his big investment in ML. From there the work in Hangouts became Allo, Gmail, etc.

Bizarrely enough, though, under Sundar this became the Google Assistant, but we couldn't get much further without attention layers, so the entire project regressed back to fixed bot pathing.

I argued pretty hard with the executives that this was a tragedy, but Sundar would hear none of it, completely obsessed with Alexa and having a competitor there.

I found some sympathy with the now head of search who gave me some budget to invest in a messaging program that would advance prediction to get to full action prediction across the search surface and UI. We launched and made it a business messaging product but lost the support of executives during the LLM panic.

Sundar cut us and fired the whole team, ironically right when he needed it the most. But he never listened to anyone who worked on the tech and seemed to hold their thoughts in great disdain.

What happened after that is of course well known now, as Sundar ignored some of the most important tech in history due to this attitude.

I don’t think I’ll ever fully understand it.

lazarus01 6 days ago||
Here is the bible on deep learning: "Deep Learning with Python", written by François Chollet, the creator of Keras.

https://www.manning.com/books/deep-learning-with-python

d_sem 7 days ago||
I think the author did a sufficient job of caveating his post without being verbose.

While reading through past posts, I stumbled on a multi-part "Writing an LLM from scratch" series that was an enjoyable read. I hope they keep writing more fun content.

petesergeant 7 days ago||
You need virtually no maths to deeply and intuitively understand embeddings: https://sgnt.ai/p/embeddings-explainer/
apwell23 7 days ago||
> Actually coming up with ideas like GPT-based LLMs and doing serious AI research requires serious maths.

Does it? I don't think so. All the math involved is pretty straightforward.

ants_everywhere 7 days ago||
It depends on how you define the math involved.

Locally it's all just linear algebra with an occasional nonlinear function. That is all straightforward. And by straightforward I mean you'd cover it in an undergrad engineering class -- you don't need to be a math major or anything.

Similarly CPUs are composed of simple logic operations that are each easy to understand. I'm willing to believe that designing a CPU requires more math than understanding the operations. Similarly I'd believe that designing an LLM could require more math. Although in practice I haven't seen any difficult math in LLM research papers yet. It's mostly trial and error and the above linear algebra.
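"Linear algebra with an occasional nonlinear function" is almost literal: a single network layer is a matrix multiply, a bias add, and a nonlinearity. A minimal NumPy sketch (ReLU chosen here just as one common example):

```python
import numpy as np

def layer(x, W, b):
    """One neural-network layer: matrix multiply, bias, then ReLU."""
    return np.maximum(W @ x + b, 0.0)
```

Stacking such layers (with different W and b each time) is essentially all the "local" math there is; everything hard lives in how the weights are found.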

apwell23 7 days ago||
Yeah, I would love to see what complicated math all this came out of. I thought rigorous math was actually an impediment to AI progress. Did any math actually predict or prove that scaling data would create current AI?
ants_everywhere 6 days ago||
I was thinking more about the everyday use of more advanced math to solve "boring" engineering challenges. Like finite math to lay out chips or kernels. Or improvements to Strassen's algorithm for matrix multiplication. Or improving the transformer KV cache, etc.

The math you would use to, for example, prove that a search algorithm is optimal will generally be harder than the math needed to understand the search algorithm itself.

empiko 6 days ago||
It is straightforward because you have probably been exposed to a ton of AI/ML content in your life.
cultofmetatron 6 days ago||
just wanna plug https://mathacademy.com/courses/mathematics-for-machine-lear....

I'm a happy customer and have found it to be one of the best paid resources for learning mathematics in general. Wish I had this when I was a student.

paradite 7 days ago|
I recently did a livestream trying to understand the attention mechanism (K, Q, V) in LLMs.

I think it went pretty well (I was able to understand most of the logic and maths), and I touched on some of these terms.

https://youtube.com/live/vaJ5WRLZ0RE?feature=share
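For reference, the K, Q, V mechanism boils down to a few lines of NumPy. This is a generic sketch of scaled dot-product attention, not the code from the video:

```python
import numpy as np

def softmax(x, axis=-1):
    # Exponentiate and normalize each row to sum to 1 (max-subtraction for stability).
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how well each query matches each key
    return softmax(scores, axis=-1) @ V  # weighted mix of the values
```

Each query ends up as a similarity-weighted average of the value vectors.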

gozzoo 7 days ago|
The constant scrolling is very distracting. I couldn't follow along.
paradite 7 days ago||
Thanks for the feedback!