Posted by gpjt 9/2/2025

The maths you need to start understanding LLMs (www.gilesthomas.com)
612 points | 120 comments
11101010001100 7 days ago|
Apologies for the metacomment, but HN is a funny place. There is a certain type of learning that is deemed good ('math for AI') and a certain type of learning that is deemed bad ('leetcode for AI').
raincole 7 days ago||
What's leetcode for AI, and which site is deemed bad by HN? Without a concrete example it's just a strawman. It could be that the site is deemed bad for other reasons. It could be a few vocal negative comments. It could be just not happening.
boppo1 7 days ago|||
What would leetcode for AI be?
krackers 6 days ago||
I suppose the closest thing might be the type of counting/probability questions asked at quant firms as a way to assess math skill
sgt101 7 days ago|||
could you give an example of "HN would not like this AI leetcode"?
enjeyw 7 days ago|||
I mean I kind of get it - overgeneralising (and projecting my own feelings), but I think HN favours introducing and discussing foundational concepts over things that are closer to memorisation/rote learning. I think AI Math vs Leetcode broadly fits into that category.
apwell23 7 days ago||
honestly I would love 'leetcode for AI'. I am just so sick of all the videos and articles about it.
jokoon 7 days ago||
ML is interesting, but honestly I have trouble seeing where it's going, so it's hard to tell whether I should learn the techniques to land a job or whether they'll soon be obsolete.

There is certainly some hype; a lot of what is on the market is just not viable.

kekebo 7 days ago||
I've had the best time with Andrej Karpathy's YouTube intros to LLM math, but I haven't compared their scope or quality to this submission.
nativeit 7 days ago||
Ah, I was hoping this would teach me the maths to start understanding the economics surrounding LLMs. That’s the really impossible stuff.
orionuni 6 days ago||
Thanks for sharing!
fnord77 7 days ago||
nothing about vector calculus to minimize loss functions or needing to find Hessians to do Newton's method.
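
For reference, the second-order step being alluded to looks roughly like this; a minimal numpy sketch where the gradient and Hessian values are made-up numbers, just to show the update x_new = x - H^{-1} grad:

    import numpy as np

    def newton_step(x, grad, hessian):
        # Newton's method update: solve H d = grad rather than inverting H explicitly
        return x - np.linalg.solve(hessian, grad)

    x = np.array([1.0, -2.0])                 # current parameters
    grad = np.array([4.0, -6.0])              # hypothetical gradient of the loss at x
    H = np.array([[4.0, 0.0], [0.0, 2.0]])    # hypothetical Hessian of the loss at x
    print(newton_step(x, grad, H))            # -> [0. 1.]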
lazarus01 7 days ago||
Here are the building blocks for any deep learning system, with a little bit about LLMs toward the end.

Graphs - It all starts with computational graphs. These are data structures made up of element-wise operations, usually matrix multiplication, addition, activation functions and a loss function. The computations are differentiable, resulting in a smooth, continuous space appropriate for continuous optimization (gradient descent), which is covered later.
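
A minimal sketch of such a graph, assuming PyTorch (any autodiff framework would do); the shapes and values are arbitrary:

    import torch

    # A tiny computational graph: y = relu(W @ x + b), loss = sum((y - target)^2)
    x = torch.tensor([1.0, 2.0])
    target = torch.tensor([0.5, -0.5])
    W = torch.randn(2, 2, requires_grad=True)   # differentiable parameters
    b = torch.zeros(2, requires_grad=True)

    y = torch.relu(W @ x + b)                   # matrix multiply, addition, activation
    loss = ((y - target) ** 2).sum()            # loss function closes the graph

    loss.backward()                             # differentiate the whole graph at once
    print(W.grad, b.grad)                       # gradients, used later by gradient descent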

Layers - Layers are modules, composed of graphs, that apply some computation and keep state in the form of learned weights. Each layer learns a deeper, more meaningful representation of the dataset, ultimately learning a latent manifold: a highly structured, lower-dimensional space that interpolates between samples, which is what lets the model generalize when making predictions.
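
A minimal sketch of a single dense layer in plain numpy; the name Dense and the sizes are just for illustration:

    import numpy as np

    class Dense:
        """A layer: a small graph of computations whose state is its learned weights."""
        def __init__(self, n_in, n_out):
            self.W = 0.01 * np.random.randn(n_in, n_out)   # learned weights (the state)
            self.b = np.zeros(n_out)

        def forward(self, x):
            # map the input to a new, hopefully more meaningful representation
            return np.maximum(0.0, x @ self.W + self.b)    # affine transform + ReLU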

Different machine learning problems and data types use different layers, e.g. transformers for sequence-to-sequence learning and convolutions for computer vision models.

Models - A model organizes a stack of layers for training. It includes a loss function that sends a feedback signal to an optimizer, which adjusts the learned weights during training. Models also include an evaluation metric, such as accuracy, independent of the loss function.
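
A rough sketch of how those pieces fit together, reusing the hypothetical Dense layer above; the loss and metric here are the usual mean-squared error and classification accuracy:

    import numpy as np

    class Model:
        """A stack of layers, plus a loss (feedback signal) and a metric (evaluation)."""
        def __init__(self, layers):
            self.layers = layers

        def forward(self, x):
            for layer in self.layers:
                x = layer.forward(x)
            return x

    def mse_loss(pred, target):     # feedback signal driving the optimizer
        return np.mean((pred - target) ** 2)

    def accuracy(pred, target):     # evaluation metric, independent of the loss
        return np.mean(np.argmax(pred, axis=-1) == target)

    # e.g. model = Model([Dense(4, 8), Dense(8, 2)])  -- using the Dense sketch above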

Forward pass - During training or inference, the input sequence passes through all the network layers, each applying a geometric transformation, to produce an output.

Backpropagation - During training, after the forward pass, gradients are calculated for each weight with respect to the loss (gradients are just another word for derivatives). The process for calculating the derivatives is called automatic differentiation, which is based on the chain rule of differentiation.

Once the derivatives are calculated, the optimizer updates the weights so as to reduce the loss. This is the process called “learning”, often referred to as gradient descent.
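
A worked example of the whole loop on the smallest possible model (one weight, one bias), with the chain rule written out by hand instead of using automatic differentiation; all the numbers are arbitrary:

    # Model: pred = w * x + b; loss = (pred - y)^2
    x, y = 2.0, 7.0      # one training example
    w, b = 0.5, 0.0      # initial weights
    lr = 0.05            # learning rate

    for step in range(100):
        pred = w * x + b                 # forward pass
        loss = (pred - y) ** 2

        # backpropagation via the chain rule:
        # dloss/dpred = 2*(pred - y), dpred/dw = x, dpred/db = 1
        dloss_dpred = 2.0 * (pred - y)
        grad_w = dloss_dpred * x
        grad_b = dloss_dpred

        w -= lr * grad_w                 # gradient descent: step against the gradient
        b -= lr * grad_b

    print(w, b, loss)                    # pred converges toward y = 7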

Now for Large Language Models.

Before a model can be trained for sequence-to-sequence learning, the corpus of knowledge must be transformed into embeddings.

Embeddings are dense representations of language: vectors in a multidimensional space that can capture meaning and context for the different combinations of words that make up sequences.
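
A minimal sketch of an embedding lookup; the vocabulary and dimension are made up, and in a real model the table entries are learned rather than random:

    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}     # toy vocabulary
    d_model = 4                                # embedding dimension
    embedding_table = np.random.randn(len(vocab), d_model)

    token_ids = [vocab[w] for w in ["the", "cat", "sat"]]
    embeddings = embedding_table[token_ids]    # one dense vector per token
    print(embeddings.shape)                    # (3, 4): sequence length x d_model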

LLMs use a specific kind of network layer, the transformer, which includes something called an attention mechanism.

The attention mechanism uses the embeddings to dynamically update the meaning of words when they are brought together in a sequence.

The model uses three different representations of the input sequence, called the key, query and value matrices.

Using dot products, attention scores are computed to identify the meaning of the reference sequence, and then a target sequence is generated.
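
A sketch of (unmasked, single-head) scaled dot-product attention over a sequence of embeddings; the projection matrices would normally be learned, and the shapes here are arbitrary:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv           # query, key, value representations
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # dot-product attention scores
        weights = softmax(scores, axis=-1)         # how much each token attends to every other
        return weights @ V                         # context-updated token representations

    seq_len, d = 3, 4
    X = np.random.randn(seq_len, d)                # token embeddings
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    print(attention(X, Wq, Wk, Wv).shape)          # (3, 4)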

The output sequence is predicted one word at a time, by sampling from a probability distribution over possible next words produced by a softmax function.
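
And a sketch of that last step, turning the model's scores for the next token into a probability distribution with softmax and sampling from it; the logits are made-up numbers over a five-token vocabulary:

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        logits = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                          # softmax: scores -> probabilities
        return np.random.choice(len(probs), p=probs)  # sample one token id

    next_token_logits = [2.0, 0.5, -1.0, 0.1, 1.2]    # hypothetical model output
    print(sample_next_token(next_token_logits))       # repeat, appending, to generate a sequence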
