Posted by Anon84 22 hours ago
[0]: https://www.goodreads.com/book/show/75622146-the-statquest-i...
[1]: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
Is it reasonable to think that someone who grinds through the book suggested here and has a background in web dev / SWE can break into an ML/AI role?
If you can show off some skills I still wouldn't completely rule it out. Reading a single book cover to cover won't cut it though, imo.
If anyone wants to understand the fundamentals of machine learning, a superb resource I have found is Stanford's "Probability for Computer Scientists" [1].
It goes into the theoretical underpinnings of probability theory and ML, IMO better than any other course I have seen. But note that this is primarily a probability course that also covers the fundamentals of machine learning. (Yeah, Andrew Ng is legendary, but his course demands some mathematical familiarity with linear algebra topics.)
There is a course reader for CS109 [2]; you can download a PDF version of it. Caltech's Learning from Data was really good too, if you are looking for a theoretical understanding of ML topics [3].
There is also a book for the excellent Caltech course [4].
Also, Neural Networks: Zero to Hero is great for understanding how neural networks are built from the ground up [5].
[1] https://www.youtube.com/watch?v=2MuDZIAzBMY&list=PLoROMvodv4...
[2] https://chrispiech.github.io/probabilityForComputerScientist...
[3] https://work.caltech.edu/telecourse
[4] https://www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/...
[5] https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxb...
I've read chapters from both. They overlap a lot, but sometimes one book or the other explains a concept better or provides a different perspective or more detail.
I don't think many people will want to read it today. As far as I know, mathematical theories like statistical learning theory (SLT) have been of little use for the invention of transformers or for explaining why neural networks don't overfit despite their large VC dimension.
Edit: I think the title "From theory to machine learning" sums up what was wrong with this theory-first approach. Basically, people with an interest in math but no interest in software engineering got interested in ML and invented various abstract "learning theories", e.g. SLT, which had very little to do with what you can do in practice. Meanwhile, engineers ignored those theories and got their hands dirty on actual neural network implementations while trying to figure out how to improve their performance, which led to things like CNNs and later transformers.
I remember Vapnik (the V in VC dimension) complaining in the preface to one of his books about the prevalent (alleged) extremism of focusing on practice only while ignoring all those beautiful math theories. As far as I know, it has now turned out that these theories were just far too weak to explain the actual complexity of the approaches that do work in practice. Machine learning has clearly turned out to be a branch of engineering, not a branch of mathematics or theoretical computer science.
The title of this book encapsulates the mistaken hope that people will first learn those abstract learning theories, get inspired, and promptly invent new algorithms. But that's not what happened. SLT is barely able to model supervised learning, let alone reinforcement learning or self-supervised learning. As I mentioned, it can't even explain why neural networks are robust to overfitting. Other learning theories (like computational/algorithmic learning theory, or fantasy stuff like Solomonoff induction / Kolmogorov complexity) are even more detached from reality.
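To make the "large VC dimension" point concrete (my own summary, not something from the book): one standard form of the VC generalization bound says that, with probability at least 1 - \delta over an i.i.d. training set of size n,

    R(h) \le \hat{R}_n(h) + \sqrt{\frac{d\,(\ln(2n/d) + 1) + \ln(4/\delta)}{n}}

where d is the VC dimension of the hypothesis class, R is the true risk and \hat{R}_n the training error. For modern over-parameterized networks d is far larger than n, so the square-root term exceeds 1 and the bound becomes vacuous: it permits 100% test error, so it says nothing about the generalization we actually observe.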
(Before anyone laughs this off: this is still a real problem for non-FAANG companies that have niche problems or cannot use open-but-non-commercial datasets. Not everything can be solved with foundation/frontier models.)
Please point me to these papers because I'm still learning.
> Please point me to these papers because I'm still learning.
Not sure which papers you have in mind. To be clear, I'm not an expert, just an interested layman. I just wanted to highlight the stark difference between the apparently failed pure-math approach I learned years ago in a college class and the actual ML papers that are released today, with major practical breakthroughs on a regular basis. Similarly practical papers were always available, just from very different people, e.g. LeCun or people at DeepMind, not from theoretical computer science department people who wrote textbooks like the one here. Back in the day it wasn't very clear (to me) that those practice guys were really onto something while the theory guys were a dead end.
Even the Bias/Variance Dilemma (Geman et al., 1992) has needed only minor updates, if you look at the original paper:
https://www.dam.brown.edu/people/documents/bias-variance.pdf
They were dealing with small datasets or infinite datasets, and double descent only really works when the patterns in your test set are similar enough to those in your training set.
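To make the classical regime from that paper concrete, here is a toy sketch (my own illustration, not from the paper): least-squares polynomial fits of increasing degree on a small, noisy dataset. Training error keeps dropping with capacity, while test error typically traces the familiar U-shape; double descent only shows up with much larger models and datasets than this.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        # noisy observations of a simple nonlinear signal
        x = rng.uniform(-1, 1, n)
        y = np.sin(3 * x) + rng.normal(0, 0.3, n)
        return x, y

    x_train, y_train = sample(30)    # small training set, the regime of the 1992 paper
    x_test, y_test = sample(1000)

    for degree in (1, 2, 4, 8, 12):
        coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")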
While you do need to be mindful of some of the older opinions, the fundamentals are the same.
For fine-tuning or RL you run into the same problems with small (or effectively infinite) datasets, where the concept classes in the training data may be novel; that 1992 paper still applies and will bite you if you assume it is universally invalid.
Most of the foundational concepts are from the mid-20th century.
The availability of massive amounts of data, plus new discoveries, has modified the assumptions and tooling far more than it has invalidated previous research. Skim that paper and you will see they simply dismissed the amounts of data and compute we have today as impractical at the time.
Find the book that works best for you, learn the concepts and build tacit experience.
Lots of efforts are also trying to incorporate symbolic and other methods.
IMHO, building breadth and depth is what will save you time and help you find opportunities; knowledge of the fundamentals is critical for that.
Apart from these three, you literally need nothing else for the very fundamentals and even advanced topics.
AIMA is Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
PRML is Pattern Recognition and Machine Learning by Christopher Bishop.
ESL is The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman.