Posted by Anon84 22 hours ago
[0]: https://www.goodreads.com/book/show/75622146-the-statquest-i...
[1]: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
Is it reasonable to think that someone who grinds through the book suggested here and has a background in web dev / SWE can break into an ML/AI role?
If you can show off some skills I still wouldn't completely rule it out. Reading a single book cover to cover won't cut it though, imo.
If anyone wants to understand the fundamentals of machine learning, a superb resource I have found is Stanford's "Probability for Computer Scientists" [1].
It goes into the theoretical underpinnings of probability theory and ML, IMO better than any other course I have seen. But note that this is primarily a probability course that also covers the fundamentals of machine learning. (Yeah, Andrew Ng is legendary, but his course demands some mathematical familiarity with linear algebra topics.)
There is a course reader for CS109 [2]; you can download a PDF version of it. Caltech's Learning from Data was really good too, if you are looking for a theoretical understanding of ML topics [3].
There is also a book for the excellent Caltech course [4].
Also, Neural Networks: Zero to Hero is great for understanding how neural networks are built from the ground up [5].
[1] https://www.youtube.com/watch?v=2MuDZIAzBMY&list=PLoROMvodv4...
[2] https://chrispiech.github.io/probabilityForComputerScientist...
[3] https://work.caltech.edu/telecourse
[4] https://www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/...
[5] https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxb...
I've read chapters from both. They overlap a lot, but sometimes one book or the other explains a concept better or provides a different perspective or more detail.
I don't think many people will want to read it today. As far as I know, mathematical theories like statistical learning theory (SLT) have been of little use for the invention of transformers or for explaining why neural networks don't overfit despite their large VC dimension.
Edit: I think the title "From theory to machine learning" sums up what was wrong with this theory-first approach. Basically, people with an interest in math but no interest in software engineering got interested in ML and invented various abstract "learning theories", e.g. SLT, which had very little to do with what you can do in practice. Meanwhile, engineers ignored those theories and got their hands dirty on actual neural network implementations while trying to figure out how to improve their performance, which led to things like CNNs and later transformers.
I remember Vapnik (the V in VC dimension) complaining in the preface to one of his books about the prevalent (alleged) extremism of focusing on practice only while ignoring all those beautiful math theories. As far as I know, it has now turned out that these theories were just far too weak to explain the actual complexity of the approaches that do work in practice. Machine learning has clearly turned out to be a branch of engineering, not a branch of mathematics or theoretical computer science.
The title of this book encapsulates the mistaken hope that people will first learn those abstract learning theories, get inspired, and promptly invent new algorithms. But that's not what happened. SLT is barely able to model supervised learning, let alone reinforcement learning or self-supervised learning. As I mentioned, it can't even explain why neural networks are robust to overfitting. Other learning theories (like computational/algorithmic learning theory, or fantasy stuff like Solomonoff induction / Kolmogorov complexity) are even more detached from reality.
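To make the "large VC dimension" point concrete (my own summary, not something from the book): one standard form of the VC generalization bound says that, with probability at least 1 - \delta over an i.i.d. training set of size n,

    R(h) \le \hat{R}_n(h) + \sqrt{\frac{d\,(\ln(2n/d) + 1) + \ln(4/\delta)}{n}}

where d is the VC dimension of the hypothesis class, R is the true risk and \hat{R}_n the training error. For modern over-parameterized networks d is far larger than n, so the square-root term exceeds 1 and the bound becomes vacuous: it permits 100% test error, so it says nothing about the generalization we actually observe.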
(Before anyone laughs this off: this is still a real problem for non-FAANG companies that have niche problems or cannot use open-but-non-commercial datasets. Not everything can be solved with foundation/frontier models.)
Please point me to these papers because I'm still learning.
> Please point me to these papers because I'm still learning.
Not sure which papers you have in mind. To be clear, I'm not an expert, just an interested layman. I just wanted to highlight the stark difference between the apparently failed pure-math approach I learned years ago in a college class and the actual ML papers that are released today, with major practical breakthroughs on a regular basis. Similarly practical papers were always available, just from very different people, e.g. LeCun or people at DeepMind, not from theoretical computer science department people who wrote textbooks like the one here. Back in the day it wasn't very clear (to me) that those practice guys were really onto something while the theory guys were a dead end.
Even the Bias/Variance Dilemma (Geman et al., 1992) has needed only minor updates, if you look at the original paper:
https://www.dam.brown.edu/people/documents/bias-variance.pdf
They were dealing with small datasets or infinite datasets, and double descent only really works when the patterns in your test set are similar enough to those in your training set.
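To make the classical regime from that paper concrete, here is a toy sketch (my own illustration, not from the paper): least-squares polynomial fits of increasing degree on a small, noisy dataset. Training error keeps dropping with capacity, while test error typically traces the familiar U-shape; double descent only shows up with much larger models and datasets than this.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        # noisy observations of a simple nonlinear signal
        x = rng.uniform(-1, 1, n)
        y = np.sin(3 * x) + rng.normal(0, 0.3, n)
        return x, y

    x_train, y_train = sample(30)    # small training set, the regime of the 1992 paper
    x_test, y_test = sample(1000)

    for degree in (1, 2, 4, 8, 12):
        coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")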
While you do need to be mindful of some of the older opinions, the fundamentals are the same.
For fine-tuning or RL you run into the same problems with small (or effectively infinite) datasets, where the concept classes in the training data may be novel; that 1992 paper still applies and will bite you if you assume it is universally invalid.
Most of the foundational concepts are from the mid-20th century.
The availability of massive amounts of data, plus new discoveries, has modified the assumptions and tooling far more than it has invalidated previous research. Skim that paper and you will see they simply dismissed the amounts of data and compute we have today as impractical at the time.
Find the book that works best for you, learn the concepts and build tacit experience.
Lots of efforts are also trying to incorporate symbolic and other methods.
IMHO, building breadth and depth is what will save you time and help you find opportunities; knowledge of the fundamentals is critical for that.
Apart from these three, you literally need nothing else for the very fundamentals and even advanced topics.
AIMA is Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
PRML is Pattern Recognition and Machine Learning by Christopher Bishop.
ESL is The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman.