Backpropagation is a leaky abstraction (2016)

Posted by swatson741 3 days ago

Backpropagation is a leaky abstraction (2016)(karpathy.medium.com)

351 points | 159 commentspage 3

joshdavham 3 days ago|

Given that we're now in the year 2025 and AI has become ubiquitous, I'd be curious to estimate what percentage of developers now actually understand backprop.

It's a bit snarky of me, but whenever I see some web developer or product person with a strong opinion about AI and its future, I like to ask "but can you at least tell me how gradient descent works?"

I'd like to see a future where more developers have a basic understanding of ML even if they never go on to do much of it. I think we would all benefit from being a bit more ML-literate.

kojoru 3 days ago||

I'm wondering: how can understanding gradient descent help in building AI systems on top of LLMs? To mee it feels like the skills of building "AI" are almost orthogonal to skills of building on top of "AI"

joshdavham 3 days ago|||

I take your point in that they are mostly orthogonal in practice, but with that being said, I think understanding how these AI's were created is still helpful.

For example, I believe that if we were to ask the average developer about why LLM's behave randomly, they would not be able to answer. This to me exposes a fundamental hole in their knowledge of AI. Obviously one shouldn't feel bad about not knowing the answer, but I think we'd benefit from understanding the basic mathematical and statistical underpinnings on these things.

Al-Khwarizmi 3 days ago||

You can still understand that quite well without understanding backprop, though.

All you need is:

- Basic understanding of how a Markov chain can generate text (generating each word using corpus statistics on the previous few words).

- Understanding that you can then replace the Markov chain with a neural model which gives you more context length and more flexibility (words are now in a continuous space so you don't need to find literally the same words, you can exploit synonyms, similarity, etc., plus massive training data also helps).

- Finally, you add the instruction tuning (among all the plausible continuations the model could choose, teach it to prefer the ones human prefer - e.g. answering a question rather than continuing with a list of similar questions. You give the model cookies or slaps so it learns to prefer the answers humans prefer).

- But the core is still like in the Markov chain (generating each word using corpus statistics on the previous words).

I often give dissemination talks on LLMs to the general public and I have the feeling that with this mental model, you can basically know everything a lay user needs to know about how they work (you can explain things like hallucinations, stochastic nature, relevance of training data, relevance of instruction tuning, dispelling myths like "they always choose the most likely word", etc.) without any calculus at all; although of course this is subjective and maybe some people will think that explaining it in this way is heresy.

HarHarVeryFunny 3 days ago|||

Sure, but it'd be similar to being a software developer and not understanding roughly what a compiler does. In a world full of neural network based technology, it'd be a bit lame for a technologist not to at least have a rudimentary understanding of how it works.

Nowadays, fine tuning LLMs is becoming quite mainstream, so even if you are not training neural nets of any kind from scratch, if you don't understand how gradients are used in the training (& fine tuning) process, then that is going to limit your ability to fully work with the technology.

augment_me 3 days ago|||

Impossible requirement. The inherent quality of abstractions is to allow us to get more done without understanding everything. We dont write raw assembly for the same reason, you dont make fire by rubbing sticks, you dont go hunting for food in the woods, etc.

There is no need for the knowledge that you propose in a world where this is solved, you will achieve more goals by utilizing higher-level tools.

joshdavham 3 days ago||

I get your point and this certainly applies to most modern computing where each new layer of abstraction becomes so solid and reliable that devs can usually afford to just build on top of it without worrying about how it works. I don’t believe this applies to modern AI/ML however. Knowing the chain rule, gradient descent and basic statistics IMO is not the same level of solid as other abstractions in computing. We can’t afford to not know these things. (At least not yet!)

confirmmesenpai 3 days ago|||

so if you want to have a strong opinion on electric cars you need to be able to explain how an electric engine works right?

oceanplexian 3 days ago|||

I’d say so, the hallmark of being a car guy is understanding the basics, like the difference between a four cylinder and a six cylinder, a turbocharger from a supercharger, the different types of gearboxes (DCT vs AT or a CVT), and so on. They all affect the feel, capabilities, and limitations of the car.

Electric cars have similar complexities and limitations, for example the Bolt I owned could only go ~92MPH due to limitations in the gearing as a result of having a 1 speed gearbox. I would expect someone with a strong opinion of a car to know something as simple as the top speed.

esafak 2 days ago||||

The electric engine? I won't give someone the time of day if they don't understand the battery chemistry.

chermi 3 days ago|||

Depends on what exactly the opinion is, but generally I'd say yes. If it's about their looks, maybe not.. but even then understanding the basics that determine things like not needing air intake, exhaust, the placement of batteries, etc. can be helpful.

If it's about the supply chain, understanding at least the requirements for magnets is helpful.

On way to make sure you understand all of these things is to understand the electric motor. But you could learn the separate pieces of knowledge on the fly too.

The more you understand the fundamentals of what you're talking about, the more likely you are to have genuine insight because you can connect more aspects of the problem and understand more of the "why".

TL;DR it depends, but it almost always helps.

lock1 3 days ago|||

  > I'd like to see a future where more developers have a basic understanding of ML even if they never go on to do much of it. I think we would all benefit from being a bit more ML-literate.

Why "ML-literate" specifically? Also, there are some people against calculus and statistic in CS curriculum because it's not "useful" or "practical", why does ML get special treatment here?

Plus, I don't think a "gotcha" question like "what is gradient descent" will give you a good signal about someone if it get popularized. It probably will lead to the present-day OOP cargo cult, where everyone just memorizes whatever their lecturer/bootcamp/etc and repeats it to you without actually understanding what it does, why it's the preferred method over other strategies, etc.

joshdavham 3 days ago||

> Why "ML-literate" specifically?

We could also say AI-literate too, I suppose. I guess I just like to focus on ML generally because 1) most modern AI is possible only due to ML and 2) it’s more narrow and emphasizes the low level of how AI works.

esafak 2 days ago||

As ML becomes 'democratized' the fraction of people with technical understanding will trend towards zero. They will operate at a higher level of abstraction, and worry about novel things like hallucinations, prompt injection, and misinformation. Such is the nature of advancement.

littlestymaar 3 days ago||

I was happy to see Karpathy writing a new blog post instead of simply Twitter threads, but when I opened the link I just got dispointed to realize it's from 9 years ago…

I really hate what Twitter did to blogging…

Geee 3 days ago|

He has a new blog at https://karpathy.bearblog.dev/blog/

littlestymaar 3 days ago||

Oh, I wasn't aware of it, thank you very much!

phplovesong 3 days ago|

Sidenote why are people still using medium?

evbogue 3 days ago|

article is from 2016