Posted by samuel246 5 days ago

Caching is an abstraction, not an optimization (buttondown.com)
135 points | 126 comments
suspended_state 2 days ago|
Let's first get the obvious out of the way: caching is not an abstraction; the "Storage" abstraction is what enables caching to be implemented. If I had to put caching in a category, I would say it's an optimization strategy.

But that's not really what the blog post is about. The issue it tries to discuss is that this abstraction is often imposed on us without any way to control its behaviour. That's the case with the Python LOAD_NAME example he points at. Without a clear understanding of the access patterns the application mostly uses, a caching strategy cannot be well defined, and you'll end up with an inadequate solution.
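
A toy illustration of that last point (hypothetical code, not from the article): an LRU cache one slot smaller than a repeated sequential scan gets a 0% hit rate, while one extra slot makes it near-perfect, so the right policy and size depend entirely on the access pattern:

    from collections import OrderedDict

    class LRUCache:
        """Toy LRU cache for illustration; evicts the least-recently-used key."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.data = OrderedDict()
            self.hits = self.misses = 0

        def get(self, key, load):
            if key in self.data:
                self.data.move_to_end(key)     # mark as most recently used
                self.hits += 1
                return self.data[key]
            self.misses += 1
            value = load(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict the LRU entry
            return value

    # A repeated sequential scan over 9 keys: an 8-slot LRU always evicts
    # the key it will need soonest (0% hits); a 9-slot LRU is near-perfect.
    for capacity in (8, 9):
        cache = LRUCache(capacity)
        for _ in range(10):
            for key in range(9):
                cache.get(key, load=lambda k: k * k)
        print(capacity, cache.hits / (cache.hits + cache.misses))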

kazinator 2 days ago||
Optimization isn't separable from abstraction. An abstraction is something that can be implemented in more than one way while meeting the terms of its contract. That flexibility is what allows for optimization.
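
A minimal sketch of that idea (names are made up for illustration): two implementations of the same storage contract, one plain and one with a write-through cache in front of a slow backend. Any caller written against the contract works with both, and that freedom is exactly what an optimizer exploits:

    from typing import Protocol

    class Storage(Protocol):
        """The contract: get returns whatever was last put for a key."""
        def put(self, key: str, value: bytes) -> None: ...
        def get(self, key: str) -> bytes: ...

    class SlowStorage:
        """Stand-in for a slow backend (disk, network, ...)."""
        def __init__(self):
            self._data = {}
        def put(self, key, value):
            self._data[key] = value
        def get(self, key):
            return self._data[key]

    class CachedStorage:
        """Same contract, different implementation: a write-through cache.
        Callers can't tell the difference except by timing."""
        def __init__(self, backend):
            self._backend = backend
            self._cache = {}
        def put(self, key, value):
            self._backend.put(key, value)  # write-through: backend stays authoritative
            self._cache[key] = value
        def get(self, key):
            if key not in self._cache:
                self._cache[key] = self._backend.get(key)
            return self._cache[key]

    def workload(store: Storage) -> bytes:
        store.put("k", b"v")
        return store.get("k")

    assert workload(SlowStorage()) == workload(CachedStorage(SlowStorage())) == b"v"
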
gmuslera 2 days ago||
"fast storage" is about performance, your abstraction includes performance elements. If you go that down, then you are optimizing on your abstraction designs. What doesn't have to be wrong, but then don't say that is not optimization.
kiitos 1 day ago||
Cmd+F "invalidation" -- not found.

Author is talking about the least interesting, and easiest, piece of the overall caching problem.

zmj 2 days ago||
This article is talking about single-writer, single-reader storage. I think it's correct in that context. Most of the hairy problems with caches don't come up until you're multi-writer, multi-reader.
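
A toy demonstration of where it gets hairy (hypothetical code, not from the article): two readers each keep a private read-through cache over one shared store. Add a second writer and one reader silently serves a stale value, with no error raised anywhere:

    # Hypothetical sketch: single store, two independently caching readers.
    store = {"price": 100}

    class CachingReader:
        def __init__(self, store):
            self.store = store
            self.cache = {}

        def read(self, key):
            if key not in self.cache:       # read-through on miss
                self.cache[key] = self.store[key]
            return self.cache[key]

    a, b = CachingReader(store), CachingReader(store)
    print(a.read("price"), b.read("price"))  # 100 100: both caches now warm

    store["price"] = 120                     # a second writer bypasses both caches

    print(a.read("price"))                   # still 100: a stale hit
    print(store["price"])                    # 120: reader A and the store disagree
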
scrubs 2 days ago||
Ousterhout's grad students did work on RAMCloud, with some research at Facebook and Amazon on cache use at scale in complex organizations.

One bit of interesting trivia, for Facebook say (from memory): if you add up all the RAM caches (Redis/memcached plus disk and DB caches) needed to make the thing work at scale, then for about 20-30% more memory you could have had the whole thing in memory 100% of the time.

chrisjj 2 days ago|
The problem there is that /the whole thing/ grows, often faster than your memory.
scrubs 1 day ago||
I think you need to read the papers.

Obviously things grow; look who I mentioned, after all.

The focus is on how you handle growth.

Joker_vD 2 days ago||
There is also an important (but often overlooked) detail: you/your application may not be the only user of the cache. At that point caching, indeed, is an optimization via abstraction: when you fetch an X, you are in no position to predict that the next fifty requests, completely unrelated to you, will also want to fetch the same X, so it should probably be cached to be readily served.

Which is why solving the "I want my data in fast storage as often as possible" problem may be counter-productive on the whole: you ain't the only client of the system; let it breathe and serve requests from others.
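
A toy simulation of that effect (all names and numbers invented for illustration): a shared LRU cache, a crowd of clients drawing from a common set of popular keys, and one greedy client that re-touches its own keys every round to hold them in fast storage. Its private "optimization" collapses everyone else's hit rate:

    import random
    from collections import OrderedDict

    CAPACITY = 100  # hypothetical shared cache size

    def run(greedy_keys: int) -> float:
        """Hit rate seen by the other clients, who draw uniformly from 120
        popular keys, while a greedy client re-touches `greedy_keys`
        private keys on every round. All numbers are made up."""
        random.seed(0)
        cache = OrderedDict()
        hits = total = 0

        def touch(key):
            if key in cache:
                cache.move_to_end(key)       # refresh LRU position
                return True
            cache[key] = True
            if len(cache) > CAPACITY:
                cache.popitem(last=False)    # evict the least recently used
            return False

        for _ in range(20_000):
            for i in range(greedy_keys):     # greedy client keeps its data hot
                touch(("greedy", i))
            total += 1
            hits += touch(("shared", random.randrange(120)))
        return hits / total

    print(run(0))    # ~0.83: no greedy client; most popular keys stay cached
    print(run(60))   # ~0.33: 60 slots effectively pinned; others fight over the rest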

canyp 2 days ago||
Did you hand-draw that graffiti? Never quite realized that graffiti of technical ideas looks really goated. Best part of the post, to be honest.
jxjnskkzxxhx 2 days ago|
> looks really goated

Oof you're trying so hard you could cut diamond with that line.

canyp 2 days ago||
I don't even understand what that means. Care to explain?
the__alchemist 2 days ago||
I think it's a drug reference‽
eigenform 2 days ago|
Even more obvious if you think about the case of hardware-managed caches! The ISA typically exposes some simple cache control instructions (and I guess non-temporal loads/stores?), but apart from that, the actual choice of storage location is abstracted away from you (and your compiler).