Posted by samuel246 7/1/2025

Caching is an abstraction, not an optimization (buttondown.com)
136 points | 128 comments | page 3
jongjong 7/3/2025|
I was discussing this with someone recently: caching is one of those things that people might do behind the scenes, thinking it doesn't affect the API, but in fact it can create all sorts of issues and complexity.
flufluflufluffy 7/6/2025||
I don’t really know what the point is… there are different kinds of caching that serve different purposes, just like everything else.
pclmulqdq 7/3/2025||
Use of a better abstraction is an optimization, though.
jbverschoor 7/3/2025||
Everything is caching. Almost nothing operates on the target data directly.
necovek 7/4/2025|
Do you think that's a useful definition of the term?

If everything is caching, why even introduce the term? Language should help us describe ideas; it should not be superfluous.

jbverschoor 7/4/2025||
Because you can operate directly on data
dasil003 7/4/2025||
What? No, caching means a specific thing: keeping a copy of data away from the source of truth, closer to where you want to read it. Caching always makes systems more complex, it never makes things simpler, and it damn sure doesn't serve as any kind of abstraction unless you're redefining what words mean to indulge your technical philosophizing.
hansvm 7/4/2025|
What if you have to keep some data closer and away from the source of truth though? Given that constraint, TFA argued that other architectures could do the job but that caching functions as an abstraction.
k__ 7/3/2025||
Anything can be an abstraction if designed carefully.
chrisjj 7/4/2025||
Why not both? :)
0xbadcafebee 7/4/2025|
Sometimes posts are so difficult to read that they're hard to respond to. I think I get what they're saying: that caching should be simple, or at least that it should be obvious how to cache in your particular situation, such that you don't need things like algorithms. But that argument is kind of nonsense, because really everything in software is an algorithm.

Caching is storing a copy of data in a place, or in a form, where it is faster to retrieve than it would be otherwise. Caching is not an abstraction; it is a computer science technique for improving performance.

Caching does not make software simpler. In fact, it always, by necessity, makes software more complex. For example, there are (a few of these are sketched after the list):

  - Routines to look up data in a fast storage medium
  - Routines to retrieve data from a slow storage medium and store them in a fast storage medium
  - Routines to remove a cache entry when its expiration is reached
  - Routines to remove cache entries if we run out of cache storage
  - Routines to remove the oldest unused cache entry
  - Routines to remove the newest cache entry
  - Routines to track when each cache entry was last accessed
  - Routines to remove cache entries which have been used the least
  - Routines to remove specific cache entries regardless of age
  - Routines to store data in the cache at the same time as slow storage
  - Routines to store data in cache and only write to slow storage occasionally
  - Routines to clear out the data and get it again on-demand/as necessary
  - Routines to inform other systems about the state of your cache
  - ...and many, many more
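
A minimal sketch of a few of those routines, assuming a toy in-memory cache with TTL expiry and LRU eviction in front of a slow loader function (all names here are illustrative, not from the article):

  import time
  from collections import OrderedDict

  # Toy cache: TTL expiry plus LRU eviction in front of a slow loader.
  class TTLLRUCache:
      def __init__(self, load_fn, max_entries=128, ttl_seconds=60.0):
          self._load = load_fn              # routine to read slow storage
          self._max = max_entries
          self._ttl = ttl_seconds
          self._entries = OrderedDict()     # key -> (value, stored_at)

      def get(self, key):
          entry = self._entries.get(key)
          if entry is not None:
              value, stored_at = entry
              if time.monotonic() - stored_at < self._ttl:
                  self._entries.move_to_end(key)   # mark as recently used
                  return value                     # cache hit
              del self._entries[key]               # expired: drop the entry
          value = self._load(key)                  # cache miss: hit slow storage
          self._entries[key] = (value, time.monotonic())
          if len(self._entries) > self._max:
              self._entries.popitem(last=False)    # evict the least recently used
          return value

      def invalidate(self, key):
          self._entries.pop(key, None)             # remove a specific entry

Even this toy version has to commit to an eviction policy, an expiry policy, and an invalidation hook, and it says nothing about write-through, write-back, or telling other systems what it holds.
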
Each routine involves a calculation that determines whether the cache will be beneficial. A hit or miss can lead to operations that may add or remove latency, may or may not run into consistency problems, and may or may not require remediation. The cache may need to be warmed up, or it may be fine starting cold. Clearing the cache (e.g. on restarts) may cause such a drastic cascading failure that the system cannot be started again. And there is often a large amount of statistics and analysis needed to optimize a caching strategy.
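
As a rough illustration of that calculation (the numbers below are made up): with hit rate h, cache lookup cost c, and backend read cost b, the expected read latency is h*c + (1-h)*(c+b), so the cache only pays off above a break-even hit rate.

  def expected_latency_ms(hit_rate, cache_ms, backend_ms):
      # A miss pays for both the failed cache lookup and the backend read.
      return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + backend_ms)

  # Illustrative numbers only: 1 ms cache lookup, 20 ms backend read.
  for hit_rate in (0.0, 0.5, 0.9, 0.99):
      with_cache = expected_latency_ms(hit_rate, cache_ms=1.0, backend_ms=20.0)
      print(f"hit rate {hit_rate:.2f}: {with_cache:.1f} ms vs 20.0 ms uncached")

  # At a 0% hit rate the cache only adds latency (21 ms vs 20 ms);
  # with these numbers it needs roughly a 5% hit rate just to break even.

And consistency, warm-up, and failure handling add costs that no such back-of-the-envelope number captures.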

These are just a few of the considerations of caching. Caching is famously one of the hardest problems in computer science. How caching is implemented, and what it affects, can be very complex and needs to be considered carefully. If you try to abstract it away, it usually leads to problems. Though if you don't try to abstract it away, that also leads to problems. Because of all of that, abstracting caching away into a "general storage engine" is simply impossible in many cases.

Caching also isn't just having data in fast storage. Caching is cheating. You want to provide your data faster than your normal data storage (or transfer mechanism, etc.) can actually deliver it. So you cheat, by copying it somewhere faster. And you cheat again, by trying to figure out how to look it up fast. And cheat again, by trying to figure out how to deal with its state being ultimately separate from the state of the "real" data in storage.
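
A contrived sketch of that last cheat: once the copy exists, nothing keeps it in sync with the "real" data unless you do more work (both stores below are stand-ins for real storage):

  source_of_truth = {"price": 100}   # the "real" data in slow storage
  cache = {}                         # the faster copy

  def read_price():
      # Cheat: serve from the copy if we have one, skipping the slow source.
      if "price" not in cache:
          cache["price"] = source_of_truth["price"]
      return cache["price"]

  print(read_price())                # 100, filled from the source
  source_of_truth["price"] = 120     # the "real" data changes...
  print(read_price())                # still 100: the cached state has drifted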

Basically, caching is us trying to be really clever and work around our inherent limitations. But often we're not as smart as we think we are, and our clever cheat can bite us. So my advice is to design your system to work well without caching. You will thank yourself later, when you're finally dealing with the bugs that bite, and realize you dodged a bullet earlier.