Posted by samuel246 5 days ago
(Although a materialised view is more like an index than a cache. The view won't expire and require you to rebuild it.)
In RDBMS contexts, an index really is a caching mechanism (a cache) managed by the database system (the query planner has to decide when it's best to use one index or another).
But as you note yourself, even in these cases where cache management is bundled with the database, having too many indexes can slow down (even deadlock) writes as the database tries to keep these redundant copies of the data consistent.
In some sense though. If it ain't L1 it's storage :)
Even if you use "cache" in the name (e.g. memcached), that's still not a cache, even if it's a KV store designed for caching.
If everything is caching, why even introduce the term? Language should help us describe ideas; it should not be superfluous.
Caching is storing a copy of data in a place or form where it is faster to retrieve than it otherwise would be. Caching is not an abstraction; it is a computer science technique for improving performance.
Caching does not make software simpler. In fact, it always, by necessity, makes software more complex. For example, there are:
- Routines to look up data in a fast storage medium
- Routines to retrieve data from a slow storage medium and store them in a fast storage medium
- Routines to remove the cache if an expiration is reached
- Routines to remove cache entries if we run out of cache storage
- Routines to remove the oldest unused cache entry
- Routines to remove the newest cache entry
- Routines to store the age of each cache entry access
- Routines to remove cache entries which have been used the least
- Routines to remove specific cache entries regardless of age
- Routines to store data in the cache at the same time as slow storage
- Routines to store data in cache and only write to slow storage occasionally
- Routines to clear out the data and get it again on-demand/as necessary
- Routines to inform other systems about the state of your cache
- ...and many, many more
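Several of the routines above (fast-path lookup, read-through fetch from slow storage, TTL expiration, LRU eviction when capacity runs out, and targeted invalidation) can be sketched together in a few dozen lines. This is a minimal illustration, not a production cache; the class name, parameters, and the `fetch` callback are all hypothetical.

```python
import time
from collections import OrderedDict

class LRUCache:
    """Read-through cache with per-entry TTL and LRU eviction (a sketch)."""

    def __init__(self, capacity, ttl_seconds, fetch):
        self.capacity = capacity      # max entries before we evict
        self.ttl = ttl_seconds        # expiration window per entry
        self.fetch = fetch            # callable that hits slow storage
        self.entries = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.monotonic() - stored_at < self.ttl:
                # Hit: mark as most recently used and return fast.
                self.entries.move_to_end(key)
                return value
            # Expired: drop the stale copy and fall through to refetch.
            del self.entries[key]
        # Miss: go to slow storage, then populate the cache.
        value = self.fetch(key)
        self.entries[key] = (value, time.monotonic())
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value

    def invalidate(self, key):
        """Remove a specific entry regardless of age."""
        self.entries.pop(key, None)
```

Even this toy version already carries four distinct policies (lookup, fill, expire, evict), which is the point: each item in the list above is real code you now own and have to reason about.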
Each routine involves a calculation that determines whether the cache will be beneficial. A hit or miss can lead to operations which may add or remove latency, may or may not run into consistency problems, and may or may not require remediation. The cache may need to be warmed up, or it may be fine starting cold. Clearing the cache (e.g. on restart) may cause such a drastic cascading failure that the system cannot be started again. And there is often a large amount of statistics and analysis needed to optimize a caching strategy.

These are just a few of the considerations. Cache invalidation is famously one of the hardest problems in computer science. How caching is implemented, and what it affects, can be very complex and needs to be considered carefully. If you try to abstract it away, it usually leads to problems; if you don't try to abstract it away, it also leads to problems. Because of all that, abstracting caching away into a "general storage engine" is simply impossible in many cases.
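The "will the cache be beneficial" calculation can be made concrete with a back-of-the-envelope model. The latency numbers below are purely illustrative assumptions (say, an in-process cache in front of a remote database); the point is that a cache lookup is paid on misses too, so a low hit rate can make things worse, not better.

```python
# Hypothetical latencies, in milliseconds.
CACHE_LATENCY_MS = 1.0     # cost of a cache lookup (paid on hit AND miss)
STORAGE_LATENCY_MS = 20.0  # cost of going to slow storage

def effective_latency(hit_rate):
    """Average per-read latency with the cache in front of storage."""
    hit_cost = hit_rate * CACHE_LATENCY_MS
    miss_cost = (1 - hit_rate) * (CACHE_LATENCY_MS + STORAGE_LATENCY_MS)
    return hit_cost + miss_cost

def cache_is_beneficial(hit_rate):
    """The cache pays off only if it beats going straight to storage."""
    return effective_latency(hit_rate) < STORAGE_LATENCY_MS
```

With these numbers, a 90% hit rate averages 3.0 ms per read, but a 0% hit rate averages 21 ms — worse than no cache at all, since every miss pays for the lookup and then the storage round trip anyway.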
Caching also isn't just having data in fast storage. Caching is cheating. You want to serve your data faster than your normal data storage (or transfer mechanism, etc.) can actually deliver it. So you cheat, by copying it somewhere faster. And you cheat again, by trying to figure out how to look it up fast. And you cheat again, by trying to figure out how to deal with its state being ultimately separate from the state of the "real" data in storage.
Basically, caching is us trying to be really clever and work around our inherent limitations. But often we're not as smart as we think we are, and our clever cheat can bite us. So my advice is to design your system to work well without caching. You will thank yourself later, when you're finally dealing with those bites and realize you dodged a bullet.