Quick Thoughts on Cache Eviction

I read a Hacker News post about how Redis used to evict cache keys before version 6.0:

Every N milliseconds:

Pick a random set of keys (say, 50):

Check how many have expired

If the number is > 25%, then evict and repeat

Otherwise, evict and sleep for N milliseconds

I couldn't find any information that detailed this, although I did find a blog post about the LFU and LRU policies around Redis eviction policies circa 2018. I found something similar to what the author was referencing in the source, although that doesn't seem to exact be it either: https://github.com/redis/redis/blob/5.0/src/evict.c#L531-L550%5Bfn:1%5D.

Nevertheless, this simple algorithm spawned a discussion between me and a friend on how these probabilistic algorithms have fat tails and the assumptions that this algorithm makes. For example, for this simple random eviction policy to work, all the values need to be within a constant range of sizes. If you have a few values that are magnitudes larger than others, at enough scale, you could end up in situations where the random scanning algorithm misses the fat values for multiple cycles, which then puts more pressure on the eviction policy.

My friend then pointed to https://research.google/pubs/pub48030/, a paper that describes some of the issues with the remote caching mechanisms. The paper itself is worth reading, although the problems it recapitulates will be familiar: serialization costs are annoyingly high, protocols add a layer of complexity, and the usage of a K/V model means that you could be attempting to read an entire JSON blob when you just need a single element, and that network latency is non-trivial.

I think the bulk of the problem here comes from the implication that the cache should not be co-located with the caller. The addition of "cache managers" such as Redis Sentinel and distributed Memcache are a concession to this: caches cannot typically be held within a single machine, and that optimal performance requires the cache to understand more than just the actual data: the caching layer must have some additional meta-information on how the caller is planning on using it. The Google paper makes the case for an intrusive, distributed in-process cache that effectively acts as a more innovative remote cache, which can learn from your calling patterns. This isn't too different from Erlang's ETS. The idea is that we can cut down dramatically on serialization and network hop costs by reducing the amount of information read with native language constructs, with which I'm in full agreement.

However, all of this assumes that we can't deploy fat, tall servers where your cache is co-located with your application. The Google paper does not attempt to address this point because it's not feasible for Google, but this type of co-location might be easier than you'd expect. In the past, I've successfully run very tall servers co-located with Kyoto Tycoon and communication over Unix sockets. We're not locked into the single small server model, the fat server model also solves the problems of network hops by removing it and solves the overreading problem by simply assuming that the costs of IPC communication are so low compared to the network hop that the extra data doesn't matter. While this design certainly won't work at a large enough scale, I think it's fairly difficult to run into real-world problems that scale further than this if you're not at FAANG scale.