Cache — quick learn
[Cache replacement policies]
When the cache reaches its maximum capacity, we need to kick out some records. The 3 basic policies are: random selection, FIFO, and LRU.
Random Selection: randomly pick an item to be replaced by the new one. It's super simple and performs surprisingly well, especially when all resources have similar probabilities of being accessed.
FIFO: first in, first out; can be implemented as a hash map plus a queue. However, it does not work well when a certain resource is accessed very frequently: no matter how often it is accessed, it will still be evicted a fixed time after it enters the cache.
LRU: least recently used. Whenever a record is read or written, it becomes the most recently used; when the cache is full and a new record comes in, the cache evicts the record that was least recently used. This strategy tends to keep the most frequently accessed records in the cache. It can be implemented as a hash map plus a doubly linked list, as sketched below.
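Here is a minimal sketch in Python, where OrderedDict plays the role of the hash map plus doubly linked list in one structure (a hand-rolled version would maintain both explicitly):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: hash map + doubly linked list via OrderedDict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None               # cache miss
        self.data.move_to_end(key)    # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```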
[Cache sync policies]
A cache, being temporary storage, should be synchronized with the persistent layer. Here are several synchronization policies.
Cache Aside: the application handles the sync between cache and DB. This works fine when the cache belongs to only one application, but when multiple applications access the same cache, similar synchronization code has to live in each of them.
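A minimal cache-aside sketch, with plain dicts standing in for the cache and the DB; invalidate-on-write is one common variant. The key point is that this logic lives in the application, so it would be duplicated in every application sharing the cache:

```python
def read(key, cache: dict, db: dict):
    """Cache-aside read: the application checks the cache first."""
    if key not in cache:       # cache miss: the application loads
        cache[key] = db[key]   # from the DB and fills the cache itself
    return cache[key]

def write(key, value, cache: dict, db: dict):
    """Cache-aside write: update the DB, then invalidate the entry."""
    db[key] = value
    cache.pop(key, None)       # next read re-loads the fresh value
```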
Now let's look at the other read/write policies, where the application is not aware of the cache at all: it only knows it's accessing some storage, let alone handling the sync between cache and DB. All the syncing is done by the cache server itself.
Read Through: the cache server is responsible for calling the DB on a cache miss and updating itself (sketched together with write through below).
Write Through: the cache server is responsible for updating the DB on a write request. The write does not finish until both the DB and the cache are successfully updated, meaning they are updated in the same atomic action.
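A minimal sketch of both policies from the cache server's point of view, with a plain dict standing in for the DB (a real server would add TTLs, eviction, and error handling):

```python
class CacheServer:
    """Read-through + write-through sketch."""

    def __init__(self, db: dict):
        self.db = db
        self.store = {}

    def get(self, key):
        if key not in self.store:           # read through: on a miss the
            self.store[key] = self.db[key]  # cache itself loads from the DB
        return self.store[key]

    def put(self, key, value):
        # write through: DB and cache are both updated before the call
        # returns, so the application sees one atomic action
        self.db[key] = value
        self.store[key] = value
```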
Write Behind: unlike write through, with this policy the application-cache communication and the cache-DB sync happen in separate atomic actions. While write through suits strong-consistency scenarios such as transactions, write behind is good for weak consistency and brings higher performance.
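A minimal write-behind sketch, again with a dict standing in for the DB; a real implementation would batch the flushes and handle retries. The point is that the DB sync is decoupled from the application's write:

```python
import queue
import threading

class WriteBehindCache:
    """Write-behind sketch: writes return as soon as the cache is
    updated; a background thread syncs the DB later."""

    def __init__(self, db: dict):
        self.db = db
        self.store = {}
        self.dirty = queue.Queue()
        threading.Thread(target=self._flush, daemon=True).start()

    def put(self, key, value):
        self.store[key] = value   # the application's write finishes here
        self.dirty.put(key)       # the DB update happens asynchronously

    def _flush(self):
        while True:
            key = self.dirty.get()
            self.db[key] = self.store[key]  # a crash before this line
                                            # would lose the write
```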
[Distributed cache]
When we have more data, or more machines accessing the same data source, we'll need a standalone cache outside the application to explicitly store that data. Sometimes that will be a whole cache system of cache proxies, multiple cache nodes, and cache replicas.
By hashing, we can map each key-value pair to a specific node, making access quick.
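For example, a naive placement scheme hashes the key and picks a node by modulo (the node names here are made up):

```python
import hashlib

nodes = ["cache-a", "cache-b", "cache-c"]   # hypothetical node names

def node_for(key: str) -> str:
    # hash the key, then pick a node by modulo; simple and fast, but
    # resizing `nodes` remaps almost every key (hence consistent hashing)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]
```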
Cache replicas
However, what if node A suddenly crashes? We need replicas to avoid a single point of failure, and the primary and replicas should be kept in sync. Note that heavy backup is not necessary, because a cache is meant to be non-persistent.
Consistent Hashing
What if we now want to add a cache node? It would be terrible news if, after the hash function changed, every key-value pair had to be rehashed and moved to its new cache node. Consistent hashing makes scaling the cache up and down much easier.
As shown below, imagine the hash space as a ring with a fixed direction. Any key is stored in the first cache node clockwise from its hash position. In this case (left ring), keys 10, 12, and 1 are stored in C, and key 4 is stored in B.
If we want to add a new cache node D and place its hash position as shown (middle ring), the only key that needs to be rehashed is key 10 (it moves from C to D). Similarly, if we want to delete node B, we only need to move the keys previously stored in B over to A.
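A minimal consistent-hash ring sketch (MD5 and hashing node names onto the ring are arbitrary choices here; production rings also add virtual nodes so the load spreads evenly):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring, without virtual nodes."""

    def __init__(self):
        self.ring = []      # sorted hash positions
        self.owners = {}    # hash position -> node name

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        h = self._hash(node)
        bisect.insort(self.ring, h)
        self.owners[h] = node

    def remove_node(self, node: str):
        h = self._hash(node)
        self.ring.remove(h)
        del self.owners[h]

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        # first node position clockwise from the key's hash,
        # wrapping around to the start of the ring
        i = bisect.bisect(self.ring, h) % len(self.ring)
        return self.owners[self.ring[i]]
```

Adding or removing a node only affects the keys between it and its counter-clockwise neighbor; all other keys stay where they are.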