Designing a Distributed Cache system (HLD Discussion)

Shivendra Soni
5 min read · Oct 14, 2022


First of all, if the banner image doesn't make sense, blame @OpenAI's DALL·E 2.

In one of my recent 1–1 sessions for system design interview prep, the mentee wanted to understand how to approach a system design round, since it was the first time they were facing one. While I am no expert by any means, having both given and conducted many system design rounds for various orgs, here is the crux of how I prefer to tackle such questions. (He went on to land a job at a major startup.)

As a running example, let's design a scalable, distributed cache, with an approximate time frame of around 50 minutes.

1. Requirement Gathering: This first step sets the tone and structure for the interview. The opening part of any system design interview is coming up with the features or requirements the system should support. As an interviewee, try to list all the features you can think of that our system should support. (Time needed: 2–5 mins)

Some clarifying questions for this problem would be:

  • What kind of cache are we building? A distributed one.
  • What would be the eviction strategy? Least Recently Used (LRU).
  • What is the access pattern? E.g. a write-back cache, which means writes go to the cache and are asynchronously written to the DB layer, while the write is confirmed as soon as it lands in the cache (sketched below, after the trade-off note).

Talk about trade-offs. For example, in our case, writing to an in-memory data store (which most caches are) carries the risk of losing data on a device or power failure. This can be mitigated by other means, such as keeping multiple replicas.
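To make the write-back idea concrete, here is a minimal sketch, assuming a hypothetical DbWriter interface for the DB layer (a real system would batch the flushes and handle failures):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal write-back sketch: a write is acknowledged once it hits the
// in-memory map, and is flushed to the DB layer asynchronously.
public class WriteBackCache {

    // Hypothetical DB-layer abstraction, not from the article.
    public interface DbWriter {
        void write(String key, String value);
    }

    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    private final DbWriter db;

    public WriteBackCache(DbWriter db) {
        this.db = db;
    }

    public void put(String key, String value) {
        store.put(key, value);                      // write confirmed here
        flusher.submit(() -> db.write(key, value)); // DB write happens later
    }

    public String get(String key) {
        return store.get(key);
    }
}
```

The put call returns as soon as the in-memory write succeeds; the flush to the DB happens later on a background thread, which is exactly where the data-loss risk mentioned above comes from.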

2. Back of the envelope calculations (rough estimations): In this step you demonstrate your ability to quantify the problem and justify the decisions you will take while designing the system. For our problem, the most important parameters to consider are the query load (queries per second) and the data size. Note: ALWAYS state your assumptions to your interviewer out loud.

E.g. total cache size: 10 TB, and a query load of around 1M QPS (you can ask the interviewer for these numbers).

Production-grade caching machines can have 72 GB or more of RAM. That means we will need around 10 * 1024 / 72 ≈ 143 machines (rounding up).

The query load on each machine = 1M / 143 ≈ 7k QPS. Note: most modern servers have multiple cores; assuming a minimum of 4 cores per machine, the load per core comes to ≈ 1,750 QPS. (5–10 minutes)
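The same estimate as a tiny worked example, using only the assumed inputs stated above:

```java
// Back-of-the-envelope estimates from the assumptions above.
public class CacheEstimates {
    public static void main(String[] args) {
        double totalCacheGb = 10 * 1024;  // 10 TB total cache size
        double ramPerMachineGb = 72;      // production-grade cache box
        double totalQps = 1_000_000;      // 1M QPS query load
        int coresPerMachine = 4;          // assumed minimum

        double machines = Math.ceil(totalCacheGb / ramPerMachineGb); // ~143
        double qpsPerMachine = totalQps / machines;                  // ~7k
        double qpsPerCore = qpsPerMachine / coresPerMachine;         // ~1,750

        System.out.printf("machines=%.0f, qps/machine=%.0f, qps/core=%.0f%n",
                machines, qpsPerMachine, qpsPerCore);
    }
}
```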

Here’s a brilliant resource you can use as a cheatsheet for the same

3. Establish the design guarantees your solution will provide:

Read first: the CAP theorem. Though your design guarantees don't necessarily have to revolve around the CAP theorem ONLY.

Consistency: Is it OK for us to sacrifice some consistency? What if a few of the keys aren't cached? Ans: It should be OK, as long as the majority of the data that should have been cached is cached.

Availability: This is non-negotiable, because if the cache becomes unavailable, the entire query load starts hitting our DBs, which can create back pressure.

Partition tolerance: Not a strict requirement in our case, since a cache with an eviction strategy will always have some data missing from it anyway. However, since it is a distributed cache, it is good to have replicas capable of working independently.

Latency: Again, this is a non-negotiable goal, as low latency (along with reducing DB load) is the entire reason for building a cache. (1–3 minutes)

4. Show time: Now use all the data and requirements you have gathered and begin to design the system according to your goals. Start simple, and keep checking with your interviewer which part they would like to dig deeper into.

E.g. start with a simple LRU cache that works on a single thread and then, if asked, extend your solution to support multithreading. Do not obsess over details you weren't asked about.
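For example, here is a minimal single-threaded LRU sketch built on Java's LinkedHashMap in access-order mode (one illustration, not the only way to do it):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal single-threaded LRU cache: a LinkedHashMap in access-order mode
// evicts the least recently used entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }
}
```

If multithreading comes up, you can start with coarse-grained synchronization around this map and then talk about sharding the cache into segments, each with its own lock.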

Define APIs, e.g. CacheResponse<T> cacheFetch(String key) and cachePut(String key, String serializedObject). These would be the entry points to your system and can be used by your application layer. Do note: this is just an example, and many better (or more automated) alternatives could exist.
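As a sketch, those entry points could look like the interface below; CacheResponse here is a hypothetical wrapper carrying the value plus hit/miss metadata:

```java
// Hedged sketch of the entry-point API mentioned above; names and shape
// are illustrative only.
public interface DistributedCache {

    class CacheResponse<T> {
        public final T value;
        public final boolean hit; // whether the key was found in the cache

        public CacheResponse(T value, boolean hit) {
            this.value = value;
            this.hit = hit;
        }
    }

    CacheResponse<String> cacheFetch(String key);

    void cachePut(String key, String serializedObject);
}
```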
Now it wouldn’t really be a distributed cache without talking about the distributed aspect, would it?

If we have only one machine per shard, then when that machine goes down, all requests to that shard start hitting the DB, increasing latency.
Low latency being the whole point of the exercise, we want to avoid exactly that.
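As context for what a shard means here, the simplest possible key-to-shard routing is to hash the key and take it modulo the number of shards; this is an assumption for illustration, not something the design is tied to:

```java
// Simplest possible key-to-shard routing: hash the key, mod by shard count.
public final class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    public int shardFor(String key) {
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(key.hashCode(), numShards);
    }
}
```

Note that changing the shard count with this scheme reshuffles most keys, which is one reason real deployments prefer smarter placement such as consistent hashing.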

If we have a lot of machines, one way to avoid this would be to have multiple machines per shard, each maintaining exactly the same data.
But if every request fans out to all replicas and we only use the fastest response, the system becomes more complex and our costs grow many-fold, since every machine in the shard does the work for every request.
Add to that the concern of keeping the data identical across multiple machines or servers. So overall, a low-efficiency design.

There are a few ways we can approach this; the simplest could be:

  • Master–slave technique: There is only one active (master) server at a time in a shard, and it has a follower that keeps receiving updates. When the master goes down, the slave takes over as the new master. Master and slaves can sync using a change-log-based mechanism. Since we decided earlier that we are fine with all servers being eventually consistent, we can have one master (taking all the write traffic) and multiple slaves serving the read traffic (a toy sketch follows).
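A toy, single-process sketch of that idea; all names here are illustrative, and a real system would ship the change log over the network and handle failover:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the master/follower idea: the master takes all writes and
// appends them to a change log; followers replay the log to catch up
// (eventually consistent, as agreed in the requirements).
public class ReplicatedShard {

    record Change(long sequence, String key, String value) {}

    static class Node {
        final Map<String, String> data = new HashMap<>();
        long appliedUpTo = 0;

        void apply(Change c) {
            data.put(c.key(), c.value());
            appliedUpTo = c.sequence();
        }
    }

    private final Node master = new Node();
    private final List<Node> followers = new ArrayList<>();
    private final List<Change> changeLog = new ArrayList<>();

    public void addFollower(Node follower) {
        followers.add(follower);
    }

    // Writes go to the master only; the change log is the sync mechanism.
    public void put(String key, String value) {
        Change c = new Change(changeLog.size() + 1, key, value);
        master.apply(c);
        changeLog.add(c);
    }

    // Followers replay any log entries they have not applied yet.
    public void syncFollowers() {
        for (Node f : followers) {
            for (Change c : changeLog) {
                if (c.sequence() > f.appliedUpTo) {
                    f.apply(c);
                }
            }
        }
    }

    // Reads can be served by any follower once it has caught up.
    public String readFromFollower(int index, String key) {
        return followers.get(index).data.get(key);
    }
}
```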

(30–35 minutes)

To summarise:

* Scope the problem & ask as many questions as possible.

* Scope out the scale

* Build what you discussed and nothing more. Add more details when asked.

P.S. Feel free to reach out if you need help with something similar. You can schedule a 1–1 on https://topmate.io/shivendra

