Trade-offs in Modern System Design: A Conversation with Archit Agarwal

A pragmatic guide to architecture choices, cost discipline, resilience, and interview-ready thinking.

Feb 26, 2026

This conversation with Archit Agarwal is a practical tour through modern system design—starting from first principles and repeatedly returning to a single constraint: real systems live under trade-offs, and good engineers choose those trade-offs deliberately. Agarwal is a Principal Member of Technical Staff at Oracle, where he works on ultra-low-latency authorization services in Go. He has 11+ years in backend engineering across .NET and Go, and he writes The Weekly Golang Journal, focused on turning system design into usable, operational guidance—especially around performance and efficiency.

He argues for starting with a monolith and lays out the inflection points that justify splitting it into services: deployment friction, a widening blast radius, and the need for truly independent scaling. He also emphasizes that this flexibility comes with a real operational tax. On cost and resilience, Agarwal makes the same argument from a different angle: engineering decisions should be evaluated as performance per dollar, not performance in isolation. He describes building cost awareness into the design process through observability, explicit cost discussions, and discipline about scaling only when needed.

Finally, the conversation shifts from production architecture to interview performance. Agarwal recommends that candidates stand out by aligning on requirements first, surfacing trade-offs explicitly, and communicating clearly enough that the interviewer can follow the “commit history” of their reasoning. He also explains how he expects candidates to handle changing constraints midstream—by absorbing the change, restating it, and selectively updating only the affected parts of the design—while building breadth through fundamentals, real-world problem practice, and a few deep specialties.

You can watch the full conversation below or read on for the complete Q&A transcript.

Emerging Trends and Challenges in System Design

1. We’re seeing this pendulum swing in architecture, with many teams rethinking a pure microservices approach and embracing modular monoliths to reduce complexity and cost. How do you decide when a microservices architecture is truly warranted versus keeping a system design simpler?

Archit Agarwal: To be very honest, this is the first question that I ask myself when I start designing a new system or a new module. And the rule that I follow is very simple: If the problem isn’t complex yet, don’t overengineer it. Start with just a monolith.

New engineers entering the industry often arrive with a lot of buzzwords: event-driven architecture, microservices, serverless. They’re great, but you cannot apply everything in one go before your application really needs it, right? That is a key difference between an interview-ready engineer and a genuinely good engineer: a genuinely good engineer doesn’t implement everything up front. They engineer around the problems the team is actually facing.

In any early-stage project, the requirements are always changing, and you have very little understanding of the domain, right? The scope is also very small. So you should not rush to implement every new buzzword you see in the industry. Start small, start with a monolith, and design it so that if you want to break it down in the future, you can easily do that, right?

And if your application requires low latency (for example, if you’re working on a financial system), you cannot simply default to microservices. You will have to evaluate whether microservices are good for you. If you use microservices, there are always going to be additional network hops, and they will slow the system down, right? So I would always say that microservices aren’t a magical fix for bad architecture, right? They just distribute it over the network.

So when you start writing your application, start with a monolith and then watch for the point where the pain of keeping the monolith is greater than the pain of splitting it. There are usually clear signals that tell you whether to move out of a monolith. A few of those signals are: your deployments are getting bigger and slower, bugs have a larger blast radius, or you need a lot of independent scaling.

For example, if a sale is coming up on an e-commerce platform, you would want your payment-related system to scale much further than your login system, right? If those are the requirements, you definitely start moving out of the monolith and into microservices.

And there are other reasons too. For example, you may need different tech for different problems. If you want to do analytics, you might want a different technology for that, right? In a monolith, you generally cannot have your project written in multiple languages.

So microservices definitely give you flexibility. They also give you headaches, so you should always choose wisely.

2. With cloud spending at an all-time high, there’s sustained CFO scrutiny on engineering decisions. How do you incorporate cost considerations into system design?

Archit Agarwal: Honestly, I would say this is the point where every engineer becomes a philosopher. I remember one quote (I don’t know where I read it, but it stuck in my mind) that said a good engineer designs for performance, but a great engineer designs for performance per dollar.

So any engineer who weighs cost against the performance gain is a great engineer. I didn’t truly understand this quote until one of my family members started his own startup and I was involved in all the tech-related discussions. That was the first time I realized why, whenever I argued with my manager or senior manager over using a particular tech, they always said no when we didn’t need it, while I kept saying it would help us scale, right?

That was when I started to understand why a lot of my requests were denied: they were not solving a real pain at that moment, right? Trust me, every system costs something, and you need to understand that no business can keep spending money on something that is not needed at that particular moment.

Let me give you one more instance, where we as a team saw a big advantage. We had a client who was pushing to reduce infrastructure cost, and we as engineers weren’t acting on it. So he introduced a dashboard showing the per-engineer infrastructure cost of the development process. Those numbers were huge per month. And to be very honest, once people saw those numbers listed against their own names, everyone started evaluating whether a particular tech was really needed.

Is it really needed? When you log off, should you shut down your EC2 instances? That made a huge difference: over six months, we saw a 20% month-on-month decrease in infrastructure cost.

So I follow a few principles here. I don’t prematurely optimize, but I stay observant about the infrastructure. I push observability as far as I can, so that I have a dashboard showing where my system is falling short, which part to scale, and where I should improve. Observability is very important from this perspective.

Then I always design my system for horizontal scaling, but I don’t scale out unless it is needed, because there’s no point spending money on infrastructure that isn’t being used. But the ability to scale, and the lifecycle of your data and infrastructure, should be part of your requirements.

For example, if you are using an S3 bucket and you have 100 GB of data there that hasn’t been used for months (or will never be used), why pay to keep it in hot storage? You should move it to cold storage and spend less on data that is practically not being used.

Then there’s the technical conversation: for every story we start designing, we hold a design discussion, and we try to include cost in it. At times, engineers propose reducing latency by 10%, but to get there they increase the infrastructure cost two or three times.

So then the question goes back to the engineer: do we really want to improve latency at that cost? If it is really needed, we are OK to spend, right? But if that 10% drop in response time gives the user no real advantage, spending that much on infrastructure isn’t justified. This makes the team aware that performance without cost awareness is just expensive engineering, so you should not keep adding to infrastructure cost every now and then.
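One way to make that design discussion concrete is to price each millisecond saved. The sketch below is a hypothetical helper, not a method from the conversation; the latency and cost numbers are illustrative assumptions mirroring the “10% faster for 3x the cost” example.

```go
package main

import "fmt"

// Option describes a design choice with its latency and monthly infrastructure cost.
// All numbers used below are hypothetical.
type Option struct {
	Name        string
	LatencyMs   float64
	MonthlyCost float64 // USD
}

// costPerMsSaved returns the extra monthly spend per millisecond of latency removed
// when moving from base to candidate. It returns false if the candidate is not faster.
func costPerMsSaved(base, candidate Option) (float64, bool) {
	saved := base.LatencyMs - candidate.LatencyMs
	if saved <= 0 {
		return 0, false
	}
	return (candidate.MonthlyCost - base.MonthlyCost) / saved, true
}

func main() {
	base := Option{"current", 100, 1000}
	faster := Option{"bigger cluster", 90, 3000} // 10% faster, 3x the cost
	if c, ok := costPerMsSaved(base, faster); ok {
		fmt.Printf("%s costs $%.0f/month per ms saved\n", faster.Name, c)
	}
}
```

Whether $200 a month per millisecond is worth it is exactly the user-impact question the team has to answer; the number just makes the trade-off explicit.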

3. Modern systems are facing record-breaking DDoS attacks and increasingly complex supply-chain threats. For instance, 2025 saw hypervolumetric DDoS attacks peaking at multi-terabit levels and a 188% year-over-year spike in malicious packages in open-source registries. How do you design systems to be resilient against such attacks and vulnerabilities that are increasing exponentially?

Archit Agarwal: In today’s world, I don’t design things thinking that I’ll not get attacked. I always design thinking that I’ll always be attacked, and how would I react when I’m attacked?

Modern systems are operating in very hostile environments. So you should always assume two things: the system will fail—that is for sure, that is inevitable—and then you’ll definitely get attacked now or in the near future. So if you plan your infrastructure and your architecture based on these two assumptions, you’re making good decisions to protect your system against these two things.

Once you accept these things, you can reduce the blast radius of these two things because now you are aware. So how do you do that? There are a couple of things that we start with.

First is always a layered defense, starting with the network layer. The network layer is your first line of defense against any attack. You can use services provided by your cloud provider. For example, AWS has a service called AWS Shield Advanced, Azure has Azure DDoS Protection, and Google Cloud has Google Cloud Armor. Every major cloud provider offers some network-layer protection, so start using it.

Then in your application layer, you add code to limit requests. For example, you implement rate limiters based on geolocation, IP, or user. Maybe you say that if a user makes more than 100 requests per minute, they’re probably trying to attack the system, because a normal user flow doesn’t call the system 100 times in a minute. So we block that user.

And add some bot protection. For example, Google has a bot that crawls web pages and collects data to improve its search results, but Google’s bot makes sure it doesn’t overload the server with requests while crawling. Other bots are written specifically to flood your server and keep scraping data so their owners can gain some advantage from what they collect from you. So you should build your application layer to protect yourself against such bots.

Then your architecture needs an upper limit on auto scaling. You cannot keep auto scaling one service to 100 servers, right? If you’re suddenly scaling to that extent, there is probably some malicious activity on your system. So always set an upper limit. Auto scaling is great until you realize that you’re auto scaling your DoS attacks.

Then the second line of defense is resiliency principles. For example, for a larger application, you would always deploy into multiple availability zones. Why? So that if one data center is under attack, you can completely shut down the service deployed there and still keep your application up and running for users, because your services are also running in other data centers. Or you can go multi-region.
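The scaling ceiling can be illustrated with a clamp around a naive sizing rule. In practice this is usually configuration (for example, `maxReplicas` on a Kubernetes HorizontalPodAutoscaler or the maximum size of an AWS Auto Scaling group); the function and numbers below are a hypothetical sketch of the logic, assuming a fixed per-replica capacity.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas sizes a service from observed load but never leaves
// [minReplicas, maxReplicas]. perReplica is the assumed capacity of one instance.
func desiredReplicas(rps, perReplica float64, minReplicas, maxReplicas int) (replicas int, capped bool) {
	n := int(math.Ceil(rps / perReplica))
	if n < minReplicas {
		return minReplicas, false
	}
	if n > maxReplicas {
		// Demand beyond the ceiling is a signal to investigate (possibly an attack),
		// not a reason to keep buying servers.
		return maxReplicas, true
	}
	return n, false
}

func main() {
	for _, rps := range []float64{100, 1200, 50000} {
		n, capped := desiredReplicas(rps, 500, 2, 20)
		fmt.Printf("rps=%.0f replicas=%d capped=%v\n", rps, n, capped)
	}
}
```

The `capped` flag is where an alert would hook in, so a traffic spike pages a human instead of silently growing the bill.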

Or these days, you can even go multi-cloud, but multi-cloud is not easy. You will have to consider a lot of things around multi-cloud.

Next is supply-chain security. Modern applications depend on a lot of external packages and services, so you need to make sure that whatever version you are using has already been validated for security risks, and that you are not auto-upgrading until you validate it. Those dependencies are the actual attack surface you are exposing, and the attack surface is exactly where an attacker will start. So that is the next thing you look at.

Then you apply security through authorization, and with authorization you always deny by default. You don’t say, “I will allow everyone unless they have this role.” You say that everyone is denied unless they have this particular access. That is how you protect yourself.

Then your tokens should be short-lived. Ideally, you don’t create tokens that live forever, right? That way, even if an attacker gets hold of a token, they only have access for a limited duration and lose it once the token expires.

Then observability is key. You should always have observability on your systems. Never skip observability and logging, or you will lose visibility into what is happening.
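“Validate before you use it” can be enforced by pinning each reviewed dependency version to a content digest and refusing anything else. Go modules already do a version of this through `go.sum` and the checksum database, and `govulncheck` scans for known vulnerabilities; the sketch below only illustrates the idea with a hypothetical in-memory allowlist and made-up module names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// reviewed maps "module@version" to the SHA-256 digest recorded at security review time.
var reviewed = map[string]string{}

// approve records an artifact's digest once it has passed review.
func approve(nameAtVersion string, data []byte) {
	sum := sha256.Sum256(data)
	reviewed[nameAtVersion] = hex.EncodeToString(sum[:])
}

// verifyArtifact rejects anything that was never reviewed, or whose bytes changed since.
func verifyArtifact(nameAtVersion string, data []byte) error {
	want, ok := reviewed[nameAtVersion]
	if !ok {
		return fmt.Errorf("%s: not reviewed; refusing to use it", nameAtVersion)
	}
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != want {
		return fmt.Errorf("%s: digest mismatch (got %s)", nameAtVersion, got)
	}
	return nil
}

func main() {
	approve("example.com/lib@v1.2.3", []byte("reviewed contents"))
	fmt.Println(verifyArtifact("example.com/lib@v1.2.3", []byte("reviewed contents")))
	fmt.Println(verifyArtifact("example.com/lib@v1.2.3", []byte("tampered contents")))
	fmt.Println(verifyArtifact("example.com/lib@v1.3.0", []byte("new release")))
}
```

An unreviewed new release fails the same way a tampered one does, which is what blocks silent auto-upgrades.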
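Deny-by-default means the only path to “yes” is an explicit grant; every other case, including unknown roles and empty role lists, falls through to “no”. The role and permission names below are hypothetical, and real authorization services add scoping, caching, and auditing that this sketch leaves out.

```go
package main

import "fmt"

// Permission is a hypothetical action string such as "orders:read".
type Permission string

// Policy holds explicit grants only. Anything not listed is denied.
type Policy struct {
	grants map[string]map[Permission]bool // role -> granted permissions
}

func NewPolicy() *Policy { return &Policy{grants: make(map[string]map[Permission]bool)} }

// Grant adds one explicit permission to a role.
func (p *Policy) Grant(role string, perm Permission) {
	if p.grants[role] == nil {
		p.grants[role] = make(map[Permission]bool)
	}
	p.grants[role][perm] = true
}

// Allowed returns true only if one of the roles carries an explicit grant.
// Unknown roles, unknown permissions, and empty role lists all fall through to deny.
func (p *Policy) Allowed(roles []string, perm Permission) bool {
	for _, r := range roles {
		if p.grants[r][perm] { // lookups on missing roles yield false, never true
			return true
		}
	}
	return false
}

func main() {
	p := NewPolicy()
	p.Grant("support", "orders:read")
	fmt.Println(p.Allowed([]string{"support"}, "orders:read"))   // true
	fmt.Println(p.Allowed([]string{"support"}, "orders:delete")) // false
	fmt.Println(p.Allowed(nil, "orders:read"))                   // false
}
```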

4. Today’s architectures often depend on numerous third-party services and cloud providers, even if not by explicit choice. How do you design a system that remains portable and robust when you’re relying on external SaaS APIs, cloud services, or even multiple cloud environments?

Archit Agarwal: I was expecting that question, given all the recent AWS, Azure, and Cloudflare outages over the past few months. To be honest, every system depends on a lot of different external services: your database, your message queues, your SaaS APIs. All of these are external dependencies, and you cannot build a modern application without depending on at least one of them.

So I would say multi-cloud is not always feasible, because it has its own challenges. There are business challenges, data privacy challenges, and definitely cost challenges, because going multi-cloud requires a large investment.

So ideally, we don’t design to avoid dependencies. We design so that if one dependency fails, the whole system doesn’t go down. That is the core intention. There are a few principles we usually follow, and I think most engineers would agree with them.

We have an abstraction layer for each external service. For example, if you are talking to a storage service, the application talks to it through an interface. Behind that interface, you can swap the provider any day: today it talks to AWS, tomorrow it talks to Azure. That makes it easy to switch external dependencies without impacting the actual application. This decouples the application from the external dependency.

Then we use open standards and cloud-neutral tools: standards such as containers, Kubernetes, and open telemetry formats, and portable databases such as Postgres or MongoDB. For cloud-neutral tooling, you can use Terraform, which lets you deploy to different cloud providers and choose between them any day.

A single region is a single point of failure, and so is a single cloud, but you have to be cost-smart about multi-cloud. Make sure your disaster recovery model is in place. You don’t replicate all services to a different cloud. Replicate only the mission-critical services, so that users’ most important daily tasks are not impacted, while less critical tasks can be offline for some time and that is still OK for them.

You’ll have to plan for that, and then have unified observability. You cannot have observability divided across different clouds or regions. You should have one single place to look at logs, traces, and everything else, so that you aren’t guessing. Everything is curated in one place.

Practical Architecture Insights from Experience

5. You personally have experience building ultra-low-latency services, such as global authentication systems. What design principles and techniques are crucial for achieving sub-millisecond latency at scale?

Archit Agarwal: Ultra-low-latency systems look very simple from the outside, but they’re a totally different kind of structure to build. I treat latency like a monthly budget. Every network hop or memory allocation takes something out of that budget, so you have to be very smart about where you spend it.

So you don’t just optimize for speed; you eliminate whatever is slow. There are a few key principles I usually follow, and I push my team to follow them too.

One is: move the computation closer to the user. Your compute layer should be deployed close to the user, or at the edge location the user is accessing from. Let’s say I’m living in Bangalore and I’m connecting to a server in the USA; I will have a lot of latency, right? So place the compute layer close to the user.

Then avoid network hops completely on the hot paths where you need ultra-low latency. You cannot have network hops to different microservices there. You keep everything in memory: you don’t go to a distributed cache, and you don’t rely on some other network server, because you’re trying to eliminate network hops.

Then keep your service lean. Don’t use a lot of wrappers. Every wrapper ultimately calls down into the same native code anyway, right? So I would always recommend removing unnecessary wrappers and calling the native layer directly. That improves performance and reduces latency on your server.

Then improve your network layer. For example, reusing HTTP connections helps: you don’t initialize new HTTP connections again and again. Then use the right protocols. If your service-to-service communication uses plain HTTP with JSON, that’s not ideal. You can use gRPC, which runs over HTTP/2 with Protocol Buffers and is generally much faster for service-to-service communication, so choose that.
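A back-of-the-envelope bound shows why distance dominates. Assuming roughly 13,000 km between Bangalore and a data center on the US East Coast (an approximation; real fiber routes are longer) and light in optical fiber travelling at about two-thirds of its vacuum speed, roughly 200,000 km/s, one round trip costs at least:

```latex
t_{\text{RTT}} \ge \frac{2d}{v}
  = \frac{2 \times 13\,000\ \text{km}}{2 \times 10^{5}\ \text{km/s}}
  = 0.13\ \text{s} = 130\ \text{ms}
```

That floor, before any processing at all, is over a hundred times a sub-millisecond budget, so no amount of code optimization on the server can recover it; only moving the compute closer can.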
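Keeping hot-path data in process can look like the sketch below: an immutable snapshot published through an atomic pointer swap, so reads take no lock and make no network call. The data (API key to tenant) is a hypothetical example, the background refresher that would call `Publish` is not shown, and `atomic.Pointer` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is an immutable, in-process copy of the data the hot path needs.
// It is never mutated after it is published.
type Snapshot struct {
	tenants map[string]string // hypothetical: API key -> tenant
}

// Store publishes snapshots with an atomic pointer swap, so readers never lock
// and never leave the process.
type Store struct {
	current atomic.Pointer[Snapshot]
}

// Publish replaces the whole snapshot; a background refresher would call this.
func (s *Store) Publish(tenants map[string]string) {
	cp := make(map[string]string, len(tenants))
	for k, v := range tenants {
		cp[k] = v
	}
	s.current.Store(&Snapshot{tenants: cp})
}

// Lookup is the hot path: one atomic load plus one map read.
func (s *Store) Lookup(key string) (string, bool) {
	snap := s.current.Load()
	if snap == nil {
		return "", false
	}
	t, ok := snap.tenants[key]
	return t, ok
}

func main() {
	var s Store
	s.Publish(map[string]string{"key-123": "acme"})
	fmt.Println(s.Lookup("key-123"))
}
```

The trade-off is memory and staleness: every instance holds a full copy, and reads see data only as fresh as the last publish.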

And then the last part is always the right hardware and the runtime that you’re running on. If your hardware is too old, too laggy, there is nothing that can solve the problem. You will have to fix the hardware also.

6. If I asked you to summarize briefly, how do you ensure that pushing for extreme performance doesn’t compromise reliability or maintainability?

Archit Agarwal: What I’ve observed in my experience so far is that no more than about 5% of an application actually requires ultra-low latency. The other 95% is fine with a little more latency.

So you should optimize only the 5% that actually requires ultra-low latency. You cannot develop an application where everything is designed for ultra-low latency. For that 95%, I would always say, design for readability and maintainability. For the 5% that needs low latency, you can trade away some readability to improve latency.

Cracking the System Design Interview

7. System design questions are broad and open-ended, and probably that’s why they’re challenging. Do you recommend using any kind of structured approach or framework to tackle these interviews?

Archit Agarwal: System design interviews are not about memorizing a particular framework. They’re about thinking in a framework. Having a framework will never hurt; it will only help, because you are calmer and you’re approaching the problem in a structured way instead of jumping to buzzwords right at the start of the conversation.

I’ve seen a lot of engineers come into a system design interview and, as soon as I give a problem (let’s say, “design this system”), start with “let’s use microservices” and a distributed cache. But they haven’t understood the scale I want the system to run at. When I ask, “How many users are you planning for?”, they say 1,000 or 10,000 users a minute. But is that really needed? Is that really what I wanted? That’s not in alignment.

So I would always say: start with one to two minutes of quick alignment with the interviewer. Gather the functional requirements, which answer two main questions: What are we actually building, and what does the user actually need? From this, you will understand the data model and what type of system it is, for example whether it is read-heavy or write-heavy. Then move on to nonfunctional requirements, which are the ones that actually drive the architecture.

In nonfunctional requirements, you collect data on the number of requests you are planning for, the scale you are operating at, the consistency you need, and any latency requirements. Nonfunctional requirements decide the architecture, not the other way around.

So yeah, I would say: treat the system design interview as two engineers discussing a problem. It should not feel like you are being interrogated. If you ask the right questions in the first one to two minutes, you have already impressed the interviewer. They are all ears now, listening to the conversation and interested in adding their own thoughts. After that, you can move to the high-level design and get into its different parts.
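Turning “number of requests” into numbers is usually a quick estimate. The sketch below assumes traffic spread evenly over the day and a fixed peak multiplier; the inputs (10M daily users, 20 reads and 2 writes each, peak 3x average) are hypothetical, and the read:write ratio it prints is what tells you whether the system is read-heavy.

```go
package main

import "fmt"

// Estimate holds average and peak requests per second for reads and writes.
type Estimate struct {
	ReadQPS, WriteQPS, PeakReadQPS, PeakWriteQPS float64
}

const secondsPerDay = 86_400

// estimate assumes traffic spreads over the day and that peak is peakFactor x average.
func estimate(dailyUsers, readsPerUser, writesPerUser, peakFactor float64) Estimate {
	r := dailyUsers * readsPerUser / secondsPerDay
	w := dailyUsers * writesPerUser / secondsPerDay
	return Estimate{r, w, r * peakFactor, w * peakFactor}
}

func main() {
	e := estimate(10_000_000, 20, 2, 3)
	fmt.Printf("reads ~%.0f/s (peak %.0f), writes ~%.0f/s (peak %.0f), read:write %.0f:1\n",
		e.ReadQPS, e.PeakReadQPS, e.WriteQPS, e.PeakWriteQPS, e.ReadQPS/e.WriteQPS)
}
```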

8. So according to you, how should candidates break down a complex design problem during an interview to ensure they cover all important aspects? I know part of it is asking those questions, but what else?

Archit Agarwal: Basically, when it comes to a system design, you should try and break that complex system into smaller pieces and then go to the high-level design.

So once you have those questions answered (the functional and nonfunctional requirements), you start by introducing a very high-level design diagram, and then zoom into one piece at a time. For example, in the high-level architecture, a user makes a request to the API server. That request goes to the service, the service calls the database or maybe the caching layer, and the response is sent back. That’s the very high-level architecture.

Now you start zooming in: What type of API gateway? Do you need a load balancer? Do you need multi-region deployment? The answers come from the functional and nonfunctional requirements you have already collected, and this is how you introduce your thought process.

And in this process, as you zoom into each piece, you start discussing the trade-offs. For example, when you talk about the database, you say, “I’m using a relational database.” Why a relational database? Why not NoSQL? That is a trade-off you should bring into the conversation. Then why EC2 and not Lambda, right? You discuss all these trade-offs, because system design is fundamentally about trade-offs.

So if you know the trade-offs, why you’re using one thing over another, you have already shown the interviewer that you know things well. You know your choices. You understand why and when to make a choice.

By this time, the interviewer will be confident that you can design an application that operates at Google scale. Maybe the application is as simple as a to-do app, but you will be able to take it to the scale we want.

9. And if you turn the lens inwards a bit from your perspective as a system design interviewer, what is your process for evaluating a candidate’s depth versus breadth?

Archit Agarwal: So honestly, a system design interview is not about the diagram or a memorized architecture. It’s about building a thinking muscle, right? Most people try to study system design like a subject, but I would say: think of system design as a skill you are adding to your toolkit, one you improve with structured, deliberate practice. Start with strong fundamentals; that’s what we just discussed, right? You should have strong fundamentals.

Then practice mock interviews. Get help from someone, maybe a mentor or a friend, who can sit down with you. Pick one system design problem, for example a URL shortener, and start discussing it with them. Try to form a complete framework: first, in any system design, I’ll get these things answered; then I’ll move to this part; then this part. Practice within that framework so that you are very comfortable with it.

Be comfortable with the framework itself. Don’t memorize the questions to ask, because they will change with each system. But the framework should be solid enough that you can move through any problem easily.

Then work backward from real-world systems. What I usually do is question myself about a few systems. For example, almost everyone uses WhatsApp, right? So I think about how WhatsApp scales its messaging servers, then I explore articles and engineering blogs around it and start understanding how it’s done. Or how Netflix scales streaming globally; that’s a completely different engineering challenge. So work backward: think about the system, and then research it.

Then start building things. Maybe not at global scale, but once you start building, you will understand the challenges around latency, race conditions, and all the constraints you’ve been thinking about, right? You start feeling them, and you start solving them.

And the last part is definitely: learn to communicate. If you can’t communicate well in a system design interview, you won’t be able to excel there.
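For a URL shortener practice problem, one common design (among several) turns a numeric ID from a database sequence into a short base62 code. The alphabet ordering below is an arbitrary assumption, and decoding very long codes would overflow `uint64`, which the sketch does not guard against.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// encode turns a numeric ID (e.g. from a database sequence) into a short base62 code.
func encode(id uint64) string {
	if id == 0 {
		return string(alphabet[0])
	}
	var buf []byte
	for id > 0 {
		buf = append(buf, alphabet[id%62])
		id /= 62
	}
	// Digits were produced least-significant first; reverse them.
	for i, j := 0, len(buf)-1; i < j; i, j = i+1, j-1 {
		buf[i], buf[j] = buf[j], buf[i]
	}
	return string(buf)
}

// decode reverses encode; it returns false on characters outside the alphabet.
func decode(code string) (uint64, bool) {
	var id uint64
	for _, r := range code {
		i := strings.IndexRune(alphabet, r)
		if i < 0 {
			return 0, false
		}
		id = id*62 + uint64(i)
	}
	return id, true
}

func main() {
	code := encode(123456789)
	id, ok := decode(code)
	fmt.Println(code, id, ok)
}
```

A good practice follow-up is the trade-off discussion itself: sequential IDs are short and collision-free but guessable, while random or hashed codes hide volume at the cost of collision handling.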

10. But do you recommend any specific resources, books, or specific real-world exercises for mastering system design concepts and being interview-ready—especially for senior engineers aiming to showcase their expertise?

Archit Agarwal: See, for someone who is aiming for a senior role, I would definitely suggest a mix of a few things—starting from a book, real-world blogs, and then real-world exercises.

So for books, I would definitely recommend Designing Data-Intensive Applications by Martin Kleppmann. That is a must-read for any senior engineer aiming to excel in system design. Then there are System Design Interview, Volume 1 and Volume 2, by Alex Xu, right? Those two are very good books. Then Building Microservices by Sam Newman.

Those are a few very good books, and reading them will give you a lot of understanding of system design. Then you can refer to engineering blogs from big tech companies. For example, Netflix has an engineering blog, Uber has an engineering blog, and almost every big tech company that maintains large infrastructure has one. Read those blogs. Go to high-quality YouTube channels that don’t just walk through diagrams but go deeper into the concepts. Refer to those channels, in case you want.

And finally, design a system yourself and take it to the point where it is time-tested and scale-ready. System design interviews aren’t cracked by memorizing answers. They’re cracked by building a strong foundation, practicing real problems, and thinking like an engineer, not an exam candidate.

11. Even experienced engineers can stumble in design interviews. What are some of the most common mistakes or pitfalls you see candidates make—especially when they’re quite experienced and perhaps more confident than some others—and how can engineers avoid these mistakes?

Archit Agarwal: System design interviews are funny because people don’t fail because they don’t know what Kafka is, or maybe DynamoDB. They fail because of the way they communicate with the interviewer.

So I would say that if you have good communication, and you establish a two-way conversation with the interviewer, half the job is already done. I’ve seen engineers jump straight into solutions as soon as they hear a problem: I say, “design this system,” and they start with, “I’ll use Redis, I’ll use Kafka.” I would say, slow down. First, understand the scale constraints. How many requests per second are we operating at? How much data do we expect to flow through the system per day? Is there a security requirement?

For example, if you’re operating in Europe, different compliance rules (such as the GDPR) apply to personally identifiable information than in other regions, right? So ask about those constraints first, and only then come to a conclusion and start architecting, right?

And you probably don’t need to design everything at Google scale. Not everything has to scale like Google, right? Some things are meant for small scale only. For example, say I want to design an application that will only be used by my company’s engineers and never goes outside that. Why do I need multi-region deployment? I can deploy on the local area network and live with it, right? I don’t even need the cloud there.

You need to understand those problems. Then work out how many requests you’ll serve, how many servers you’d need, and how big a database you need, right? If you start by addressing those basic questions, I think you are already on the right track.
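Those basic sizing questions fit in a few lines of arithmetic. The per-server capacity, headroom factor, and record size below are hypothetical assumptions; the point of the comparison is that an internal tool and a public product land in very different places, which is what justifies (or rules out) the heavier architecture.

```go
package main

import (
	"fmt"
	"math"
)

// serversNeeded rounds up peak load over per-server capacity, with headroom
// (e.g. 0.7 means run each server at no more than 70% of what it can handle).
func serversNeeded(peakQPS, perServerQPS, headroom float64) int {
	return int(math.Ceil(peakQPS / (perServerQPS * headroom)))
}

// storagePerYearGB estimates raw storage growth from writes per day and record size.
func storagePerYearGB(writesPerDay, bytesPerRecord float64) float64 {
	return writesPerDay * bytesPerRecord * 365 / 1e9
}

func main() {
	// Hypothetical inputs for an internal tool vs. a public product.
	fmt.Println("internal tool servers:", serversNeeded(5, 1000, 0.7))
	fmt.Println("public app servers:", serversNeeded(7000, 1000, 0.7))
	fmt.Printf("public app storage: ~%.0f GB/year\n", storagePerYearGB(20_000_000, 500))
}
```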

12. Have you ever seen a case where the interviewee has asked too many questions? Has that ever happened?

Archit Agarwal: Yeah, I once saw an interviewee who was asking too many questions, and that in particular told me the question I had asked was probably something he wasn’t familiar with.

For example, I gave him a system he had no idea about. He had never thought about it; he might use it every now and then, but he had never given it a thought. And that is OK. If I’m interviewing a very junior engineer, he might not have thought about a lot of things yet, so asking too many questions is still OK.

But if he’s asking questions that I consider very basic for that level of engineer, then it raises a red flag. Asking clarification questions is perfectly OK.

13. Now, as you’ve also said, a system design interview isn’t just about the final answer, right? It’s about how you communicate, how you adapt to the constraints you’ve sort of discovered during the conversation. Interviewers often value a candidate’s ability to clearly explain their thinking and reasoning—and the ability to adjust to constraints that are put in front of them mid-discussion, even. So in this context, how important are communication skills in these interviews, and what does good communication look like for a system design question?

Archit Agarwal: OK, so honestly, communication is half of your system design interview, maybe even more. Let’s say I am capable of designing a beautiful architecture in my head, but I’m not able to communicate or explain it to the other person. For the interviewer, that architecture doesn’t even exist, right? Because I was not able to explain it to them.

I have seen candidates who design very solid architectures, but they were either too quiet, used too much jargon, or were too scattered in how they explained things. A system design interview is about how you communicate the architecture you are thinking about to the other person, because that gives insight into whether you will be able to work with a team of architects, product managers, and junior engineers and explain what you’re thinking. So the system design interview is also intended to assess your communication skills.

On the technical side, there are a few things I always suggest to everyone. Think out loud. You should not stay silent for, let’s say, five minutes while you think about the system. Say whatever you are thinking. Basically, people need to see your brain’s commit history.

So maybe you say, “I’m choosing this approach because of this,” or “Given the scale at which we are operating, this option makes more sense.” Start communicating your ideas. Maybe what you say is not the right choice for the system, but once you say your idea out loud, it will start making more sense to you and you’ll auto-correct yourself, and it is perfectly OK to auto-correct yourself.

The interview should not feel like a monologue where you’re just speaking and the other person is only listening. Trust me, if that is happening, take it as a sign that you have already lost the session. To avoid that, you have to structure your answers. What you say is important, but how you say it is even more important, right? A good candidate breaks the answer into multiple steps, summarizes along the way, and uses explicit transitions, like “Now I’ll start discussing the data flow” or “Let’s discuss the caching strategies.”

Check whether the interviewer is aligned with the approach you are trying to follow, and make the interviewer feel that they are sitting with another engineer who is collaborating with them to come up with a good system. That’s what they want to see.

What you say should not be meant to impress them. You are not there to impress them with lots of jargon or big words. Be clear and concise, so clear that even someone much more junior than you could still understand. That’s the core of communication, right? Your communication should not only travel up the ladder; it should also travel down the ladder.

Listening is another advantage. If you’re not listening to the interviewer, you won’t be able to respond to their feedback or adapt your design to whatever they suggest. So always listen carefully to the interviewer’s feedback.

14. Some really excellent tips there, Archit. But what happens if an interviewer throws a curveball—say, suddenly the constraints change? You’ve sort of thought it through really well. You’re in the flow, you know you’re doing really well, the goal is almost in sight—but this new constraint or change in scope is just thrown at you. So what’s the best way to handle this kind of situation?

Archit Agarwal: To be honest, I love it when an interviewer throws these curveballs. Now, why? They’re definitely not easy. When you’re halfway through the design, almost there, and something changes, it’s really frustrating.

But that’s the real-world scenario, right? You’re designing something, and suddenly things change. The real world works the same way. So if you are not able to adapt, there’s no point in designing architecture. If an interviewer throws you a curveball, think of it as a chance to showcase how well you adapt to changing scenarios.

So here is how I would ideally approach it. I would not panic, and I would not start defending my original diagram, right? I would first absorb what they’ve said and then say, “OK, this changes these things. Now let me think about how we can adapt to this.” That gives the other person a hint that Archit is flexible and not egotistical about his design approach, which is a good sign.

Then I would restate what they said, always in my own words, to make sure we are aligned on the requirement change.

The third thing I’ll do is highlight which parts of the system will have to change and which parts will remain intact. This clearly shows whether I’m able to structure the redesign, that is, whether I understand which parts of the system can stay the same and don’t have to change.

The curveballs, the changes the interviewer introduces, will never require you to scrap the complete diagram or the complete architecture, unless you were already off track, right? They want to understand how you decide which parts of the system can remain as they are and which parts must change, and how flexible your system is to change.

And if something is complex, be honest. No one expects you to know everything. Remember that you are in a two-way conversation with another engineer, so talk it through. You can say that this part is a bit complex and these are the trade-offs we’ll have to make, and try to include the interviewer in that discussion.

So this is how you will succeed. System design interviews are not about being right all the time. They’re about how clearly you can think, how well you can explain, and how gracefully you can handle the changes.

15. Candidates are now expected to know advanced concepts that used to be considered niche, and this follows on well from what you were just saying. For example, in a scenario like designing a location-based service, it may be assumed that you know about geohashing or spatial indexes. So how should candidates prepare for this breadth-of-knowledge challenge, which has become more and more expected?

Archit Agarwal: To be honest, the bar has definitely been raised. Things that were once considered “nice to know” are now things you are expected to know at the same experience level. I won’t deny that, but here is the thing: I don’t think a candidate needs to be an encyclopedia. Being an intentional learner is good enough, because no one can learn everything. The tech space is simply too big right now.

Having said that, if you get a question in an interview that is out of your league, you will definitely panic. So nowadays, I approach my learning with four layers of preparation.

First: build extremely strong fundamentals. Fundamentals are extremely important because any advanced topic you can name today started from a basic system. There was a basic system that had some issues, and that’s why the advanced system was invented, right? So if you know the basics well, for example how a database works, how database indexing works, how a distributed system can fail, or what the different consistency models are, that is more than enough to start building your knowledge of those advanced topics. So make sure your fundamentals are very clear.

Then learn the advanced topics through real problems. I would not just keep reading articles or books about those advanced topics. Let’s say I want to understand geohashing: I would not just read about it; I would design a food delivery app to understand it. If you want to understand Kafka semantics, don’t just read about them. Design a real-time analytics system that uses them, and that’s how you will deepen your knowledge in these areas.
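As an example of the mechanics the food delivery exercise forces you to learn, here is a minimal geohash encoder in Go. It follows the standard geohash scheme: alternately bisect the longitude and latitude ranges (longitude first), record one bit per bisection, and pack every 5 bits into one base32 character. The function name `encode` and the coordinates used are illustrative only. Because each extra character narrows the cell, nearby points tend to share a prefix, which is what lets a delivery app index restaurants by geohash and look up candidates by prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// base32 is the geohash alphabet: digits plus lowercase letters, excluding a, i, l, o.
const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// encode returns the geohash of (lat, lon) with the given number of characters.
func encode(lat, lon float64, precision int) string {
	latLo, latHi := -90.0, 90.0
	lonLo, lonHi := -180.0, 180.0
	var sb strings.Builder
	bit, ch := 0, 0
	even := true // geohash interleaves bits starting with longitude

	for sb.Len() < precision {
		if even {
			mid := (lonLo + lonHi) / 2
			if lon >= mid {
				ch = ch<<1 | 1 // point is in the upper half: emit 1
				lonLo = mid
			} else {
				ch <<= 1 // point is in the lower half: emit 0
				lonHi = mid
			}
		} else {
			mid := (latLo + latHi) / 2
			if lat >= mid {
				ch = ch<<1 | 1
				latLo = mid
			} else {
				ch <<= 1
				latHi = mid
			}
		}
		even = !even
		bit++
		if bit == 5 { // every 5 bits become one base32 character
			sb.WriteByte(base32[ch])
			bit, ch = 0, 0
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(encode(42.6, -5.6, 5))     // ezs42
	fmt.Println(encode(42.601, -5.601, 5)) // ezs42: a nearby point lands in the same cell
	fmt.Println(encode(42.6, -5.6, 3))     // ezs: a shorter hash is a prefix covering a larger cell
}
```

One caveat the exercise quickly surfaces: two points on opposite sides of a cell boundary can be physically close yet share only a short prefix, or none at all, so proximity searches typically also check the neighboring cells.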

After all this, pick two or three areas where you will go deep. Personally, I believe you should have deep knowledge in at least one or two areas, because in an interview, depth of knowledge shows directly: you will speak more about that topic, right? And trust me, if you go deep into one particular topic, any engineer interviewing you will understand that this is an area you are especially interested in. And if you’ve gone to that depth, it shows that you are an engineer who understands the gravity of things. So you could pick areas to go deep into, for example distributed systems, storage systems, authentication systems, or performance engineering.

Then, practice. Practice articulating the trade-offs. Maybe ask a friend to sit with you and talk them through the trade-offs. Once you start communicating and your friend gives you feedback, your ability to discuss those trade-offs will improve. That is the fourth thing, and it is very important.

16. If you turn the lens inwards a bit from your perspective as a system design interviewer, what is your process for evaluating a candidate’s depth versus breadth?

Archit Agarwal: Honestly, a system design interview is not about diagrams and memorized architectures. It’s more about building a thinking muscle, right? Most people try to study system design like a subject, but I would say: think of system design as a skill you are adding to your toolkit, one you improve with structured, deliberate practice. Start with strong fundamentals, as we just discussed.

Then work backward from real-world systems. What I usually do is quiz myself on a few systems. For example, most people use WhatsApp, so I would ask myself how WhatsApp is able to scale its messaging servers, and then explore articles and engineering blogs about it to understand how that can be done. Or how Netflix is able to scale streaming globally, which is a completely different engineering challenge. So work backward: think about the system, and then research it.

Then start building things. Maybe not at global scale, but once you start building, you will understand the challenges around latency, race conditions, and all those other constraints. You start feeling them, and you start solving them.

And the last part is definitely: learn to communicate. If you don’t learn to communicate, you won’t be able to excel in system design interviews.

17. Do you recommend any specific resources, books, or specific real-world exercises for mastering system design concepts and being interview-ready—especially for senior engineers aiming to showcase their expertise?

Archit Agarwal: See, for someone aiming for a senior role, I would definitely suggest a mix of books, real-world blogs, and real-world exercises. For books, you should definitely read Designing Data-Intensive Applications by Martin Kleppmann. That is a must-read for any senior engineer aiming to excel at system design. Then there are System Design Interview, Volumes 1 and 2, by Alex Xu; those two are very good books. Then Building Microservices by Sam Newman.

Those are a few very good books, and reading them will give you a solid understanding of system design. Then you can refer to engineering blogs from big tech companies. For example, Netflix has an engineering blog, and so does Uber; the big tech companies that maintain large infrastructure generally publish engineering blogs. Go read those. Also look for high-quality YouTube channels that don’t just walk through the diagram but go deeper into the concepts.

And finally, design a system yourself, one that is time-tested and scale-ready. System design interviews aren’t cracked by memorizing answers. They’re cracked by building strong foundations, practicing real problems, and thinking like an engineer, not an exam candidate.
