
Raul Junco on System Design Muscle Memory

System Design Interviews

A practical expansion of Raul Junco's viral post on system design interview fundamentals, recall under pressure, and key trade-offs.

Tags: system design interviews, backend engineering, distributed systems, scalability, interview prep, LinkedIn content, viral posts, content strategy, system-design

Raul Junco recently shared something that caught my attention: "Most system design interviews don’t fail because of missing knowledge. They fail because people can’t recall the basics under pressure." That line is painfully accurate, and it explains why so many otherwise-strong engineers walk out of interviews thinking, "I knew that... why couldn’t I say it?"

Raul also pointed out that these are not trick questions. They are "the muscle memory of backend engineering" - the core concepts you need to recall automatically when the clock is ticking and the whiteboard is staring back.

In this post, I want to expand on Raul’s list and turn it into a practical guide you can use to build that muscle memory. Not to memorize trivia, but to internalize the decisions and trade-offs that show up in real systems and real interviews.

"When design questions come up, these concepts should feel automatic, like checking mirrors before changing lanes." - Raul Junco

Why fundamentals fail under pressure

In system design interviews, the difficulty is rarely the concept itself. It is the combination of:

  • Time pressure and constant context switching
  • Fear of missing something obvious
  • Ambiguity in requirements
  • A need to narrate your thinking clearly

Under stress, your brain reaches for the simplest available pattern. If the fundamentals are not well-rehearsed, you either freeze or you overcomplicate.

A useful way to think about Raul’s point is: interviews test retrieval, not just understanding. You might understand sharding, retries, and queues, but if you cannot retrieve the right model quickly, you will not apply it well.

Scaling up vs scaling out

The core distinction

  • Scale up (vertical): bigger machine, more CPU/RAM, often simpler operationally.
  • Scale out (horizontal): more machines, requires distribution (load balancing, coordination, partitioning).

What interviewers want

They want to see that you can:

  • Identify the bottleneck (CPU, memory, disk IO, network, external dependency)
  • Choose the simplest scaling approach first
  • Recognize when horizontal scaling becomes necessary

A crisp interview answer sounds like: "We can start by scaling up the database instance to buy time. Once write throughput or storage growth exceeds a single node, we move to scaling out via read replicas and then partitioning." That shows sequencing and realism.

Key habit: always state what breaks first when you scale.

SQL vs NoSQL trade-offs

This is not "SQL good, NoSQL fast." The real trade-offs are about consistency, query patterns, and operational complexity.

SQL is great when:

  • You need relational integrity and complex joins
  • You need flexible ad hoc querying
  • Transactions matter (money, inventory, state transitions)

NoSQL is great when:

  • Your access pattern is predictable (key-value, document fetch, time series)
  • You need easy horizontal scaling for specific workloads
  • You can design around denormalization and limited joins

In interviews, define the data model and access patterns before naming a database. For example: "If we mostly read and write by userId and we can tolerate eventual consistency for feed ranking, a wide-column or document store could fit. If we need strong constraints for billing, I would keep that in SQL." That framing wins.
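A toy illustration of that split, with SQLite standing in for the relational store and a plain dict standing in for a document store (the schema, table, and field names are made up for illustration):

```python
import sqlite3

# Billing needs strong constraints and atomic state transitions: relational schema + transaction.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, user_id TEXT, amount_cents INTEGER, status TEXT)")
with db:  # atomic: both statements commit together or not at all
    db.execute("INSERT INTO invoices VALUES ('inv-1', 'user-7', 1999, 'PENDING')")
    db.execute("UPDATE invoices SET status = 'PAID' WHERE id = 'inv-1' AND status = 'PENDING'")

# Feed ranking is read/written by userId and can tolerate eventual consistency:
# a denormalized document fetched as one blob, document-store style.
feed_documents = {
    "user-7": {"ranked_post_ids": [42, 17, 8], "generated_at": "2024-01-01T00:00:00Z"}
}
print(feed_documents["user-7"]["ranked_post_ids"])
```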

Queues vs pub/sub

Raul included "queues vs pub/sub" because it shows up everywhere: background jobs, notifications, analytics, payments, and fan-out.

Queues (work queues)

  • Goal: every task gets processed (usually exactly once) by a single consumer
  • Consumers compete for messages
  • You care about retries, dead-letter queues, and backpressure

Examples: image resizing jobs, email sending, payment reconciliation.

Pub/sub (event distribution)

  • Goal: broadcast events to multiple independent consumers
  • Each subscription gets its own copy
  • You care about consumer isolation and ordering guarantees

Examples: "UserSignedUp" event feeding analytics, CRM sync, and onboarding emails.

Interview tip: say what the message represents.

  • "This is a command" (do X) often belongs in a queue.
  • "This is an event" (X happened) often belongs in pub/sub.

Idempotency, retries, and failure modes

This is where many designs fall apart. Distributed systems fail in normal ways: timeouts, partial writes, duplicate deliveries, and stale reads.

Idempotency in one sentence

An operation is idempotent if doing it twice has the same effect as doing it once.

Why it matters: retries are unavoidable. If your client times out after charging a card, it will retry. If your server processes the charge twice, you have just double-charged the customer.

Practical patterns

  • Idempotency keys: client sends a unique key for a request, server stores result keyed by (user, key).
  • Deduplication tables: keep a record of processed message IDs.
  • Upserts and conditional writes: "set status to PAID if currently PENDING".
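As an illustration of the idempotency-key pattern, here is a minimal sketch (the in-memory dict and the charge_card stand-in are hypothetical; a real service would persist the key and result in a database with a unique constraint):

```python
# Hypothetical payment handler using an idempotency key.
processed = {}   # (user_id, idempotency_key) -> stored result; a DB table in production

def charge_card(user_id, amount_cents):
    # Stand-in for the real payment provider call.
    return {"status": "PAID", "amount": amount_cents}

def handle_charge(user_id, idempotency_key, amount_cents):
    key = (user_id, idempotency_key)
    if key in processed:
        return processed[key]          # retry: return the stored result, never charge again
    result = charge_card(user_id, amount_cents)
    processed[key] = result            # record the outcome so later retries are safe
    return result

first = handle_charge(7, "req-abc", 1999)
retry = handle_charge(7, "req-abc", 1999)   # client timed out and retried
assert first == retry                        # the card was charged exactly once
```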

Failure modes to name explicitly

  • At-least-once delivery: duplicates happen, design for idempotency.
  • At-most-once: you might lose messages, acceptable only for best-effort.
  • Exactly-once: usually an illusion at system boundaries (more on that below).

Interview habit: every time you add a retry, say out loud what happens if the first attempt actually succeeded.

Sharding, caching, and rate limiting

Raul grouped these because they are your core levers for scale and reliability.

Sharding (partitioning data)

Sharding is how you scale a datastore when one node cannot handle the load.

  • Choose a shard key that matches access patterns.
  • Watch out for hot shards (celebrity users, trending topics).
  • Plan rebalancing: consistent hashing, virtual nodes, or managed solutions.

A good interview move: mention the "shard key regret" problem and propose mitigations (composite keys, indirection layers, or a routing service).
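To show why consistent hashing with virtual nodes eases rebalancing, here is a small sketch (illustrative only; managed datastores implement this for you behind the shard key you choose):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns many points on the ring (virtual nodes),
        # which spreads keys evenly and limits data movement when nodes change.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
print(ring.node_for("user:12345"))   # the same user always routes to the same shard
```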

Caching

Caching is not just "add Redis." It is answering:

  • What are you caching (objects, query results, rendered pages)?
  • Where (client, CDN, service cache, DB cache)?
  • Invalidation strategy (TTL, write-through, write-back, explicit busting)?

If you do not say how the cache stays correct, your design looks hand-wavy.
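One concrete way to answer the correctness question is cache-aside with a TTL plus explicit invalidation on writes. A minimal sketch (load_user_from_db and the in-memory dict are placeholders for your real database and cache):

```python
import time

CACHE = {}          # key -> (value, expires_at); stands in for Redis or similar
TTL_SECONDS = 60

def load_user_from_db(user_id):
    # Placeholder for the real database read.
    return {"id": user_id, "name": "Ada"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                          # cache hit, still fresh
    user = load_user_from_db(user_id)            # miss or expired: go to the source
    CACHE[key] = (user, time.time() + TTL_SECONDS)
    return user

def update_user(user_id, fields):
    # Write path: update the source of truth first, then bust the cache
    # (or write through) so readers never see the stale entry past the TTL.
    # ... database write goes here ...
    CACHE.pop(f"user:{user_id}", None)
```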

Rate limiting

Rate limiting protects availability and controls costs.

  • Per user, per IP, per API key, per tenant
  • Token bucket and leaky bucket are common algorithms
  • Decide where it lives: API gateway, edge, or application layer
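Here is a minimal token bucket sketch, assuming a single process holds the counters; in production this state usually lives at the API gateway or in a shared store like Redis:

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec        # refill speed (steady-state request rate)
        self.capacity = capacity        # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, capacity=10)   # e.g. one bucket per API key
print(bucket.allow())   # True until the burst is spent, then requests get rejected
```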

Interview tip: connect it to abuse and to cascading failures. Rate limiting is often a reliability feature, not just a security feature.

Fan-out strategies and the "exactly-once" myth

Fan-out comes up in feeds, notifications, and real-time systems.

Two common feed strategies

  • Fan-out on write: when someone posts, push the post into followers’ inboxes. Fast reads, expensive writes.
  • Fan-out on read: compute the feed when a user opens it. Cheaper writes, potentially expensive reads.

A strong answer often proposes a hybrid:

  • Fan-out on write for normal users
  • Fan-out on read for very large accounts to avoid write amplification
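A toy sketch of that hybrid, using in-memory dicts and an illustrative follower-count cutoff (the threshold, names, and data shapes are assumptions, not a recommendation):

```python
FANOUT_ON_WRITE_MAX_FOLLOWERS = 10_000    # illustrative cutoff, tuned per system

inboxes = {}        # follower_id -> posts pushed at write time
recent_posts = {}   # author_id -> that author's recent posts

def publish_post(author_id, follower_ids, post):
    recent_posts.setdefault(author_id, []).append(post)
    if len(follower_ids) <= FANOUT_ON_WRITE_MAX_FOLLOWERS:
        # Normal accounts: fan-out on write. Pay the cost once, reads stay cheap.
        for follower_id in follower_ids:
            inboxes.setdefault(follower_id, []).append(post)
    # Very large accounts skip the push; their posts are merged in at read time.

def read_feed(user_id, followed_celebrity_ids):
    feed = list(inboxes.get(user_id, []))                  # precomputed part
    for celebrity_id in followed_celebrity_ids:
        feed.extend(recent_posts.get(celebrity_id, []))    # fan-out on read part
    return sorted(feed, key=lambda p: p["created_at"], reverse=True)

publish_post("author-1", ["user-7", "user-8"], {"id": 1, "created_at": 100})
print(read_feed("user-7", followed_celebrity_ids=[]))
```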

Exactly-once is usually marketing

In practice, "exactly-once" across networks and external systems is extremely hard. What you can do is:

  • Achieve effectively-once semantics via idempotency
  • Use transactional outbox patterns for reliable event publishing
  • Accept duplicates and design consumers to dedupe
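As a sketch of the transactional outbox idea, here is a toy version with SQLite standing in for the primary database (the tables and relay function are illustrative; real setups often use CDC or a dedicated polling publisher):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id):
    # The business write and the event write commit in the same local transaction,
    # so we never end up with an order whose event was silently lost.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        db.execute("INSERT INTO outbox (event) VALUES (?)", (f"OrderPlaced:{order_id}",))

def publish_pending(broker_publish):
    # A separate relay polls the outbox and publishes. Delivery is at-least-once:
    # a crash between publishing and marking the row produces a duplicate,
    # so downstream consumers still dedupe (for example by outbox row id).
    rows = db.execute("SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        broker_publish(event)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order("ord-123")
publish_pending(lambda event: print("published", event))
```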

If you say "we guarantee exactly-once" without explaining how, it can be a red flag. Raul’s phrase "exactly-once myths" is a reminder to be honest and precise.

A simple way to build Raul’s "muscle memory"

Here is the exercise I use (and I think it matches Raul’s intent to "save it, revisit it, use it to stress-test"):

  1. Pick one concept per day (for example, queues vs pub/sub).
  2. Write a 5-sentence explanation from memory.
  3. Add one real example (payments retry, feed fan-out, cache invalidation).
  4. List 2 failure modes and how you mitigate them.
  5. Explain it out loud in under 60 seconds.

Do that for two weeks and you will notice something: the fundamentals start showing up automatically in your designs. You stop scrambling for vocabulary and start making clear trade-offs.

Closing thought

Raul Junco’s main point is not "learn more." It is "recall better under pressure." The difference between a shaky and a strong system design interview is often whether foundational trade-offs are automatic.

Fundamentals decide everything, especially when time is tight.

This blog post expands on a viral LinkedIn post by Raul Junco, Simplifying System Design. View the original LinkedIn post →