
Nikki Siapno's 25 System Design Concepts Worth Studying
A practical guide inspired by Nikki Siapno's viral list of 25 system design concepts, plus what to learn next and why.
Nikki Siapno recently shared something that caught my attention: "I wrote 25 articles for 25 system design concepts:" followed by a rapid-fire list that included "JWT," "Idempotency," "Rate limiting," "Observability," "Microservices," and "CI/CD pipelines." Then she asked the perfect follow-up: "What else would you add? What concepts would you like me to cover?"
That post format works because it matches how most of us actually learn system design. Not in one sitting, and not from a single diagram. We learn by collecting durable concepts, revisiting them in different contexts, and gradually building an internal toolbox we can apply in interviews and in production.
Below, I want to expand on Nikki's list and make it more usable as a study plan: what each concept is really for, how they connect, and what I would add next if you're building a complete system design foundation.
"I wrote 25 articles for 25 system design concepts" is more than a flex. It is a curriculum outline.
Why a concept list beats random system design practice
When people "practice system design," they often jump straight into prompts like "design Uber" or "design YouTube." Those exercises are useful, but they are also overwhelming if you do not have the underlying building blocks.
Nikki's approach flips that. You master primitives (like rate limiting, consistent hashing, CDC, connection pooling), then you can assemble them into systems with confidence.
A good mental model is:
- Concepts are the pieces (rate limiting, caching, message queues).
- Quality attributes are the goals (reliability, latency, security).
- Architecture is the arrangement (microservices, gateways, load balancers).
- Operations is how it stays healthy (observability, health checks).
The 25 concepts, grouped into a practical map
Nikki listed 25 items. To make them easier to remember and apply, I group them into six buckets.
1) Identity, security, and trust boundaries
- JWT
- HTTPS
JWTs are about portable identity and claims. They are great for stateless auth, but they push complexity into token issuance, key rotation, expiration strategy, and revocation (which is not "free" just because JWT is stateless).
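To make "stateless auth" concrete, here is a minimal sketch of HS256 signing and verification using only the standard library. It assumes a shared symmetric secret; a real service should use a maintained library (and plan for key rotation and revocation), but the mechanics below are what such a library does under the hood.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(data: str) -> bytes:
    # Restore stripped padding before decoding
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue an HS256 JWT carrying the given claims."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison to avoid timing side channels
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

Notice what is missing: nothing here can revoke a token before `exp`. That gap is exactly the complexity the paragraph above warns about.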
HTTPS is table stakes, but in design conversations it unlocks important details: TLS termination (where?), mTLS for service-to-service identity, certificate rotation, and performance tradeoffs (handshakes, session reuse).
If you connect these two, you get a more realistic story: who terminates TLS, who validates tokens, and where you enforce authorization (edge, gateway, service).
2) Reliability and correctness under retries
- Idempotency
- ACID vs BASE
- Health checks vs heartbeats
Idempotency is the antidote to retries. If the network or client retries, can your system safely process the request again? Payment systems, order creation, email sending, and provisioning all need an idempotency strategy (idempotency keys, dedupe tables, exactly-once illusions).
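A sketch of the idempotency-key pattern, with an in-memory dict standing in for what would normally be a database table or Redis entry with a TTL. The class and method names are illustrative, not from any particular framework:

```python
import threading


class IdempotentProcessor:
    """Dedupe retried requests by idempotency key.

    In production the seen-keys store would be a dedupe table or Redis
    with a TTL; a dict stands in here for illustration.
    """

    def __init__(self):
        self._results = {}            # idempotency key -> stored response
        self._lock = threading.Lock()

    def process(self, idempotency_key: str, charge):
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request we already handled: replay the
                # original response instead of charging again.
                return self._results[idempotency_key]
            result = charge()         # the side effect happens exactly once
            self._results[idempotency_key] = result
            return result
```

The key design point: the client supplies the key, so a network-level retry of the same logical request carries the same key and gets the same stored response back.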
ACID vs BASE is not a debate to "pick a side." It is about choosing guarantees that match the domain. Strong transactions help when invariants matter (balances, inventory constraints). BASE approaches can unlock scale, but you must design for eventual consistency, compensation, and user experience during convergence.
Health checks vs heartbeats sounds minor until you operate distributed systems. Health checks are usually pull-based (load balancer asks "are you alive?"). Heartbeats are push-based (node announces liveness). Knowing the difference helps you reason about failure detection, false positives, and noisy restarts.
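The push-based side can be sketched as a tiny failure detector: nodes report in, and anything silent for longer than a timeout becomes suspect. The `now` parameter is there so the logic is testable; a real detector would also smooth over jitter to reduce false positives.

```python
import time


class HeartbeatMonitor:
    """Push-based liveness: nodes announce themselves; we flag the silent ones."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}           # node -> timestamp of last heartbeat

    def heartbeat(self, node, now=None):
        self.last_seen[node] = time.monotonic() if now is None else now

    def suspected_dead(self, now=None):
        now = time.monotonic() if now is None else now
        # A node is suspect once its silence exceeds the timeout
        return {n for n, t in self.last_seen.items() if now - t > self.timeout_s}
```

A pull-based health check is the mirror image: the load balancer drives the schedule, so a hung process fails fast, but you pay with probe traffic.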
If you remember one thing: retries are normal. Correctness is optional unless you design for it.
3) Scalability at the edge and in the core
- Rate limiting
- CDN
- Load balancing algorithms
- Consistent hashing
Rate limiting is about protecting shared resources and enforcing fairness. The real design depth is choosing the unit (per IP, per user, per token), the scope (global vs per region), the algorithm (token bucket, leaky bucket, fixed window, sliding window), and where enforcement lives (gateway, service, sidecar).
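Of those algorithms, the token bucket is the one worth being able to sketch from memory: a steady refill rate plus a capacity that allows short bursts. This single-process version is illustrative; a gateway enforcing a global limit would keep the bucket state in shared storage.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: steady refill, bursts up to capacity."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start full
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill lazily based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # spend one token for this request
            return True
        return False
```

Fixed windows are simpler but allow a 2x burst at window boundaries; the bucket's lazy refill avoids that edge entirely.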
CDNs reduce latency and origin load, but also add cache invalidation complexity. In interviews and real life, the key is being explicit: what is cacheable, for how long, and what the invalidation strategy is (versioned assets, purge APIs, surrogate keys).
Load balancing algorithms matter because they change tail latency. Round robin is simple. Least connections helps with uneven workloads. Weighted variants account for heterogeneous instances. Consistent hashing is the special case you reach for when you want stickiness (caches, sharded services) while minimizing remapping when nodes change.
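The "minimizing remapping" claim is easiest to believe with a small sketch. This ring uses virtual nodes for smoother key distribution; MD5 is fine here because we need spread, not cryptographic strength. The names are illustrative:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes: adding or removing a node
    remaps only the keys that fall nearest to it on the ring."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []               # sorted list of (hash, node) points
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str, vnodes=100):
        # Each virtual node gets its own point on the ring
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        i = bisect.bisect(self._ring, (h, ""))   # first point clockwise of h
        return self._ring[i % len(self._ring)][1]
```

With naive `hash(key) % n` routing, changing `n` remaps almost every key; here, adding a node steals only the keys that now land on its ring points, and every moved key moves *to* the new node.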
4) Service boundaries and traffic management
- Microservices
- API gateway vs load balancer vs reverse proxy
- Service discovery
- API protocols
- gRPC
Microservices are not the goal. They are a tradeoff: independent deployability and team autonomy in exchange for distributed complexity (latency, partial failures, versioning, data ownership).
The "gateway vs load balancer vs reverse proxy" comparison is one of those topics that clears up years of confusion:
- Load balancer: distributes traffic across instances.
- Reverse proxy: sits in front of servers, often handling routing, TLS termination, caching, and headers.
- API gateway: productized reverse proxy for APIs, with auth, rate limiting, request shaping, and analytics.
Service discovery is what makes microservices dynamic. You need a registry (or DNS-based discovery), health-based routing, and a plan for zone or region awareness.
API protocols and gRPC round out the communication toolkit. REST is ubiquitous, but gRPC can shine for internal service-to-service calls with strict schemas, streaming, and better performance. The design skill is choosing what fits the constraints: public APIs often prioritize compatibility and debuggability; internal APIs often prioritize throughput and consistency.
5) Data storage, caching, and data movement
- Database types (listed twice in the post, which honestly happens when you are moving fast)
- SQL vs NoSQL
- Database caching
- Connection pooling
- Change Data Capture (CDC)
Database types and SQL vs NoSQL are really about access patterns. If you do not start from queries, you will pick the wrong database. The classic checklist: read/write ratio, query complexity, consistency needs, latency targets, and growth expectations.
Database caching is not just "add Redis." It is choosing a strategy:
- Cache-aside (application manages cache)
- Read-through / write-through
- Write-back
Then you handle hard parts: cache stampedes, hot keys, and invalidation.
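Cache-aside is the strategy most worth internalizing, since the application owns all the logic. A minimal sketch, with a dict standing in for Redis and a caller-supplied read function standing in for the database:

```python
class CacheAside:
    """Cache-aside: check the cache first, fall back to the database on a
    miss, and populate the cache on the way back."""

    def __init__(self, db_read, cache):
        self.db_read = db_read        # function: key -> value (source of truth)
        self.cache = cache            # dict stands in for Redis/memcached

    def get(self, key):
        if key in self.cache:         # hit: skip the database entirely
            return self.cache[key]
        value = self.db_read(key)     # miss: read from the database
        self.cache[key] = value       # populate for subsequent readers
        return value

    def invalidate(self, key):
        # On writes, delete rather than update the cached value: a
        # delete-then-refill is less prone to stale-write races.
        self.cache.pop(key, None)
```

Note what this sketch ignores: if many readers miss the same key at once, they all hit the database together. That is the cache stampede problem, usually handled with request coalescing or a short lock.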
Connection pooling is operational but critical. Without it, you can DDoS your own database with connection churn. With it, you have to tune pool sizes, timeouts, and queueing so you do not hide overload until everything collapses.
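The core of a pool is just a bounded queue of pre-opened connections. In this sketch, `connect` is a hypothetical caller-supplied factory standing in for a real driver's connect call; the bounded acquire timeout is the part that surfaces overload instead of hiding it:

```python
import queue


class ConnectionPool:
    """Bounded pool: reuse a fixed set of connections instead of opening
    one per request."""

    def __init__(self, connect, size, timeout_s=1.0):
        self._pool = queue.Queue(maxsize=size)
        self.timeout_s = timeout_s
        for _ in range(size):         # pre-open a fixed number of connections
            self._pool.put(connect())

    def acquire(self):
        # Blocks up to timeout_s; raises queue.Empty when the pool is
        # exhausted, making overload visible to the caller
        return self._pool.get(timeout=self.timeout_s)

    def release(self, conn):
        self._pool.put(conn)
```

Real pools also validate connections before handing them out and replace dead ones; the tuning knobs the paragraph mentions (size, timeout, queueing) map directly onto `size` and `timeout_s` here.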
CDC is the bridge between systems. It powers event-driven architectures, search indexing, analytics pipelines, and cache updates by streaming database changes. The depth comes from ordering, deduplication, schema evolution, and replay.
6) Asynchrony and event-driven design
- Message queues
- Pub/sub
Queues and pub/sub are easy to name and hard to use well. A queue often implies a single consumer (or a consumer group) and work distribution. Pub/sub implies many consumers and fanout. Both force you to think about delivery guarantees (at-most-once, at-least-once), retry policies, dead-letter queues, backpressure, and idempotent consumers.
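Several of those concerns fit in one small sketch: an at-least-once consumer loop with bounded retries and a dead-letter queue. In-process `queue.Queue`s stand in for a real broker, and the `(payload, attempts)` tuple stands in for broker-managed delivery metadata:

```python
import queue


def consume(work_q, dead_letter_q, handler, max_attempts=3):
    """At-least-once consumption: retry failed messages, then dead-letter.

    Messages are (payload, attempts) tuples; `handler` may raise.
    """
    while True:
        try:
            payload, attempts = work_q.get_nowait()
        except queue.Empty:
            return                    # drained; a real consumer would block
        try:
            handler(payload)
        except Exception:
            if attempts + 1 >= max_attempts:
                dead_letter_q.put(payload)           # park for inspection
            else:
                work_q.put((payload, attempts + 1))  # requeue for retry
```

Because a message can be delivered more than once on retry, `handler` must be idempotent, which is how this bucket circles back to the reliability bucket above.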
The missing connective tissue: observability and quality attributes
Nikki also included:
- Observability
- System design quality attributes
These two are the glue.
Quality attributes are the non-functional requirements that drive everything: latency, availability, durability, scalability, security, cost, and maintainability. If you can articulate which ones matter most, your design choices stop looking random.
Observability is how you validate reality: metrics, logs, and traces tied to clear SLOs. In practice, I would study observability alongside incident response basics: alert fatigue, error budgets, and the difference between "monitoring" and "debugging in production." Without that, system design stays theoretical.
A system is not "designed" until you can operate it.
What I would add to Nikki's list next
Nikki asked, "What else would you add?" Here are a few additions that naturally follow from her 25:
- CAP theorem and consistency models (linearizability, read-your-writes)
- Distributed transactions patterns (sagas, outbox pattern)
- Backpressure and load shedding
- Data partitioning and sharding strategies beyond consistent hashing
- Multi-region design (active-active vs active-passive, failover)
- Schema evolution and versioning (APIs and events)
- SLOs, SLIs, and error budgets (ties design to ops)
A simple way to use this as a 4-week study plan
If you want to turn the list into real skill:
- Week 1: Reliability core (idempotency, retries, health checks, queues)
- Week 2: Scalability (rate limiting, load balancing, caching, CDN, hashing)
- Week 3: Service architecture (microservices, discovery, gateways, protocols, gRPC)
- Week 4: Data foundations (SQL vs NoSQL, database types, CDC, pooling, ACID vs BASE)
Then do one "design a system" exercise per week and force yourself to name which concepts you used and why.
This blog post expands on a viral LinkedIn post by Nikki Siapno, an engineering manager (formerly at Canva) who writes for an audience of 400k+ on becoming a great engineer and leader.