System Design Cookbook · v1.0

URL Shortener
from Zero to Staff Engineer

Every concept explained with Why, What, When, Where, How, Drawbacks and Advantages. Built for someone who wants to think like a 30-year veteran, not just memorise answers.

80+ Terms Defined · Deep Chapters · 40+ Interview Q&A · ∞ Offline Access

How to Use This Cookbook

💡 Golden Rule: Don't memorise answers. Understand the reasoning. An interviewer can always ask a twist. If you understand WHY, you can answer any variant.

Day 1 — Read "Complete Mind Map"

Get the big picture. See how all pieces connect. Don't go deep yet. Just understand that every design choice exists for a reason.

Day 2 — Scale & Numbers + ID Generation

Memorise the numbers with their derivations. Learn why Snowflake beats MD5. This forms the backbone of every answer.

Day 3 — Caching + Database Design

The two biggest topics. Understand L1/L2/L3. Understand CQRS. Understand why Cassandra for reads and PostgreSQL for writes.

Day 4 — Consistency + Kafka

The hardest concepts. CAP, PACELC, LOCAL_QUORUM, read-your-own-writes. Kafka acks=all, exactly-once semantics.

Day 5 — Architecture + Failover

How it all connects. Write path, read path, disaster recovery, leader election, fencing tokens. This is Staff-level territory.

Day 6 — Cross Questions + Glossary

Practice every interviewer trap. Read every term definition. Know the exact one-liner for any term they throw at you.

Day 7 — Full mock interview with Cheat Sheet

Cover the cheat sheet. Talk out loud for 45 minutes on the problem. Pause at each section and verify you can answer every sub-question without looking.

How to push to GitHub Pages: Save this file as index.html. Create a GitHub repo and commit the file. Go to Settings → Pages → Source: main branch, / (root). Your cookbook will be live at https://yourusername.github.io/reponame, usually within a couple of minutes. It also works offline once the page has loaded.

Complete Mind Map

Every major concept in the URL Shortener design and how they connect. Use this as your orientation before going deep into any chapter.

🔗 URL Shortener System

ID Generation

Pre-gen Pool
Snowflake ID
Base62 encode
SKIP LOCKED
Circuit breaker

Caching

L1 Caffeine
L2 Redis
L3 CDN
LFU eviction
SETNX mutex

Database

PostgreSQL (write)
Cassandra (read)
CQRS pattern
ClickHouse (analytics)
WAL replication

Consistency

CAP theorem
PACELC
LOCAL_QUORUM
Read-your-own-writes
Eventual consistency

Failover & DR

GeoDNS / GSLB
Leader election (etcd)
Split brain prevention
Fencing token
Canary recovery

Kafka

acks=all
min.insync.replicas
Consumer groups
MirrorMaker 2
Dead letter queue

Security

JWT / OAuth2
Rate limiting
Token bucket
SSRF prevention
301 vs 302

Observability

SLO / SLA / SLI
p99 latency
Synthetic testing
Chaos engineering
Predictive alerts

The 5 Questions to Answer in Every Interview

1. What is the scale?

Always start by establishing numbers. DAU, reads/sec, writes/sec, storage. Every design decision follows from scale. A system for 1000 users is completely different from one for 100 million.

2. What fails first?

At each scale level, identify the bottleneck. DB at 1000 r/s. Cache miss at 10,000 r/s. Network at 100,000 r/s. Hot partition at 1M r/s. Design is about managing bottlenecks.

3. Availability or Consistency?

This single choice drives 80% of your architecture decisions. For URL shortener: Availability. A stale redirect is OK. A 503 error is not. This justifies Cassandra, eventual consistency, async replication.

4. What is the read:write ratio?

URL shortener is 100:1 read-heavy. This justifies: separate read DB, heavy caching, read replicas, CDN. If it were write-heavy, the entire architecture would differ.

5. What happens when X fails?

For every component you add, the interviewer WILL ask "what if that fails?". Think in failure modes first. Circuit breaker for pool. Cassandra RF=3 for node failure. Kafka replay for datacenter failure. GeoDNS for region failure. Design the happy path last.

Scale & Numbers

These numbers are not magic — each one is derived. Know the derivation, not just the result. An interviewer will ask "how did you get that?"

🏆 Expert move: Always establish numbers in the first 3 minutes. Say "Before I design anything, let me estimate scale." This immediately signals seniority.

Traffic Estimation

Value	Metric	Derivation
100M	Daily Active Users	Assumed from problem statement
10M/day	New URLs created	1 per 10 users per day
115/s	Writes per second	10M ÷ 86,400 seconds
11,500/s	Reads per second	115 × 100 (read:write ratio)
100:1	Read : Write ratio	People click more than they create
345/s	Peak writes (3×)	Always plan for 3× average traffic

// Full derivation — speak this out loud in the interview
DAU                    = 100,000,000
URL creation rate      = DAU × 0.1 = 10,000,000 / day
Seconds in a day       = 24 × 60 × 60 = 86,400
Write TPS              = 10,000,000 / 86,400 ≈ 115 writes/sec

Read:Write ratio       = 100:1  (users click far more than they create)
Read TPS               = 115 × 100 = 11,500 reads/sec

Peak traffic (3× avg)  = 345 writes/sec, 34,500 reads/sec
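The derivation above can be reproduced in a few lines. This is a minimal sketch; the class and method names are illustrative, not part of the cookbook. Without intermediate rounding the read rate comes out near 11,574/s, because the text rounds writes down to 115 before multiplying by 100.

```java
final class TrafficEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average writes per second from DAU and URLs created per user per day. */
    static double writesPerSecond(long dailyActiveUsers, double urlsPerUserPerDay) {
        return dailyActiveUsers * urlsPerUserPerDay / SECONDS_PER_DAY;
    }

    /** Average reads per second given the read:write ratio. */
    static double readsPerSecond(double writesPerSecond, int readWriteRatio) {
        return writesPerSecond * readWriteRatio;
    }

    public static void main(String[] args) {
        double writes = writesPerSecond(100_000_000L, 0.1);
        double reads = readsPerSecond(writes, 100);
        System.out.printf("avg:  %.1f writes/s, %.1f reads/s%n", writes, reads);
        System.out.printf("peak: %.1f writes/s, %.1f reads/s%n", writes * 3, reads * 3);
    }
}
```

Changing the DAU or the per-user creation rate shows how quickly every downstream number moves.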

Storage Estimation

// Per-record breakdown
short_code    =   8  bytes   (base62, 6-8 chars)
long_url      = 200  bytes   (average URL length)
user_id       =  16  bytes   (UUID = 128-bit)
created_at    =   8  bytes   (TIMESTAMPTZ = 8 bytes in Postgres)
expires_at    =   8  bytes
is_active     =   1  byte
metadata      =  60  bytes   (geo, IP, custom alias flag, etc)
               ─────────────
Total         ≈ 300 bytes per row

5-year total rows = 10M/day × 365 × 5 = 18.25 billion rows
Raw storage       = 18.25B × 300B     = 5.5 TB
Replication (RF=3)= 5.5 × 3          = 16.5 TB
Index overhead 20%                    ≈ 20 TB total on disk
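The storage arithmetic can be checked the same way. A small sketch with illustrative names, assuming decimal terabytes (10^12 bytes); unrounded, the three totals are about 5.5, 16.4 and 19.7 TB, which the text rounds to 5.5, 16.5 and 20 TB.

```java
final class StorageEstimate {
    /** Total rows created over the given number of years. */
    static long totalRows(long urlsPerDay, int years) {
        return urlsPerDay * 365L * years;
    }

    /** Raw size in decimal terabytes (1 TB = 10^12 bytes). */
    static double terabytes(long rows, int bytesPerRow) {
        return rows * (double) bytesPerRow / 1e12;
    }

    public static void main(String[] args) {
        long rows = totalRows(10_000_000L, 5);
        double raw = terabytes(rows, 300);
        double replicated = raw * 3;           // RF = 3
        double withIndexes = replicated * 1.2; // 20% index overhead
        System.out.printf("rows=%d raw=%.2f TB replicated=%.2f TB on disk=%.2f TB%n",
                rows, raw, replicated, withIndexes);
    }
}
```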

URL Length — Why 6 Characters?

What is base62?

Characters: a-z (26) + A-Z (26) + 0-9 (10) = 62 characters. All URL-safe. No encoding needed.

Why not base64?

Base64 uses + and / which are URL-special characters. They need percent-encoding in URLs. Base62 avoids this entirely.

Why 6-8 chars?

base62^6 ≈ 56.8 billion combinations. base62^7 ≈ 3.5 trillion. 18.25 billion records over 5 years would already fill about a third of the 6-char space, so 7 chars gives comfortable headroom (about 0.5% of the space used).

Why not base58?

Base58 (used by Bitcoin) removes visually confusing chars: 0, O, l, I. Valid choice if URLs will be typed by humans. Minor trade-off: slightly fewer combinations per character.
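To make the encoding concrete, here is a minimal base62 encoder and decoder. It is a sketch, not the cookbook's implementation: the class name Base62 and the alphabet order (digits, then upper case, then lower case) are assumptions, and any fixed ordering of the 62 characters works as long as encode and decode agree.

```java
final class Base62 {
    // Assumed ordering; any fixed permutation of these 62 characters is valid.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Encodes a non-negative number as a base62 string, most significant digit first. */
    static String encode(long value) {
        if (value < 0) throw new IllegalArgumentException("value must be non-negative");
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        return sb.reverse().toString();
    }

    /** Decodes a base62 string back to its numeric value. */
    static long decode(String code) {
        long result = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("invalid base62 char: " + c);
            result = result * 62 + digit;
        }
        return result;
    }
}
```

The largest 6-character code corresponds to 62^6 - 1 = 56,800,235,583; the next value needs a seventh character.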

Bandwidth & Latency Targets

Write bandwidth: 115 w/s × 2 KB (request) = 230 KB/s inbound
Read bandwidth:  11,500 r/s × 500 B (302 response) = 5.7 MB/s outbound
Peak read:       34,500 r/s × 500 B = 17 MB/s outbound

Latency targets:
  p50 redirect latency: < 10ms  (CDN hit)
  p99 redirect latency: < 50ms  (cache hit)
  p99 redirect latency: < 200ms (DB hit, worst case)

Cache math:
  If L1 hit rate = 80%, L2 = 18%, L3 = 1.5%:
  DB reads = 11,500 × 0.005 = 57 reads/sec to Cassandra — trivial!

Q "Why 100:1 read:write ratio? How did you arrive at that?"

Answer: Think about real behaviour. A marketing team creates 1 short URL for a campaign. That URL gets shared on social media and clicked by 10,000 people. A developer creates a short URL for documentation — it gets clicked every time someone reads the docs. The read:write ratio reflects that URLs are created once but clicked many, many times. 100:1 is a reasonable baseline; for viral URLs it could be 1,000,000:1. This ratio justifies our entire architecture: heavy caching, read replicas, CDN, separate read DB.

ID Generation & Hashing

The core question: how do you generate a globally unique 6-8 character short URL across multiple servers in multiple continents without collisions?

⚠️ Common wrong answer: "I'll just hash the long URL with MD5 and take the first 6 characters." This fails at scale due to the birthday paradox. Know exactly WHY it fails before proposing it.

Approach 1 — MD5 Truncation (Naive, Wrong)

What

Hash the long URL with MD5, take first 6 characters of the hex output, encode as base62.

Why it fails

Birthday paradox: with 56B combinations and 18B URLs, collision probability grows non-linearly. At 50% of space used, ~50% of new URLs collide. Retry loop under heavy load becomes O(n) unbounded.

Birthday Paradox — the exact math

If you randomly pick from N items, you expect your first collision at approximately √N picks. With base62^6 = 56 billion combinations, the first collision is expected at around √(56B) ≈ 237,000 URLs. At 115 writes/sec (10M URLs/day) that is roughly 35 minutes of traffic, so collisions start within the first hour and only get more frequent as the space fills. This is catastrophic.
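The √N rule of thumb comes from the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2N). The sketch below evaluates it for the 6-character base62 space; the class and method names are illustrative. Under this approximation the chance of at least one collision is already about 39% at n = √N.

```java
final class BirthdayCollision {
    /**
     * Approximate probability that n uniformly random picks from a space of
     * the given size contain at least one collision.
     */
    static double collisionProbability(double n, double space) {
        return 1.0 - Math.exp(-n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double space = Math.pow(62, 6); // 6-character base62 codes
        double[] samples = {10_000, 100_000, Math.sqrt(space), 1_000_000, 10_000_000};
        for (double n : samples) {
            System.out.printf("n=%.0f  p(collision)=%.4f%n", n, collisionProbability(n, space));
        }
    }
}
```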

Approach 2 — Pre-Generated Pool (Recommended)

What

Pre-generate thousands of unique short codes and store them in a pool table. On write request: atomically pop one. Refill the pool asynchronously when it drops below threshold.

Why

Removes all collision handling from the write path. The pool contains only unique, verified codes. Pop is O(1), no retries, no race conditions within a region.

When to use

When write latency must be ultra-low and predictable. When you can afford a background job for pre-generation. Not suitable if you need truly random-looking URLs with no pattern.

How — SKIP LOCKED

SELECT ... FOR UPDATE SKIP LOCKED is the key. Multiple workers can pop from the pool concurrently — each skips rows already locked by others. No deadlocks, no waits, true parallelism.

Advantage

O(1) write path. No collisions at runtime. Predictable latency. Easy to monitor pool health. Survives burst traffic (pool absorbs the load).

Drawback

Cross-region coordination is impossible — two regions cannot share one pool without a central coordinator (defeats the purpose). Solution: regional pools with prefixes.

-- Pre-generation job (runs per region, independently)
INSERT INTO url_pool (short_code, region, taken)
SELECT base62(nextval('url_seq')), 'US-EAST', false
FROM generate_series(1, 10000);

-- Atomic pop — O(1), concurrent-safe, no deadlock
WITH popped AS (
  SELECT short_code FROM url_pool
  WHERE taken = false
    AND region = 'US-EAST'
  LIMIT 1
  FOR UPDATE SKIP LOCKED   -- ← KEY: skips locked rows instantly
)
UPDATE url_pool SET taken = true
WHERE short_code = (SELECT short_code FROM popped)
RETURNING short_code;

-- SKIP LOCKED explanation:
-- Worker A locks row "abc123" → Worker B sees it locked → skips it
-- Worker B immediately takes "def456" → no wait, no deadlock
-- 100 concurrent workers can pop simultaneously

⚡ Regional pool problem: US-East and EU cannot share one pool. If both try to pop the same code simultaneously, we get duplicates. Solution: give each region a prefix. US=1xxxx, EU=2xxxx, Asia=3xxxx. Prefixes guarantee global uniqueness. Each region manages its own pool independently.

Approach 3 — Snowflake ID (Best Fallback)

What

A 64-bit integer composed of: timestamp bits + region/node ID bits + sequence bits. No database needed. Each server generates IDs independently. Then base62-encode the integer to get the short code.

Why it works

Uniqueness is guaranteed by construction: same timestamp + same node can never produce the same sequence number. No central coordinator. No lock. Pure mathematics.

When to use

When the pool is empty (circuit breaker fallback). When you want no-dependency ID generation. When you need IDs to be monotonically increasing (good for DB B-tree index locality).

How — bit layout

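41 bits for timestamp (about 69 years of milliseconds), 6 bits for region (64 regions), 6 bits for node (64 nodes per region), 11 bits for sequence (2048 IDs per millisecond per node). Total: 64 bits. Note that this layout leaves no spare sign bit: stored in a signed 64-bit long, the ID stays positive only for the first 2^40 ms (about 35 years) and must be treated as unsigned after that.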

Advantage

No coordinator. No database. No collision. Sortable by time. Good B-tree locality. Survives any infrastructure failure.

Drawback

Clock skew: if a server's clock drifts backward, you can generate duplicate IDs. Mitigation: wait until clock catches up, or use NTP with tight synchronisation.

// Snowflake ID bit layout
// ┌──────────────────────┬──────────┬──────────┬───────────┐
// │     41 bits          │  6 bits  │  6 bits  │  11 bits  │
// │  timestamp (ms)      │ regionId │  nodeId  │ sequence  │
// └──────────────────────┴──────────┴──────────┴───────────┘
// 41 bits → 2^41 ms = 69 years from custom EPOCH
// 6 bits  → 64 regions max
// 6 bits  → 64 nodes per region
// 11 bits → 2048 IDs per millisecond per node

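public class SnowflakeIdGenerator {
    private static final long EPOCH = 1700000000000L; // Nov 2023
    private final long regionId; // 0..63 (6 bits)
    private final long nodeId;   // 0..63 (6 bits)
    private long sequence = 0;
    private long lastTimestamp = -1;

    public SnowflakeIdGenerator(long regionId, long nodeId) {
        if (regionId < 0 || regionId > 63 || nodeId < 0 || nodeId > 63)
            throw new IllegalArgumentException("regionId and nodeId must fit in 6 bits");
        this.regionId = regionId;
        this.nodeId = nodeId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis() - EPOCH;
        // Clock moved backward: refuse rather than risk duplicate IDs
        // (waiting for the clock to catch up is the other option)
        if (ts < lastTimestamp)
            throw new IllegalStateException("Clock moved backward by " + (lastTimestamp - ts) + " ms");
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0x7FFL; // 11-bit mask
            if (sequence == 0) ts = waitNextMs(lastTimestamp); // sequence exhausted
        } else { sequence = 0; }
        lastTimestamp = ts;
        return (ts << 23) | (regionId << 17) | (nodeId << 11) | sequence;
    }

    // Spin until the clock advances past the given millisecond
    private long waitNextMs(long last) {
        long ts = System.currentTimeMillis() - EPOCH;
        while (ts <= last) ts = System.currentTimeMillis() - EPOCH;
        return ts;
    }

    public String shortCode() {
        // Base62 is a helper assumed to exist elsewhere. A full 64-bit value
        // encodes to 10-11 base62 characters, not 7, so fallback codes are
        // longer than pool codes.
        return Base62.encode(nextId());
    }
}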

Circuit Breaker Pattern — Pool Empty Scenario

What

A circuit breaker monitors the health of a resource (here: the URL pool). When the pool is empty or the refill job is dead, it "trips" and switches the entire system to a fallback path.

Why

Without a circuit breaker: empty pool → all writes fail → service down. With circuit breaker: empty pool → switch to Snowflake ID generation → service continues at slightly higher latency. Graceful degradation over hard failure.

3 States

CLOSED: Normal. Pool requests flow through. OPEN: Pool failed. All requests routed to Snowflake fallback. HALF-OPEN: Probe: try pool once. If succeeds, close. If fails, stay open.

Advantage

Prevents cascading failures. Automatic recovery. No human intervention needed. System survives a dead refill job at 3 AM.

Drawback

State management complexity. False positives can unnecessarily switch to fallback. Need tuning: how many failures trigger open state? How long before half-open probe?

When to use

Any time you have a primary path and a fallback path. Pool vs Snowflake. Redis vs DB. Primary region vs secondary region.
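The three states described above can be sketched as a small state machine. This is an illustration under stated assumptions, not a production breaker: the class name, the consecutive-failure threshold and the fixed open interval are all hypothetical, and the current time is passed in explicitly so the transitions are easy to test.

```java
final class PoolCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // how long to stay OPEN before probing
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    PoolCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if the caller should try the pool, false if it should use the fallback. */
    synchronized boolean allowPoolRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // let exactly one probe through
            return true;
        }
        return state == State.CLOSED;
    }

    /** A pool request succeeded: close the breaker. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A pool request failed: trip if probing or if the threshold is reached. */
    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    synchronized State state() { return state; }
}
```

A caller would check allowPoolRequest before each pop, fall back to Snowflake generation when it returns false, and report the outcome of every pool attempt back to the breaker.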

// Predictive monitoring — Staff-level signal
// Don't alert when pool = 0 (already broken)
// Alert when pool will hit 0 in 10 minutes

double consumptionRate = getConsumedPerSecond(); // e.g. 115/s
long   remaining       = getPoolRemaining();       // e.g. 50,000
double timeToEmpty     = remaining / consumptionRate; // seconds

if (timeToEmpty < 600) {  // < 10 minutes
    alertOncall("Pool empties in " + timeToEmpty + "s — refill NOW");
}
if (timeToEmpty < 120) {  // < 2 minutes — emergency
    circuitBreaker.trip();  // switch to Snowflake immediately
}

Caching — All Layers

Caching is what makes an 11,500 reads/sec system survive with only 57 database reads/sec. The three layers work together: each catches what the previous missed.

🏆 Key insight: The goal is 99%+ cache hit rate. At 11,500 r/s with 99.5% hit rate, only 57 r/s reach Cassandra. This is the entire reason caching exists — to make the database irrelevant for 99% of traffic.

L1 — Caffeine (In-Process Cache)

What

A Java in-memory cache running inside the same JVM as your application. No network call. Sub-millisecond access. Uses the W-TinyLFU algorithm for near-optimal eviction.

Why Caffeine over Guava?

Guava uses LRU (Least Recently Used). Caffeine uses W-TinyLFU — a window-based tiny least-frequently-used algorithm. W-TinyLFU achieves near-optimal hit rate for skewed (Zipf) distributions like URL access patterns. 10-30% better hit rate than pure LRU.

How — W-TinyLFU

Uses a Count-Min Sketch (probabilistic frequency counter, O(1) space) to estimate how often each key is accessed. Items admitted to main cache only if their frequency exceeds the victim being evicted. A "window" cache handles newly popular items before they have frequency history.

When to use

Any read-heavy Java application. Data that fits in heap memory (10k–500k entries). When network round-trip to Redis would dominate latency.

Advantages

Zero network overhead. Sub-microsecond access. No external dependency. Survives Redis outage. Best hit rate for skewed access patterns.

Drawbacks

Per-JVM cache — 10 instances = 10 separate caches. Cache staleness between instances (eventual consistency). Consumes JVM heap. Lost on restart. Not shared across services.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

Cache<String, String> l1Cache = Caffeine.newBuilder()
    .maximumSize(10_000)                   // top 10k URLs in memory
    .expireAfterWrite(5, TimeUnit.MINUTES)  // staleness limit
    .recordStats()                         // enables hitRate() metric
    .build();

// Access: no network call on an L1 hit
String longUrl = l1Cache.getIfPresent(shortCode);
if (longUrl == null) {
    longUrl = l2Cache.get(shortCode);      // fall to Redis
    if (longUrl != null) {                 // Caffeine rejects null values
        l1Cache.put(shortCode, longUrl);   // populate L1
    }
}

// Count-Min Sketch: how W-TinyLFU estimates frequency
// 4 hash functions, each maps key to a counter array cell
// Frequency estimate = minimum of the 4 cells (never an underestimate)
// Space: fixed at construction (depth × width counters), independent of the number of distinct keys
// Error: overestimate bounded by ε × (total count) with probability 1-δ
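A toy Count-Min Sketch makes the "minimum of the cells" idea concrete. This is a simplified sketch, not Caffeine's internal implementation: the class name, the integer mixing function and the per-row seeds are assumptions chosen for brevity.

```java
final class CountMinSketch {
    private final int depth;        // number of hash rows
    private final int width;        // counters per row
    private final int[][] counters;
    private final int[] seeds;

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.counters = new int[depth][width];
        this.seeds = new int[depth];
        for (int i = 0; i < depth; i++) seeds[i] = 0x9E3779B9 * (i + 1);
    }

    // Illustrative mixing step; a real sketch would use stronger independent hashes.
    private int index(String key, int row) {
        int h = key.hashCode() ^ seeds[row];
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return Math.floorMod(h, width);
    }

    /** Records one access of the key by incrementing one counter per row. */
    void add(String key) {
        for (int row = 0; row < depth; row++) counters[row][index(key, row)]++;
    }

    /** Estimated frequency: the minimum across rows; collisions can only inflate it. */
    int estimate(String key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, counters[row][index(key, row)]);
        }
        return min;
    }
}
```

Because every access increments one cell in each row, the estimate is never below the true count; it can be higher only when another key shares a cell in every row.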

L2 — Redis Cluster (Regional Cache)

What

An in-memory key-value store running as a separate service, shared across all application instances in a region. Accessed via network (< 1ms within same datacenter).

Why Redis over Memcached?

Redis supports data structures (strings, hashes, sorted sets, counters), persistence, pub/sub, Lua scripting, and cluster mode. Memcached is simpler but has no persistence or advanced structures. Redis is the default choice for most production systems.

LFU eviction — allkeys-lfu

With maxmemory-policy allkeys-lfu, Redis uses an approximate LFU to decide what to evict. It keeps a small logarithmic frequency counter per key (a Morris-style probabilistic counter) that decays over time (tunable via lfu-decay-time), and evicts the least frequently used key among a random sample. This suits a URL shortener: a viral URL with a high counter survives a quiet spell, while keys accessed only once are evicted first.

When to use

Shared cache across multiple application instances. Data too large for L1 heap. When you need persistence (RDB/AOF) or pub/sub. Cross-service caching.

Advantages

Shared across all instances (no per-instance staleness). Large capacity (limited by RAM). Rich data structures. TTL per key. Atomic operations. Persistence options.

Drawbacks

Network hop (typically under 1ms within the same datacenter, a few ms across zones). Single point of failure (mitigated by Redis Sentinel or Cluster). Memory cost. Serialisation overhead. Cluster resharding complexity.

The Thundering Herd Problem — Full Deep Dive

What

A popular URL's cache entry expires. At that exact moment, 10,000 simultaneous requests arrive. All miss the cache. All 10,000 hit Cassandra simultaneously. Cassandra gets overloaded and dies. System cascades to failure.

Why it happens

Viral URLs get massive concurrent traffic. TTL expiry is a simultaneous event. Without protection, the cache miss translates directly to a DB spike proportional to traffic volume.

Solution: SETNX Mutex

SETNX = SET if Not eXists. First thread to call it "wins" and fetches from DB. All others see the lock exists and wait. When the winner populates the cache, all others read from cache. DB sees only 1 request instead of 10,000.

Result

DB sees exactly 1 request per cache miss, regardless of concurrent traffic volume. Thundering herd completely eliminated.

Risk

Lock holder crashes mid-fetch → lock never released → all waiting threads starve. Mitigation: always set a TTL on the lock (e.g. 500ms). If TTL expires without cache being populated, next thread retries.

Also called

Cache stampede, dogpile effect. Same problem, different names. Know all three terms.

// SETNX mutex: prevents thundering herd / cache stampede
// Assumes `redis` is a Jedis client and `cassandra` is the application's own lookup DAO
public String getWithMutex(String shortCode) throws InterruptedException {
    // 1. Try cache first
    String cached = redis.get(shortCode);
    if (cached != null) return cached;

    String lockKey = "lock:" + shortCode;
    String lockVal = UUID.randomUUID().toString(); // unique owner ID

    // 2. Try to acquire lock (NX=only if not exists, PX=expire in 500ms)
    //    Jedis returns "OK" on success and null if the key already exists
    String acquired = redis.set(lockKey, lockVal,
        SetParams.setParams().nx().px(500));

    if ("OK".equals(acquired)) {
        // 3. WE got the lock: fetch from DB
        try {
            String longUrl = cassandra.get(shortCode);
            if (longUrl != null) {
                redis.setex(shortCode, 86400, longUrl); // cache 24h
            }
            return longUrl;
        } finally {
            // 4. Release lock ONLY if we still own it (atomic Lua script)
            redis.eval("if redis.call('get',KEYS[1])==ARGV[1] " +
                       "then return redis.call('del',KEYS[1]) " +
                       "else return 0 end",
                List.of(lockKey), List.of(lockVal));
        }
    }
    // 5. SOMEONE ELSE has the lock: poll the cache for up to ~500ms
    for (int attempt = 0; attempt < 10; attempt++) {
        Thread.sleep(50);
        cached = redis.get(shortCode);
        if (cached != null) return cached;
    }
    // 6. Lock holder died or was slow: fall back to a direct DB read
    return cassandra.get(shortCode);
}

L3 — CDN Edge Cache

What

A distributed network of edge servers globally (CloudFront, Fastly, Cloudflare). Caches the redirect response (302 + Location header) at the edge closest to the user. Tokyo user gets served from Tokyo edge, not from your US-East datacenter.

Why

Eliminates intercontinental latency for hot URLs. A cache hit at CDN = ~5ms instead of ~200ms (Asia→US round trip). For viral URLs with millions of clicks, CDN handles 99% of load without touching your infrastructure.

Critical drawback

CDN cache is hard to invalidate instantly. Purging a URL from all global edge nodes takes 1–5 minutes. If you delete or update a URL, users may get the old redirect for minutes. This is why CDN TTL should be short (1 hour max) for mutable URLs.

Invalidation

Call CDN invalidation API (e.g. CloudFront CreateInvalidation) when a URL is updated or deleted. Propagates globally in 1–5 minutes. If stale redirects must disappear faster than a purge can propagate, rely on a short TTL rather than on explicit purges.

The Three Cache Failure Modes

Cache Penetration

What: Requests for short codes that don't exist in the system bypass cache every time (cache returns null → hits DB → DB returns null → nothing to cache → infinite DB hits).

Fix: Cache negative results ("NULL" with short TTL like 60s). Or use a Bloom filter at the API gateway: if the Bloom filter says "definitely not exists", return 404 immediately without touching cache or DB.

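Bloom filter guarantee: Zero false negatives (if the filter says a code is not present, it is definitely not in the system). ~1% false positives, depending on sizing (the filter may say a code exists when it doesn't; this is harmless, the request simply falls through to cache and DB and gets its 404 there).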
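A minimal Bloom filter shows why a "not present" answer is always safe to trust. This is a sketch under stated assumptions: the class name and the double-hashing scheme built from String.hashCode() are illustrative, and a production filter would use stronger hashes and sizing derived from the target false-positive rate.

```java
import java.util.BitSet;

final class ShortCodeBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits
    private final int hashCount; // bit positions set per key

    ShortCodeBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Double hashing: position i = h1 + i * h2 (mod size), with h2 forced odd.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Registers a short code by setting all of its bit positions. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(index(key, i));
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }
}
```

Bits are only ever set, never cleared, so a code that was added can never be reported absent; that is the zero-false-negative property the gateway relies on when it returns 404 early.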

Cache Avalanche

What: Many cache entries expire at the same time → mass DB requests → DB overwhelmed → system down.

Cause: All entries written at the same time with the same TTL (e.g., after a service restart or cold start).

Fix: Add random jitter to TTL: TTL = baseTTL + random(0, baseTTL × 0.2). This spreads expiry events over time instead of bunching them.
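The jitter formula above translates directly into code. A small sketch with an illustrative helper name; it assumes TTLs in whole seconds and a uniform random extra of up to 20% of the base.

```java
import java.util.concurrent.ThreadLocalRandom;

final class TtlJitter {
    /** Returns the base TTL plus a uniform random extra in [0, 20% of base], in seconds. */
    static long jitteredTtlSeconds(long baseTtlSeconds) {
        long maxExtra = baseTtlSeconds / 5; // 20% of the base TTL
        return baseTtlSeconds + ThreadLocalRandom.current().nextLong(maxExtra + 1);
    }

    public static void main(String[] args) {
        // A 24h base TTL yields values between 86,400 and 103,680 seconds.
        for (int i = 0; i < 5; i++) {
            System.out.println(jitteredTtlSeconds(86_400));
        }
    }
}
```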

Cache Pollution

What: A scraper or bot accesses millions of unique, low-popularity URLs once each. These fill the cache, evicting frequently-accessed popular URLs. Cache hit rate drops catastrophically.

Fix: LFU eviction (Caffeine W-TinyLFU, Redis allkeys-lfu). Items with frequency=1 (accessed once) are evicted before items with frequency=1000. Scraper traffic cannot pollute the cache with LFU.

Q "Redis goes down at 3 AM. What happens to your system?"

Key insight: Redis must never be a hard dependency. Design the system so Redis failure causes degradation, not failure.

L1 Caffeine (in-process) still serves 80% of reads — zero Redis involvement. The 20% that miss L1 fall through directly to Cassandra. At 11,500 r/s with 80% L1 hit rate, only 2,300 r/s reach Cassandra — well within capacity. Circuit breaker on Redis connection pool disables the L2 path and routes directly to DB. p99 latency increases from 20ms to 50ms. Service degrades gracefully; it does not fail. This is why L1 in-process cache is critical — it's your shield when external dependencies die.

Database Design

Why two databases? Because read and write operations have fundamentally different requirements that cannot be optimally satisfied by a single data store at this scale.

CQRS — Command Query Responsibility Segregation: Separate your write model (commands that change state) from your read model (queries that return data). Each store is independently optimised, scaled, and tuned. This is not over-engineering at 100M DAU — it's necessary.

PostgreSQL — Write Database

What

A relational database with full ACID guarantees, used exclusively for the write path. All URL creation, update, and deletion goes here. Never touched by the read path.

Why PostgreSQL

ACID transactions ensure no duplicate short codes. UNIQUE constraint is a hard database-level guarantee. WAL (Write-Ahead Log) enables reliable async replication to Cassandra and other replicas. Rich SQL for complex write-side queries (user dashboards, bulk operations).

ACID explained

Atomicity: Write succeeds entirely or fails entirely. No partial writes. Consistency: UNIQUE constraint enforced at DB level. Isolation: Concurrent writes don't interfere. Durability: Committed writes survive crashes (WAL + fsync).

Drawbacks

Doesn't scale horizontally for reads (hence Cassandra for reads). Single-region primary means intercontinental writes have higher latency. Complex sharding if writes exceed single-machine capacity.

WAL — Write-Ahead Log

Every data change in PostgreSQL is written to the WAL (a sequential log file) BEFORE being applied to the actual data files. This enables: crash recovery (replay WAL on restart), replication (stream WAL to replicas), and point-in-time recovery (replay WAL to any past moment). PostgreSQL cannot stream directly into Cassandra; instead, logical decoding exposes WAL changes to a change-data-capture pipeline (for example via Kafka), which forwards them to Cassandra and other consumers.

-- PostgreSQL schema — write-optimised
CREATE TABLE url_mappings (
    id           BIGINT         PRIMARY KEY,         -- Snowflake ID
    short_code   VARCHAR(8)    NOT NULL UNIQUE,     -- UNIQUE = DB-level guarantee
    long_url     TEXT           NOT NULL,
    user_id      UUID,                               -- NULL = anonymous
    created_at   TIMESTAMPTZ    DEFAULT NOW(),
    expires_at   TIMESTAMPTZ,                        -- NULL = no expiry
    is_active    BOOLEAN        DEFAULT TRUE,
    custom_alias BOOLEAN        DEFAULT FALSE
);

-- Indexes: each one has a specific purpose
-- short_code: the UNIQUE constraint above already creates a unique index,
-- so no separate CREATE UNIQUE INDEX is needed for lookup by short code

CREATE INDEX idx_user_id
    ON url_mappings(user_id);                 -- user's URL list

CREATE INDEX idx_expires_active
    ON url_mappings(expires_at)
    WHERE expires_at IS NOT NULL;           -- PARTIAL INDEX: only non-null rows
-- Partial index is smaller and faster than full index
-- Only indexes the ~10% of rows that have an expiry date

Cassandra — Read Database

What

A distributed NoSQL database optimised for high-throughput key-value reads. Multi-region active-active. No single point of failure. Scales horizontally by adding nodes.

Why Cassandra for reads

Our read pattern is simple: given short_code, return long_url. Cassandra excels at exactly this: single-key lookups at massive scale. It distributes data across nodes using consistent hashing, and any node can act as coordinator and route a request to the replicas that own the key. Adding more nodes increases throughput roughly linearly.

Partition key

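In Cassandra, the partition key determines which node stores the data. Our partition key = short_code. High cardinality (billions of unique codes) means data is spread evenly across all nodes, so no partition is hot because of key distribution. A single viral code can still overload its own partition; see the Hot Partition Problem section.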

Consistent hashing

Cassandra maps each short_code to a token on a ring. Each node owns a range of tokens. When a node is added or removed, only the adjacent tokens are remapped — not all data. This is why Cassandra scales without downtime.
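The ring described above can be sketched with a sorted map. This is an illustration, not Cassandra's implementation: Cassandra's default partitioner uses Murmur3 tokens, whereas the hash below is a simple stand-in, and the class name and virtual-node count are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

final class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes; // tokens placed on the ring per physical node

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Stand-in token function; not the Murmur3 partitioner Cassandra uses.
    private static int token(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return h;
    }

    void addNode(String node) {
        for (int v = 0; v < virtualNodes; v++) ring.put(token(node + "#" + v), node);
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** Owner of a key: the first token at or after the key's token, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes on the ring");
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Adding a node only takes over the token ranges it lands in: every key either keeps its previous owner or moves to the new node, which is why the cluster can grow without reshuffling all data.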

Advantages

Active-active multi-region. No primary node (any node can serve reads). Linear scalability. Tunable consistency (ONE, LOCAL_QUORUM, QUORUM). Built-in replication factor. No joins = no lock contention.

Drawbacks

No ACID. No joins. No secondary indexes at scale. Schema must be designed for access patterns (not normalisation). Compaction can cause latency spikes. Repair jobs required for consistency.

-- Cassandra keyspace with multi-region replication
CREATE KEYSPACE url_shortener
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us_east': 3,   -- 3 replicas in US-East
    'eu_west': 3,   -- 3 replicas in EU-West
    'asia_pac': 3   -- 3 replicas in Asia-Pacific
};

-- Primary lookup table
CREATE TABLE url_mappings (
    short_code   TEXT,
    long_url     TEXT,
    created_at   TIMESTAMP,
    expires_at   TIMESTAMP,
    is_active    BOOLEAN,
    PRIMARY KEY (short_code)   -- short_code IS the partition key
) WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- LeveledCompactionStrategy (LCS) vs SizeTieredCompactionStrategy (STCS)
-- STCS: better for write-heavy. Large SSTable merges. Higher read amplification.
-- LCS:  better for read-heavy. Maintains sorted levels. Lower read amplification.
-- For URL shortener (100:1 read:write) → LCS is correct choice.

Hot Partition Problem — Deep Dive

What happens

A viral URL (e.g., World Cup score link) gets 10 million clicks in 60 seconds. All clicks → same Cassandra partition key → same node → node CPU at 100% → reads slow → eventually node dies → other nodes can't replicate fast enough → cascade failure.

Solution: Deterministic bucketing

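Add a bucket_id to the partition key, so the key becomes (short_code, bucket_id) with N buckets per code. Deriving the bucket as hash(short_code) % N is not enough on its own: the hash is deterministic, so every request for one viral code still lands in the same bucket on the same node. To spread a single hot code, write the row to all N buckets (cheap at a 100:1 read:write ratio) and let each read pick exactly one bucket, for example at random. Every read stays a single-partition lookup with no scatter-gather, and the load spreads across N partitions.

Wrong solution

Random bucket on write (one copy only), then scatter-gather on read (query all N buckets to find it). This turns 1 read into N reads. At 10M r/min with N=100, you get 1 billion Cassandra queries per minute. Worse than the original problem.

Why it works

Consistent hashing places the N (short_code, bucket_id) partitions on different token ranges, so no coordination is needed. The write fans out to a fixed, known set of buckets; the read needs only one of them. O(1) lookup, load spread across up to N nodes.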

ClickHouse — Analytics Database

What

A columnar OLAP (Online Analytical Processing) database designed for high-speed aggregations over large datasets. Stores data column-by-column instead of row-by-row.

Why columnar for analytics

Query: "total clicks on URL X per hour for the last 30 days." This reads only the click_count and click_hour columns — ignoring all other columns. Row-based DBs read entire rows even for single-column aggregations. Columnar = 10-100x faster for analytical queries.

Position in architecture

Never in the write path. Kafka consumer reads url.clicked events → aggregates → bulk inserts into ClickHouse. Decoupled from user-facing latency. Analytics can be delayed by seconds or minutes — that's acceptable.

When BigQuery instead

BigQuery (Google's managed columnar DB) when you want zero infrastructure management. ClickHouse when you want self-hosted with more control and lower cost at scale. Both are columnar, append-only, eventual consistency.

Database	CAP	ACID	Scale pattern	Best for	URL Shortener role
PostgreSQL	CP	Yes	Vertical + read replicas	Writes, transactions, complex queries	Write primary
Cassandra	AP	No	Horizontal (add nodes)	High-throughput key-value reads, multi-region	Read store
Redis	AP	No	Cluster sharding	Caching, rate limiting, pub/sub	L2 Cache
ClickHouse	AP	No	Horizontal sharding	Analytics, columnar aggregations	Analytics store
DynamoDB	AP/CP	Partial	Managed horizontal	Serverless key-value, managed ops	Alternative to Cassandra
CockroachDB	CP	Yes	Horizontal (Raft)	Geo-distributed ACID SQL	Alternative if global consistency needed

Consistency Models

The hardest topic in distributed systems. Most candidates know CAP. Staff-level candidates know PACELC, LOCAL_QUORUM, read-your-own-writes, and how to solve each with concrete mechanisms.

CAP Theorem — What It Actually Means

What

Eric Brewer's theorem (2000): In the presence of a network Partition, a distributed system must choose between Consistency and Availability. You cannot guarantee both.

Why it matters

Network partitions are not theoretical — they happen in production regularly (switch failure, network congestion, datacenter isolation). When they happen, your design choice determines whether users see stale data or no data at all.

URL Shortener choice

AP: Availability over Consistency. A stale redirect (a 302 to the old URL) is a minor UX issue. A 503 Service Unavailable during a partition is catastrophic. We choose to serve potentially stale data rather than refuse requests.

Common misconception

"CA systems" (consistent AND available, no partition tolerance) only exist as single-node databases. Any networked distributed system MUST tolerate partitions — the real choice is always C vs A during a partition.

System	CAP Choice	During Partition Behaviour
Cassandra	AP	Returns potentially stale data. Continues to accept writes.
etcd / ZooKeeper	CP	Refuses reads/writes if quorum lost. Safety over availability.
DynamoDB	AP (tunable)	Eventually consistent by default. Strong consistency optional.
Spanner	CP (TrueTime)	Globally consistent using atomic clocks. Accepts higher latency.
PostgreSQL (single)	CA*	*Only works as single node — no real partition tolerance.

PACELC — The Real Model

PACELC extends CAP: "if Partition → choose Availability vs Consistency; Else (normal) → choose Latency vs Consistency." CAP only covers the partition scenario. PACELC covers the normal operation trade-off too. This is the model production engineers actually use.

URL Shortener PACELC position:
  if Partition  → choose Availability  (serve stale data, don't refuse)
  else          → choose Latency       (ONE consistency level, fast reads)

Cassandra is PA/EL: Available during partition, Low-latency normally.
Spanner is PC/EC:   Consistent during partition, Consistent (higher latency) normally.

Cassandra Consistency Levels — Complete Reference

Level	Reads from	Writes to	Latency	When to use
`ONE`	1 replica	1 replica	Lowest	Fast reads, stale OK. URL redirect reads.
`LOCAL_ONE`	1 local DC replica	1 local DC	Lowest (no cross-DC)	Regional reads only
`QUORUM`	Majority of ALL replicas	Majority global	High (cross-DC)	Strong consistency globally
`LOCAL_QUORUM`	Majority in local DC	Majority local DC	Medium (no cross-DC)	Consistent within region. URL writes.
`ALL`	Every replica	Every replica	Highest	Maximum consistency. Fragile — one node down = failure.
`EACH_QUORUM`	Quorum per DC	Quorum per DC	Highest	Global quorum. Very expensive.

// Quorum formula — when do you get strong consistency?
// R + W > N, where N = RF (replication factor), R = replicas read, W = replicas written
// RF=3: QUORUM reads (R=2) + QUORUM writes (W=2) → 2+2=4 > 3 ✓
// RF=3: ONE reads (R=1) + ONE writes (W=1) → 1+1=2 ≤ 3 ✗ eventual

// Our choices:
// Write to Cassandra: LOCAL_QUORUM (consistent within DC, no cross-DC latency)
// Read from Cassandra: ONE (fastest, stale OK for redirects)
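The overlap rule above can be checked mechanically. The following is a minimal sketch, not Cassandra's own logic; the class and method names are hypothetical and only encode the R + W > N arithmetic.

```java
// Sketch: the R + W > N overlap rule as plain arithmetic.
// QuorumCheck is a hypothetical helper, not part of any Cassandra driver.
final class QuorumCheck {
    /** Replicas needed for a majority of rf, e.g. rf=3 gives 2. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** True when every read set must intersect every write set. */
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }
}
```

Applied to our choices with RF=3 in the local DC: ONE reads (R=1) plus LOCAL_QUORUM writes (W=2) give 1+2=3, which is not greater than 3, so redirect reads are eventually consistent. That matches the "stale OK" decision for the read path.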

Read-Your-Own-Writes — The Hardest Problem

The exact problem

Tokyo user creates a short URL → written to US-East primary PostgreSQL → async replicated to Asia Cassandra (200ms lag). 50ms later: same user clicks the URL → Asia Cassandra → not replicated yet → returns 404. User sees their own creation fail immediately. Terrible UX.

Solution 1 — Session affinity (best practical)

After a write, return a token: X-Write-Region: us-east, X-Write-Ts: 1234567890. Client sends this header on next request. Gateway sees it and routes reads to US-East for 5 seconds. After 5s, Asia has the data and normal routing resumes.

Solution 2 — Primary fallback on 404

If Asia Cassandra returns null: retry the read against US-East PostgreSQL (source of truth). Cache the result locally in Asia Cassandra (async repair). User gets their URL. Slight tail latency increase but correct result.

Solution 3 — Global strong consistency

Use Spanner or CockroachDB with globally synchronous writes. Solves problem perfectly. But: 200ms write latency (US-to-Asia synchronous), 10× cost, operational complexity. Only justified if business requirement explicitly demands it.
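Solution 1 boils down to a small routing decision at the gateway. A minimal sketch, assuming the X-Write-Ts header carries epoch seconds and the affinity window is the 5 seconds mentioned above; the class name, method name, and the rule of ignoring future timestamps are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: choose which region should serve a read, based on the write headers.
// ReadRouter is hypothetical; header parsing is assumed to happen before this call.
final class ReadRouter {
    static final Duration AFFINITY_WINDOW = Duration.ofSeconds(5);

    /**
     * @param localRegion         region that would normally serve the read
     * @param writeRegion         value of X-Write-Region, or null if absent
     * @param writeTsEpochSeconds value of X-Write-Ts, or null if absent
     */
    static String chooseRegion(String localRegion, String writeRegion,
                               Long writeTsEpochSeconds, Instant now) {
        if (writeRegion == null || writeTsEpochSeconds == null) {
            return localRegion;
        }
        Instant writtenAt = Instant.ofEpochSecond(writeTsEpochSeconds);
        // Assumption: a timestamp in the future is untrusted and ignored.
        if (writtenAt.isAfter(now)) {
            return localRegion;
        }
        boolean recent = Duration.between(writtenAt, now).compareTo(AFFINITY_WINDOW) <= 0;
        return recent ? writeRegion : localRegion;
    }
}
```

Because the headers come from the client, a production gateway would presumably also validate or sign them; that is outside this sketch.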

Replication Modes — Sync vs Async

⚠️ Never do synchronous intercontinental replication for writes. A US→Asia round trip is ~200ms, so waiting for ACKs from all 3 DCs adds at least 200ms to every write. At 115 writes/sec that keeps roughly 23 writes in flight at any moment (115 × 0.2s), and any latency variance lengthens that queue until requests start timing out. Use async replication with bounded RPO via Kafka instead.

Mode	Write latency	Data loss risk	Use case
Synchronous (all DCs)	200-400ms	Zero	Financial transactions (Spanner)
Semi-synchronous (1 DC)	5-20ms local	Low (1 DC loss max)	MySQL semi-sync, high-value writes
LOCAL_QUORUM	5-10ms	Cross-DC lag only	Our choice — fast and safe within DC
Asynchronous	1-5ms	RPO = replication lag	Analytics, non-critical cross-DC sync

Kafka & Event Streaming

Kafka is not just a message queue. It is a durable, replayable, distributed commit log. This distinction is what makes it the backbone of both analytics AND disaster recovery.

🏆 Staff-level insight: The single most important reason to use Kafka in this system is not analytics — it is disaster recovery replay. Without Kafka, if US-East dies, all in-flight writes are lost permanently. With Kafka, every write event is durably stored and replayable, giving you bounded RPO (Recovery Point Objective).

Kafka vs Traditional Message Queue

Feature	Kafka	RabbitMQ / SQS	Why it matters
Message retention	Days/weeks (configurable)	Until consumed	Kafka allows replay. Queue does not.
Multiple consumers	Yes — consumer groups, independently	Competing consumers only	Kafka fans out to analytics, Cassandra, ML simultaneously
Replay	Seek to any past offset	Impossible	Replay = disaster recovery, debugging, backfill
Throughput	Millions/sec per cluster (scales with partition count)	Thousands/sec	URL click volume can spike to millions
Ordering	Per-partition ordering	Per-queue (usually)	All clicks for one URL ordered = correct analytics

Key Kafka Configuration — Why Each Setting Matters

acks=all

Producer waits for ALL in-sync replicas (ISR) to acknowledge the message before considering it sent. Maximum durability. Combined with min.insync.replicas=2, if the leader dies after the acknowledgement, at least one other replica has the message. No acknowledged data is lost.

min.insync.replicas=2

Minimum number of replicas that must be in-sync for a produce request to succeed. With RF=3 and min.insync.replicas=2: if 2 replicas die, writes fail (rather than risking data loss). This is the safety floor.

enable.idempotence=true

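Makes producer retries safe at the broker. Each message carries a producer ID and a sequence number. If a retry delivers a duplicate (network timeout after send), the broker discards it using the sequence number. This gives exactly-once delivery per partition within a producer session; end-to-end exactly-once semantics additionally requires transactions.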

Consumer offset

Kafka remembers where each consumer group left off (the offset). If a consumer crashes and restarts, it resumes from the last committed offset. No message is lost, but messages processed after the last commit are redelivered, which is why consumers must be idempotent. Offsets are committed to the __consumer_offsets internal topic.

At-least-once vs Exactly-once

At-least-once: messages delivered one or more times. Consumer must be idempotent (handle duplicates). Simpler to implement. Exactly-once: requires idempotent producer + transactional consumer. Harder but no duplicates. For analytics, at-least-once + idempotent aggregation is fine.
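"At-least-once + idempotent aggregation" can be sketched as a counter that ignores event IDs it has already seen. This is an illustration only: it assumes each click event carries a unique event ID (not listed among the event fields in this document), and it keeps the seen-set in memory, which a real consumer would need to bound or persist.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: an aggregation that tolerates redelivered events.
// IdempotentClickCounter and the eventId field are hypothetical.
final class IdempotentClickCounter {
    private final Set<String> seenEventIds = new HashSet<>();
    private final Map<String, Long> clicksByCode = new HashMap<>();

    /** Returns true if the event was counted, false if it was a duplicate. */
    boolean apply(String eventId, String shortCode) {
        if (!seenEventIds.add(eventId)) {
            return false; // redelivery: this event was already counted
        }
        clicksByCode.merge(shortCode, 1L, Long::sum);
        return true;
    }

    long clicks(String shortCode) {
        return clicksByCode.getOrDefault(shortCode, 0L);
    }
}
```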

Dead Letter Queue (DLQ)

When a consumer fails to process a message after max retries (e.g., malformed event), it sends the message to a DLQ topic instead of blocking. The main consumer continues. DLQ messages are inspected manually or by a separate consumer. Never let one bad message block the entire consumer.
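The retry-then-DLQ behaviour described above can be sketched without any Kafka client. In this illustration the dead letter queue is a plain in-memory list standing in for the url.clicked.dlq topic; the class name and the immediate-retry policy are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: retry a handler a fixed number of times, then park the message.
// RetryingConsumer is hypothetical; deadLetters stands in for a DLQ topic.
final class RetryingConsumer {
    private final int maxAttempts;
    private final Consumer<String> handler;
    private final List<String> deadLetters = new ArrayList<>();

    RetryingConsumer(int maxAttempts, Consumer<String> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    void consume(String message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // Retry immediately; a real consumer would back off here.
            }
        }
        deadLetters.add(message); // give up so later messages are not blocked
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

The key property is that a malformed message ends up in the dead-letter list while the messages after it are still processed.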

// Producer configuration: maximum durability
Properties props = new Properties();        // java.util.Properties
props.put("acks", "all");                 // wait for all in-sync replicas (ISR)
props.put("enable.idempotence", "true");   // broker deduplicates producer retries
props.put("retries", Integer.MAX_VALUE);   // keep retrying until delivery.timeout.ms expires
props.put("max.in.flight.requests.per.connection", "5"); // ordering is preserved up to 5 with idempotence

// Topic/broker configuration (NOT a producer property; set it on the topic)
// min.insync.replicas=2                    // at least 2 replicas in sync

// Topic design
// url.created   — partition key = short_code (ordering per URL)
// url.clicked   — partition key = short_code (all clicks ordered per URL)
// url.expired   — partition key = short_code
// url.clicked.dlq — dead letter queue for failed consumers

// Consumer groups — each independently consumes the same events
// analytics-consumer:  url.clicked → aggregates → ClickHouse
// cassandra-updater:   url.created → writes to Cassandra read store
// expiry-processor:    url.expired → marks inactive in PostgreSQL
// ml-pipeline:         url.clicked → trains recommendation model

MirrorMaker 2 — Cross-Region Replication

What

Kafka's built-in cross-cluster replication tool. Mirrors topics from source cluster (US-East) to target clusters (EU, Asia). Both analytics and DR use this.

Why it enables DR

If US-East Kafka cluster dies, EU Kafka cluster has a mirror of every event up to the moment of failure. When US-East recovers, it can replay from the EU mirror. This bounds RPO to the MirrorMaker replication lag — typically under 1 second.

Limitation

Offset translation: the same message has different offsets in source and mirror clusters. MirrorMaker 2 provides offset translation APIs, but consumers must use them correctly when switching clusters during failover.

Alternative: Confluent Replicator

Confluent's commercial cross-cluster replication. More features, better monitoring, easier offset management. Worth considering for production if budget allows.

Full Architecture

How all components connect. Three regions, two paths (read and write), one global DNS layer, and a durable event backbone.

Component Overview

Component	Technology	Purpose	Failure behaviour
GeoDNS	Route53 / GSLB	Route users to nearest healthy region	Remove unhealthy region in 30-60s
API Gateway	Kong / Envoy / AWS APIGW	Rate limiting, auth, routing, SSL termination	Redundant instances; LB in front
Identity Provider	Keycloak / Auth0 / Cognito	JWT issuance and validation	Gateway caches public key; stateless validation
Write Service	Java Spring Boot	Pop pool, write PostgreSQL, publish Kafka	Stateless; restart in < 10s
Read Service	Java Spring Boot + Caffeine	L1→L2→L3→Cassandra lookup, 302 redirect	Stateless; L1 continues without L2
URL Pool	PostgreSQL table	Pre-generated short codes	Circuit breaker → Snowflake fallback
PostgreSQL	RDS PostgreSQL / self-hosted	Source of truth for writes	Read replicas serve reads; primary auto-failover (RDS)
Redis Cluster	Redis 7+ Cluster mode	L2 cache, rate limiting, mutex	L1 absorbs 80% of reads; DB serves rest
Cassandra	Cassandra 4.x	Read-optimised URL store	RF=3; ONE reads survive 2 replicas down; LOCAL_QUORUM writes survive 1
Kafka	Confluent / MSK / self-hosted	Event log for analytics and DR	RF=3; min.insync=2; MirrorMaker cross-region
ClickHouse	ClickHouse / BigQuery	Analytics queries and dashboard	Async; analytics delay acceptable
etcd	etcd 3.x	Leader election, distributed locks	Raft consensus; 3 nodes; 1 can die

API Design

// Write API — create short URL
POST /api/v1/urls
Authorization: Bearer {jwt}
Content-Type: application/json
{
  "long_url": "https://example.com/very/long/path?query=value",
  "custom_alias": "my-brand",     // optional
  "expires_in": 86400             // optional: seconds
}
→ 201 Created
{
  "short_code": "abc123",
  "short_url": "https://short.ly/abc123",
  "long_url": "https://example.com/...",
  "expires_at": "2026-05-25T00:00:00Z"
}
Headers: X-Write-Region: us-east, X-Write-Ts: 1747900000

// Read API — redirect
GET /abc123
→ 302 Found
Location: https://example.com/very/long/path?query=value
X-Served-By: cache-l1         // or cache-l2, cache-l3, db

// Bulk API
POST /api/v1/urls/bulk
[{"long_url": "...", "custom_alias": "..."}, ...]  // max 1000
→ 202 Accepted  (async processing)
{"batch_id": "batch-uuid-123"}

GET /api/v1/urls/bulk/{batch_id}
→ 200 OK  {"status": "completed", "results": [...]}

Write Path Deep Dive

Every step a URL creation request takes, with the exact decision and failure mode at each step.

User→ GeoDNS→ API Gateway→ JWT Validation→ Rate Limiter→ Write Service→ URL Validator→ Pool Pop→ PostgreSQL→ Kafka Publish→ Response

Step 1: GeoDNS Routes to Nearest Region

Based on client IP geolocation. Tokyo user → Asia-Pacific region. No code change. Purely infrastructure. Latency: < 1ms (DNS is cached).

Step 2: API Gateway — Rate Limiting + Auth

Token bucket per user_id in Redis. JWT verified using cached public key (RS256) — no call to Auth service. If JWT invalid → 401. If rate limit exceeded → 429 with Retry-After header. Latency: 1-3ms.

Step 3: URL Validation

Validate long_url format (regex). Check against Safe Browsing API (async, cached 1h). Block internal IP ranges (SSRF prevention: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 127.0.0.0/8). Length check ≤ 2048 chars. Latency: < 5ms (cached blocklist).

Step 4: Pool Pop (O(1), SKIP LOCKED)

SELECT short_code FROM url_pool WHERE taken=false AND region='us-east' LIMIT 1 FOR UPDATE SKIP LOCKED. If pool < 20%: emit async refill event. If pool empty: circuit breaker trips → generate Snowflake ID. Latency: 2-5ms.

Step 5: Write to PostgreSQL

INSERT into url_mappings with short_code, long_url, user_id, metadata. UNIQUE constraint catches custom alias conflicts → 409 Conflict. Consistency: the PostgreSQL primary serialises the write (if writing to Cassandra directly instead, use LOCAL_QUORUM). Latency: 5-10ms.

Step 6: Publish to Kafka

Publish url.created event with acks=all and enable.idempotence=true (the topic is configured with min.insync.replicas=2). Event contains: short_code, long_url, user_id, timestamp, region. Kafka consumers update Cassandra and analytics asynchronously. Latency: 1-3ms for the broker acknowledgement, which we wait for before responding.

Step 7: Return Response

Return 201 Created with short_url, short_code, expires_at. Include X-Write-Region and X-Write-Ts headers for read-your-own-writes session affinity. Total write latency: ~15-25ms from user perspective.

⚡ Kafka publish is on the critical path: We publish to Kafka BEFORE returning the response (it's fast: < 3ms with acks=all). This ensures the event is durably stored before we tell the user "success." If Kafka is down, do we fail the write? Design decision: for URL shortener, no. The URL was already created in PostgreSQL (source of truth); the event is retried when Kafka recovers, and until then Cassandra won't be updated. Acceptable trade-off: brief Kafka downtime causes read-your-own-writes failures but not data loss. If the DR story requires every acknowledged write to be in Kafka, fail the write instead and let the user retry.

Read Path Deep Dive

Every step a click takes. This path must be under 50ms for p99. The entire caching architecture exists to make this fast.

User clicks→ GeoDNS→ CDN Edge (L3)→ miss → Read Service→ L1 Caffeine→ miss → L2 Redis→ miss → Cassandra→ 302 Redirect

L3 CDN Hit — ~5ms, 1.5% of traffic

CDN edge server serves cached 302 redirect directly. No backend involved. Fastest possible path. CDN caches the HTTP response (302 + Location header), not just data. TTL: 1 hour for most URLs, shorter for custom/business URLs.

L1 Caffeine Hit — ~0.1ms, 80% of traffic

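In-process JVM cache. Zero network call. Effectively a hash-map lookup plus frequency bookkeeping (Caffeine's eviction policy is Window TinyLFU). The top 10,000 most-accessed URLs tend to stay resident because frequently read entries are the last to be evicted. Viral URLs effectively never leave L1.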

L2 Redis Hit — ~1-3ms, 18% of traffic

Regional Redis cluster. Network call within datacenter (~0.5ms). LFU eviction keeps hot URLs. TTL 24h. On hit: also backfill L1 so next request is even faster. SETNX mutex protects against thundering herd.

Cassandra Read — ~5-20ms, 0.5% of traffic

Consistency level: ONE (fastest, stale OK). If null returned AND we detect user just created this URL (X-Write-Ts header within 5s): retry against PostgreSQL primary (read-your-own-writes fix). On hit: populate L2 Redis and L1 Caffeine. LeveledCompactionStrategy minimises read amplification.

Not Found — 404 or 410

short_code not in system → 404 Not Found. short_code exists but is_active=false or expired → 410 Gone (semantically different — tells browser this is permanent). Cache negative result with short TTL (60s) to prevent penetration.
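The 404 / 410 / 302 decision can be written as one small function. A minimal sketch: the UrlRecord shape below is hypothetical and only mirrors the fields named in the text (is_active and an expiry time), and treating "expires exactly now" as expired is an assumption.

```java
import java.time.Instant;

// Sketch: map a lookup result to the HTTP status described above.
// RedirectStatus and UrlRecord are hypothetical names.
final class RedirectStatus {
    static final class UrlRecord {
        final String longUrl;
        final boolean active;
        final Instant expiresAt; // null means the URL never expires

        UrlRecord(String longUrl, boolean active, Instant expiresAt) {
            this.longUrl = longUrl;
            this.active = active;
            this.expiresAt = expiresAt;
        }
    }

    /** Returns 404 for unknown codes, 410 for inactive or expired ones, else 302. */
    static int statusFor(UrlRecord record, Instant now) {
        if (record == null) {
            return 404; // short_code was never in the system
        }
        boolean expired = record.expiresAt != null && !record.expiresAt.isAfter(now);
        if (!record.active || expired) {
            return 410; // existed once, gone for good
        }
        return 302;
    }
}
```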

// Read service — full lookup chain
public String resolveLongUrl(String shortCode, String writeRegion, Long writeTs) {
    // L1: In-process cache
    String url = l1Cache.getIfPresent(shortCode);
    if (url != null) return url;

    // L2: Redis
    url = redis.get(shortCode);
    if (url != null) { l1Cache.put(shortCode, url); return url; }

    // L3: CDN handled at infrastructure level, not here

    // DB: Cassandra with ONE consistency
    url = cassandra.get(shortCode, ConsistencyLevel.ONE);

    if (url == null) {
        // Check read-your-own-writes: did this user just create it?
        if (writeRegion != null && isRecent(writeTs)) {
            url = postgresql.get(shortCode);  // fallback to source of truth
        }
    }

    if (url != null) {
        redis.setex(shortCode, 86400, url);
        l1Cache.put(shortCode, url);
        return url;
    }
    return null; // → 404
}

Failover & Disaster Recovery

The section that separates Senior from Staff. Designing the happy path is Senior. Designing how the system survives at 3 AM when a datacenter dies is Staff.

🏆 Key mindset shift: Don't design for availability. Design for controlled degradation. The question is never "will it fail?" — it will. The question is "when it fails, does it fail gracefully or catastrophically?"

RTO & RPO — The Two DR Metrics

RPO (Recovery Point Objective)

Maximum amount of data loss acceptable. "How far back in time can we afford to roll back?" Our target: RPO ≤ 1 second, bounded by Kafka replication lag. Without Kafka: RPO = undefined (could be minutes of lost writes).

RTO (Recovery Time Objective)

Maximum acceptable downtime. "How fast must we recover?" Our target: RTO ≤ 90 seconds for automatic failover (30s health-check detection + 60s DNS TTL). Full recovery (US-East back online): 30-60 minutes for safe canary ramp.

How Kafka enables bounded RPO

Every write event is published to Kafka BEFORE we return success to the user (acks=all). MirrorMaker 2 replicates events to EU and Asia. When US-East recovers, it replays Kafka from last committed offset. RPO = Kafka replication lag at time of failure (typically < 1 second).

Without Kafka — the problem

PostgreSQL async replication to EU might have 500ms lag. US-East dies. Last 500ms of writes are gone. No log to replay. Unrecoverable. This is why Kafka is not just analytics infrastructure — it is your DR backbone.

GeoDNS Failover — Exact Mechanism

// Route53 health check configuration
HealthCheckConfig:
  Type: HTTPS
  ResourcePath: /health
  FailureThreshold: 3        // 3 consecutive failures before marking unhealthy
  RequestInterval: 10        // check every 10 seconds
  // → declares unhealthy after 3 × 10 = 30 seconds

DNS TTL: 60 seconds          // clients respect this TTL
// Total failover time: 30s (health check) + 60s (TTL propagation) = ~90s

// Anycast alternative (faster failover):
// Same IP announced from all regions via BGP
// BGP withdrawal takes ~30s to propagate
// No DNS TTL dependency — faster than GeoDNS
// Used by Cloudflare, Fastly

Leader Election — Preventing Split Brain

Split brain — the catastrophe

US-East dies. EU detects this and promotes itself to write primary. US-East recovers (maybe it was a network blip). Now BOTH think they are primary. Both accept writes. Data diverges. You cannot automatically merge divergent writes. This is the worst failure mode in distributed systems.

Prevention — etcd lease

Write service holds a lease in etcd with 30s TTL. Must renew every 10s. If US-East dies, lease expires after 30s. EU watches for lease expiry, races to acquire it. Only one region can hold the lease. The lease IS the primary writer token.

Fencing token — zombie prevention

Every write includes the etcd lease version (a monotonically increasing number). Storage layer (PostgreSQL, Cassandra) rejects writes with a version number lower than the highest seen. US-East comes back zombie, tries to write with old version → rejected → cannot corrupt data.

Why etcd and not ZooKeeper?

etcd uses Raft consensus (simpler, better understood). ZooKeeper uses ZAB protocol. Both work. etcd is lighter, has a cleaner API, and is the Kubernetes default — most cloud infrastructure teams prefer it now. ZooKeeper has more history and battle-testing. Either is valid in an interview.

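// Leader election with etcd (simplified; jetcd-style calls, exact names may vary by client version)
// Types used: ByteSequence, Cmp, CmpTarget, Op, PutOption, TxnResponse
ByteSequence key = ByteSequence.from("/primary-writer", StandardCharsets.UTF_8);
ByteSequence value = ByteSequence.from("us-east", StandardCharsets.UTF_8);
while (true) {
    // Try to acquire write primary role
    long leaseId = etcd.getLeaseClient().grant(30).get().getID(); // 30s TTL

    // Atomic create-if-absent. A plain put would overwrite the current
    // primary's key, so the put only runs if the key does not exist yet.
    TxnResponse txn = etcd.getKVClient().txn()
        .If(new Cmp(key, Cmp.Op.EQUAL, CmpTarget.createRevision(0)))
        .Then(Op.put(key, value, PutOption.builder().withLeaseId(leaseId).build()))
        .commit().get();

    if (txn.isSucceeded()) {
        // We got it! Key was absent, so we are primary
        startHeartbeat(leaseId); // renew every 10s
        break;
    } else {
        // Someone else is primary: release our unused lease,
        // then block until the key is deleted and try again
        etcd.getLeaseClient().revoke(leaseId).get();
        watchForExpiry("/primary-writer");
    }
}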

// Fencing token — every write includes lease version
// Storage layer: if write.version < max_seen_version → REJECT
// This kills zombie primaries that come back after being presumed dead
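The storage-side check is small enough to show in full. A minimal in-memory sketch of the rule "reject any write whose token is lower than the highest seen"; FencedStore is a hypothetical stand-in for the check that PostgreSQL or Cassandra would have to enforce.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a store that enforces fencing tokens on every write.
// FencedStore is hypothetical; a real store would persist maxSeenToken.
final class FencedStore {
    private long maxSeenToken = 0;
    private final Map<String, String> data = new HashMap<>();

    /** Returns false, and writes nothing, when the token is older than one already seen. */
    synchronized boolean write(long fencingToken, String key, String value) {
        if (fencingToken < maxSeenToken) {
            return false; // zombie primary holding a stale lease version
        }
        maxSeenToken = fencingToken;
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }
}
```

Once the new primary has written with token 2, a recovered US-East still holding token 1 can no longer change anything.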

US-East Dies — Full Runbook

t=0: US-East dies completely

Network partition or datacenter failure. All health checks start failing. Kafka producers in US-East stop publishing. Last Kafka offset is recorded by MirrorMaker 2 consumers in EU and Asia.

t=30s: GeoDNS removes US-East

Route53 health check declares US-East unhealthy after 3 consecutive failures. DNS records updated to remove US-East endpoints. New DNS responses point only to EU and Asia.

t=60-90s: Traffic fully shifted

DNS TTL expires. All new client connections go to EU or Asia. EU and Asia Cassandra still serve reads (AP, RF=3). etcd lease for /primary-writer expires. EU races to acquire it. EU becomes write primary.

t=90s: System operational again

EU accepts writes as primary. New URL pools generated with EU prefix (2xxxx). Kafka MirrorMaker 2 already replicating EU events. System degraded (higher latency for non-EU users) but operational.

t=30-60min: US-East recovery

1) Start in MAINTENANCE mode (no traffic). 2) Replay Kafka events from last committed offset. 3) Verify Cassandra row counts match EU. 4) Verify replication lag = 0. 5) Re-enable with 1% canary traffic. 6) Monitor error rate and p99. 7) Promote 1%→5%→25%→100%.

⚡ Canary promotion must be automated with SLO gates: Error rate < 0.01% AND p99 latency within 20% of baseline AND Kafka replication lag = 0ms. These three conditions must ALL be true before auto-promoting to the next tier. Human approval required for the final 100% promotion. This prevents re-introducing a broken node at full blast.
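The three SLO gates and the 1%→5%→25%→100% ramp can be sketched as two pure functions. This is an illustration under assumptions: "within 20% of baseline" is read as p99 ≤ 1.2 × baseline, and the class and method names are hypothetical.

```java
// Sketch: automated canary promotion with SLO gates.
// CanaryGate is hypothetical; thresholds are the ones quoted in the text.
final class CanaryGate {
    static final int[] TIERS = {1, 5, 25, 100};

    /** All three conditions must hold before promoting. */
    static boolean sloGatesPass(double errorRate, double p99Ms,
                                double baselineP99Ms, long replicationLagMs) {
        return errorRate < 0.0001               // error rate < 0.01%
            && p99Ms <= baselineP99Ms * 1.20    // assumption: "within 20%" means at most 1.2x
            && replicationLagMs == 0;
    }

    /** Returns the next traffic percentage, or the current one if promotion is blocked. */
    static int nextTier(int currentPercent, boolean gatesPass, boolean humanApproved) {
        if (!gatesPass) {
            return currentPercent;
        }
        for (int i = 0; i < TIERS.length - 1; i++) {
            if (TIERS[i] == currentPercent) {
                int next = TIERS[i + 1];
                // The final step to 100% additionally needs human approval.
                return (next == 100 && !humanApproved) ? currentPercent : next;
            }
        }
        return currentPercent; // already at 100%, or not a known tier
    }
}
```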

Security

Security is often glossed over in system design interviews. Knowing it in detail signals production experience.

Authentication — JWT & OAuth2

JWT — What

JSON Web Token. Three Base64url-encoded parts separated by dots: Header (algorithm), Payload (claims: userId, email, exp, iat), Signature (HMAC or RSA). The signature proves the token was issued by the IdP and hasn't been tampered with.

Why stateless

API Gateway validates JWT by verifying the signature using the IdP's public key (RS256 = RSA). No call to the Auth service per request. The public key is cached at the gateway. Massive throughput: validation is a CPU operation (< 1ms), not a network call.

RS256 vs HS256

HS256 uses a shared secret — any party with the secret can forge tokens. RS256 uses asymmetric keys — IdP signs with private key, everyone else verifies with public key. Only the IdP can issue tokens. RS256 is correct for multi-service architectures.
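The asymmetric property of RS256 can be demonstrated with the JDK alone. The sketch below is not a JWT library: it signs and verifies a plain string standing in for the "header.payload" signing input, using RSA with SHA-256 (the JDK algorithm name is SHA256withRSA). The class name and the 2048-bit key size are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Sketch: only the private key can sign; anyone with the public key can verify.
final class Rs256Demo {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What the IdP does with its private key. */
    static byte[] sign(PrivateKey key, String signingInput) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(signingInput.getBytes(StandardCharsets.UTF_8));
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What the gateway does with the cached public key. */
    static boolean verify(PublicKey key, String signingInput, byte[] signature) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(signingInput.getBytes(StandardCharsets.UTF_8));
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The gateway never holds the private key, so a compromised gateway can verify tokens but cannot mint them.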

Token revocation problem

JWTs are stateless — you cannot "un-issue" one. If a user is banned, their JWT is still valid until expiry. Solutions: short expiry (15 min) + refresh tokens, or maintain a revocation list (sacrifices statelessness), or use opaque tokens with introspection (back to stateful).

Rate Limiting — Token Bucket Deep Dive

What

Limits requests per user/IP per time window. Prevents abuse, DoS, and resource exhaustion. Different limits per tier (free vs pro vs enterprise).

Token Bucket vs Leaky Bucket

Token bucket: bucket fills at constant rate. Each request consumes a token. Allows burst (up to bucket capacity). Leaky bucket: requests processed at fixed rate regardless of when they arrive. No burst allowed. Token bucket is more user-friendly.
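A token bucket fits in a few lines. This is a single-process sketch for illustration; the design in this document keeps the bucket per user_id in Redis, which is not shown here. The class name is hypothetical, and the clock is passed in so the behaviour is easy to reason about.

```java
// Sketch: in-memory token bucket with an injectable clock.
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Consumes one token if available; returns false when the caller should get a 429. */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With capacity 3 and a refill rate of 1 token per second, three immediate requests pass, the fourth is rejected, and one more passes a second later. In production the clock would be System.nanoTime().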

Sliding window log

Exact algorithm: store timestamp of every request in a sorted set (Redis ZADD). On each request: remove entries older than 1 minute (ZREMRANGEBYSCORE), count remaining (ZCARD). If count ≥ limit → reject. Exact but high memory: O(requests) per user.

Fixed window problem

If limit is 100/minute and window resets at :00, a user can send 100 at :59 and 100 at :01 — 200 requests in 2 seconds. The boundary allows 2× the limit. Sliding window solves this.
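The sliding window log can be sketched in memory, with a deque playing the role of the Redis sorted set. The class name is hypothetical, and the sketch assumes timestamps are passed in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: sliding window log limiter; comments name the Redis command each step mirrors.
final class SlidingWindowLog {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow(long nowMillis) {
        // ZREMRANGEBYSCORE: drop entries that have left the window
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        // ZCARD: count what remains
        if (timestamps.size() >= limit) {
            return false;
        }
        // ZADD: record this request
        timestamps.addLast(nowMillis);
        return true;
    }
}
```

Replaying the fixed-window scenario against this limiter (100 requests at :59, then more at :01 of the next minute) rejects the second burst, because the first 100 are still inside the trailing 60-second window.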

301 vs 302 Redirect — Full Analysis

302 Found — Our Choice

Browser always calls our server on every click
We capture every click for analytics
We can update or expire the URL at any time
Rate limiting works on every request
We detect malicious usage patterns

301 Moved Permanently — Wrong Choice

Browser caches redirect permanently
Subsequent clicks never reach our server
Analytics broken — we see each URL clicked once
Cannot update long_url after creation
Cannot expire/deactivate URLs for cached clients

307 vs 302: 302 may convert POST to GET when following redirect. 307 preserves the HTTP method. For URL shortener, users are redirecting from a GET click, so 302 and 307 behave identically. Know the difference for completeness.

SSRF Prevention

What is SSRF

Server-Side Request Forgery: attacker creates a short URL pointing to an internal service (e.g., http://169.254.169.254/metadata — AWS instance metadata). When the server "validates" the URL by fetching it, it inadvertently exposes internal infrastructure.

Prevention

Before storing a URL: resolve the hostname to IP. Check the IP against blocked ranges. Block: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16 (metadata), 127.0.0.0/8 (localhost). Also: allowlist schemes (https only, block file://, ftp://).
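The resolve-then-check step can be sketched with the JDK's InetAddress helpers, which cover the ranges listed above. The class name is hypothetical, and the sketch follows the "https only" rule from the text. It resolves the hostname at validation time only.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Sketch: reject URLs whose scheme is not https or whose host resolves to an internal range.
final class SsrfGuard {
    static boolean isBlockedAddress(InetAddress addr) {
        return addr.isLoopbackAddress()    // 127.0.0.0/8
            || addr.isSiteLocalAddress()   // 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
            || addr.isLinkLocalAddress();  // 169.254.0.0/16 (cloud metadata)
    }

    static boolean isAllowed(String url) {
        try {
            URI uri = URI.create(url);
            if (!"https".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
                return false; // blocks file://, ftp://, http:// and malformed URLs
            }
            // Check every address the hostname resolves to, not just the first.
            for (InetAddress addr : InetAddress.getAllByName(uri.getHost())) {
                if (isBlockedAddress(addr)) {
                    return false;
                }
            }
            return true;
        } catch (IllegalArgumentException | UnknownHostException e) {
            return false; // unparseable or unresolvable: refuse rather than guess
        }
    }
}
```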

Observability & SLOs

"How do you prove the system is working correctly in production?" This is the final question that separates implementers from owners.

SLO / SLA / SLI — Exact Definitions

SLI — What you measure

Service Level Indicator. A specific metric: redirect success rate (%), p99 redirect latency (ms), URL creation success rate (%). Must be objective and measurable.

SLO — What you promise yourself

Service Level Objective. Internal target derived from SLIs: redirect success rate ≥ 99.9%, p99 redirect latency ≤ 50ms. No contractual obligation. Used to guide engineering decisions.

SLA — What you promise customers

Service Level Agreement. Contractual promise: usually SLO minus a buffer (99.5% availability). Breaching SLA triggers compensation (credits, refunds). Never set SLA = SLO — you'll be paying credits constantly.

Error Budget

99.9% SLO = 0.1% allowed errors. Monthly: 0.1% × 30 × 24 × 60 = 43.2 minutes of allowed downtime. Error budget burn rate: if burning at 10× normal rate → page oncall before budget runs out. This is Google SRE's core concept.
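The error budget arithmetic, written out as a minimal sketch. The class and method names are hypothetical; burn rate is computed here as the observed error rate divided by the allowed error rate.

```java
// Sketch: error budget and burn rate as plain arithmetic.
final class ErrorBudget {
    /** Minutes of full downtime allowed over the period, e.g. slo=0.999 for 99.9%. */
    static double allowedMinutes(double slo, int days) {
        return (1.0 - slo) * days * 24 * 60;
    }

    /** 1.0 means the budget lasts exactly the period; 10.0 means it is gone in a tenth of it. */
    static double burnRate(double observedErrorRate, double slo) {
        return observedErrorRate / (1.0 - slo);
    }
}
```

For a 99.9% SLO over 30 days, allowedMinutes gives roughly 43.2. An observed error rate of 1% against that SLO is a burn rate of about 10, the level at which the text says to page oncall.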

Synthetic Monitoring — Proving Correctness

// Synthetic test — runs every 60 seconds from each region
void syntheticTest() {
    String unique = "https://test-target.com/" + UUID.randomUUID();
    long startNanos = System.nanoTime();

    // Step 1: Create short URL
    Response create = POST("/api/v1/urls", {"long_url": unique});
    assert(create.status == 201);
    String shortCode = create.body.short_code;

    // Step 2: Resolve short URL (no redirect-follow)
    Response redirect = GET("/" + shortCode, followRedirects=false);
    assert(redirect.status == 302);
    assert(redirect.header("Location").equals(unique));

    // Step 3: Assert total latency (create + resolve)
    long totalMs = (System.nanoTime() - startNanos) / 1_000_000;
    assert(totalMs < 200);

    // Step 4: Record metrics
    metrics.record("synthetic.latency", totalMs);
    metrics.record("synthetic.success", 1);
}

// This catches:
// - Cache misconfiguration (wrong URL returned)
// - Replication lag breaking read-your-own-writes
// - SSL certificate expiry
// - DNS misconfiguration
// - Database inconsistency (short_code exists but wrong long_url)

Predictive Monitoring — The Staff Signal

// REACTIVE (Junior): Alert when pool = 0 → already broken
if (poolRemaining == 0) alert("Pool empty!"); // too late

// PREDICTIVE (Staff): Alert when pool WILL hit 0 in N minutes
double rate = metrics.rate("pool.consumed", 5, MINUTES); // per second
long   remaining = getPoolRemaining();
double timeToEmpty = remaining / rate; // seconds

// Check the most severe threshold first so only one action fires
if (timeToEmpty < 120) {
    alert(P0, "Pool empties in 2min");
    circuitBreaker.trip(); // auto-failover to Snowflake IDs
} else if (timeToEmpty < 600) {
    alert(P2, "Pool empties in 10min");
}

// PromQL equivalent:
// url_pool_remaining / rate(url_pool_consumed_total[5m]) < 600
// → alert: "Pool depletes in less than 10 minutes"

Chaos Engineering

Chaos Experiments to Run

Kill one Cassandra node

Expected: reads continue (RF=3, ONE). Load redistributes. No user impact. Verify: error rate unchanged, p99 within 10% of baseline.

Kill Redis in one region

Expected: L1 Caffeine absorbs 80% of reads. Remaining fall to Cassandra. p99 increases from 10ms to 50ms. No errors. Verify: circuit breaker disables L2 path.

Kill pool refill job

Expected: predictive alert fires at 10min. Circuit breaker trips at 2min. Snowflake fallback activates. Writes continue at slightly higher latency. Verify: no write failures.

Network partition — US-East isolated

Expected: GeoDNS removes US-East in 30-90s. EU becomes write primary via etcd. All traffic served. Verify: RTO ≤ 90s, data loss within the RPO target of ≤ 1s (Kafka replay covers the gap).

Cross Questions — Interview Traps

Every question an interviewer has ever asked about URL shortener design, with the exact answer that signals senior/staff level thinking.

Category 1 — "Why not just use X?"

Q"Why not just use one database for everything?"

Read and write access patterns are fundamentally different. Writes: by user_id (list my URLs), complex filters, ACID transactions, UNIQUE constraints. Reads: by short_code only, ultra-low latency, eventual consistency OK. Optimising one schema for both means compromising both. CQRS lets each store be perfect for its use case. Additionally, read traffic is 100× write traffic — we need to scale them independently. A single DB would require enormous resources to serve 11,500 r/s while also handling writes and complex queries.

Q"Why Cassandra and not just use Redis for reads?"

Redis is memory-only — 9TB in Redis would require enormous, expensive memory clusters. Cassandra stores on SSD with memory for hot data, giving disk-backed persistence at a fraction of the cost. Redis IS used — as L2 cache in front of Cassandra. The combination is optimal: Redis for the hottest data (sub-millisecond, ~18% of requests), Cassandra for everything else (single-digit milliseconds, ~0.5% of requests). You cannot replace a database with a cache — you always need the backing store.

Q"Why Kafka and not write directly to ClickHouse for analytics?"

Three reasons. Decoupling: with direct writes, a slow or unavailable ClickHouse would block the user-facing request. With Kafka, analytics failure never touches the request path. Fan-out: multiple consumers (analytics, Cassandra updater, ML pipeline) independently read the same events without the producer knowing about them. Adding a new consumer requires no producer changes. Replay: Kafka retains events for disaster recovery. Direct ClickHouse writes give you none of these. The analytics data is a bonus; the real reason is Kafka's role as a durable event log for DR.

Q"Why not use a global CDN for everything and skip internal caches?"

CDN has two fundamental limitations. Invalidation speed: purging a URL from all global edge nodes takes 1-5 minutes. If a URL is deleted or updated, users get wrong redirects globally for minutes. L1/L2 invalidation is near-instant. Coverage: the CDN only answers requests it already has cached at the edge. Every miss still reaches the datacenter, where L1/L2 sit between the CDN and the database. CDN is L3: it serves requests that never reach your infrastructure. L1 and L2 serve requests that do reach your infrastructure. They complement each other; CDN cannot replace L1/L2.

Category 2 — "What happens when X fails?"

Q: "What if your pool refill job crashes silently and the pool hits zero?"

Three-layer defence:
Predictive alert: a time-to-empty metric alerts on-call at 10 minutes remaining. The pool never hits zero if someone responds.
Circuit breaker: at 2 minutes remaining, the breaker trips automatically and switches the entire write path to Snowflake ID generation. The service continues at slightly higher latency but with zero failures.
Kubernetes liveness probe: the refill job exposes a health endpoint, and Kubernetes restarts it within 30s of failure.
The key insight: the pre-generated pool is a performance optimisation, not a hard dependency. It is never the only option, because Snowflake ID is always the fallback.
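
The first two layers reduce to one number: time-to-empty. A minimal sketch, assuming the pool size and current consumption rate are already available as metrics. The function names and the returned action strings are made up for illustration; the 10-minute and 2-minute thresholds are the ones stated above.

```python
def time_to_empty_minutes(pool_size: int, consumption_per_sec: float) -> float:
    """Minutes until the pool is empty at the current consumption rate."""
    if consumption_per_sec <= 0:
        return float("inf")                  # nothing is being consumed
    return pool_size / consumption_per_sec / 60


def pool_action(pool_size: int, consumption_per_sec: float,
                alert_minutes: float = 10, trip_minutes: float = 2) -> str:
    """Map time-to-empty onto the layered defence."""
    tte = time_to_empty_minutes(pool_size, consumption_per_sec)
    if tte <= trip_minutes:
        return "use_snowflake"               # circuit breaker: bypass the pool
    if tte <= alert_minutes:
        return "page_oncall"                 # predictive alert
    return "ok"
```

At 115 writes/s, a pool of 34,500 codes is 5 minutes from empty, so `pool_action(34_500, 115)` pages on-call well before any write fails.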

Q: "What if two users try to create the same custom alias at the same time?"

This is a concurrent write problem. The UNIQUE constraint on short_code in PostgreSQL is a database-level serialisation point — only one transaction can succeed. The first INSERT commits. The second gets a unique constraint violation (error code 23505 in PostgreSQL). We catch this at the application layer and return HTTP 409 Conflict with a message: "This alias is already taken. Please choose a different one." No retry logic needed — we just inform the user. This is why UNIQUE constraints exist in relational databases: they enforce invariants atomically without application-level locking.
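
The same behaviour can be tried locally. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, so the exception type differs: SQLite raises `sqlite3.IntegrityError`, whereas a PostgreSQL driver would surface SQLSTATE 23505. The helper names and the two-column table are assumptions for the example.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a UNIQUE short_code (demo schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url_mappings ("
        " short_code TEXT NOT NULL UNIQUE,"
        " long_url TEXT NOT NULL)"
    )
    return conn


def create_alias(conn: sqlite3.Connection, short_code: str, long_url: str) -> int:
    """Return the HTTP status the API would send for this create request."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mappings (short_code, long_url) VALUES (?, ?)",
                (short_code, long_url),
            )
        return 201
    except sqlite3.IntegrityError:
        return 409  # alias already taken; no retry, just tell the user
```

The application never checks for existence first. A check-then-insert would reintroduce the race that the constraint removes.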

Q: "What if Kafka goes down during a write?"

The design decision: is Kafka in the synchronous write path?
For analytics: no. URL creation succeeds, and the event is queued locally and retried when Kafka recovers.
For Cassandra updates: if Cassandra must be updated synchronously, we have to pick a failure mode.
Option A: fail the write. The user retries and there is no data inconsistency, but the write fails.
Option B: succeed the write and accept that Cassandra won't be updated until Kafka recovers (read-your-own-writes fails temporarily).
For a URL shortener, Option B is acceptable: the URL exists in PostgreSQL (the source of truth), and Cassandra catches up when Kafka recovers. Users may see a 404 on their own URL for a short window, which is a minor UX issue. We choose availability over consistency here, in line with our AP stance.
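
"Queued locally and retried" can be as small as a bounded in-memory buffer in front of the producer. A sketch under stated assumptions: `send` is a hypothetical callable that raises `ConnectionError` when the broker is unreachable, and the buffer silently drops the oldest event once `max_buffer` is exceeded (a real service would need to decide whether that loss is acceptable).

```python
from collections import deque
from typing import Any, Callable


class BufferedPublisher:
    """Keeps the write path alive while the broker is down (illustrative)."""

    def __init__(self, send: Callable[[Any], None], max_buffer: int = 10_000) -> None:
        self._send = send
        self._pending = deque(maxlen=max_buffer)  # oldest events drop when full

    def publish(self, event: Any) -> bool:
        """Queue the event and try to drain; never raises to the caller."""
        self._pending.append(event)
        return self.flush()

    def flush(self) -> bool:
        """Send queued events in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False                      # broker still down, retry later
            self._pending.popleft()               # remove only after a successful send
        return True

    def pending(self) -> int:
        return len(self._pending)
```

Because an event is removed only after `send` returns, a crash between the two steps can resend it. That is at-least-once delivery, so downstream consumers still need to be idempotent.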

Category 3 — "How would you change the design if...?"

Q: "What if URLs need to expire after 24 hours?"

Expiry has to be handled at every layer:
Cassandra: native TTL, INSERT INTO url_mappings (...) USING TTL 86400. Cassandra automatically deletes the row after 24h.
PostgreSQL: a background job runs UPDATE url_mappings SET is_active=false WHERE expires_at < NOW() every 5 minutes and publishes url.expired events to Kafka.
Cache: set cache TTL = min(24h, time remaining until expiry), calculated at cache-write time.
CDN: call the CDN invalidation API when the URL expires, and set Cache-Control max-age to the remaining TTL.
On redirect: check is_active and expires_at before returning 302. Return 410 Gone (not 404): 410 means "permanently removed", which tells search engines and browsers to remove the URL from their indexes.
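
The cache-TTL rule and the redirect check are both one-liners once the timestamps are in hand. A minimal sketch, assuming timezone-aware `datetime` values; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHE_TTL_SECONDS = 24 * 3600


def cache_ttl_seconds(expires_at: datetime, now: datetime) -> int:
    """TTL = min(24h, time remaining until expiry), never negative."""
    remaining = int((expires_at - now).total_seconds())
    return max(0, min(MAX_CACHE_TTL_SECONDS, remaining))


def redirect_status(is_active: bool, expires_at: datetime, now: datetime) -> int:
    """302 for a live URL, 410 Gone once it is inactive or past expiry."""
    if not is_active or expires_at <= now:
        return 410
    return 302
```

Computing the TTL at cache-write time means an entry written one hour before expiry lives for one hour, not a full day.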

Q: "What if we need to support far higher scale, say 1 billion reads/sec?"

The architecture doesn't fundamentally change, only the size of each tier. Note that 1B r/s is roughly 87,000× today's 11,500 r/s, so the read tiers grow far more than the write path.
At 1B r/s: the cache hit rate must be 99.99%, which leaves only 100K r/s for Cassandra.
CDN: ensure 500+ PoPs globally (Cloudflare has 300+) and route to the nearest. More edge nodes.
L1: increase Caffeine maximumSize to 500K entries per JVM. Add more JVM instances behind the load balancer.
L2: Redis cluster with more shards. Consider Redis Enterprise for better cluster management.
Cassandra: add more nodes; Cassandra scales close to linearly. More partitions, more vnodes.
Write path: even at 10× today's rate, 115 writes/s × 10 = 1,150 writes/s, which is still manageable on the PostgreSQL primary.
The beauty of this architecture is that each layer scales independently. This is why we separated concerns from the start.
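
The 99.99% figure is not a guess; it follows from how many reads the backing store can absorb. A small sketch of that arithmetic, using `Fraction` to avoid floating-point noise (the function names are made up for the example):

```python
from fractions import Fraction


def required_hit_rate(total_reads_per_sec: int,
                      backend_capacity_per_sec: int) -> Fraction:
    """Smallest cache hit rate that keeps backend reads within capacity."""
    return 1 - Fraction(backend_capacity_per_sec, total_reads_per_sec)


def scaled_writes_per_sec(current_writes_per_sec: int, factor: int) -> int:
    """Write rate after scaling by an integer factor."""
    return current_writes_per_sec * factor
```

With 1B reads/s and room for 100K reads/s on Cassandra, `required_hit_rate(1_000_000_000, 100_000)` comes out to exactly 9999/10000.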

Q: "What if you need to support QR codes for every short URL?"

QR code generation is CPU-intensive but deterministic: the same short URL always produces the same QR code. Options:
On-demand generation: generate the QR code when requested and cache it aggressively. A QR code is a PNG/SVG of ~1-5KB. Cache key: qr:{shortCode}, stored in L2 Redis, the CDN, or S3.
Pre-generation: generate the QR code at URL creation time and store it in S3. The API returns the S3 URL. Simple, but it requires S3 storage (~5KB × 18B URLs = ~90TB).
My choice: on-demand generation with CDN caching. The QR code is computed once per short_code and cached at the CDN forever (a QR code never changes), with S3 as fallback storage for rarely accessed codes. This adds a QR code generation service that is stateless, horizontally scalable, and sits behind the CDN.
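
Both options are easy to reason about in code. The sketch below checks the pre-generation storage figure and shows the on-demand path as a compute-once cache. The `render` argument is a hypothetical stand-in for a real QR library, and the dict stands in for Redis or the CDN.

```python
from typing import Callable, Dict


def qr_cache_key(short_code: str) -> str:
    """Cache key in the qr:{shortCode} form."""
    return f"qr:{short_code}"


def pregeneration_storage_tb(avg_qr_bytes: int, url_count: int) -> float:
    """Storage needed to pre-generate every QR code, in decimal terabytes."""
    return avg_qr_bytes * url_count / 10**12


def get_qr(short_code: str,
           cache: Dict[str, bytes],
           render: Callable[[str], bytes]) -> bytes:
    """On-demand generation: render once per short code, then serve from cache."""
    key = qr_cache_key(short_code)
    if key not in cache:
        cache[key] = render(short_code)   # the CPU-heavy step runs only on a miss
    return cache[key]
```

Because the output is deterministic, a cache entry never needs invalidating, which is what makes "cache forever" safe here.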

Full Glossary — 80+ Terms

Every term you need to know, with a one-liner definition and when to use it in an interview. Sorted by category.

Concurrency & Locking

Term	One-Line Definition	Interview context
`SKIP LOCKED`	SQL locking clause: skip rows already locked by other transactions instead of waiting. Enables concurrent pool pops without blocking.	URL pool pop mechanism
`FOR UPDATE`	Lock selected rows within transaction to prevent concurrent modification.	Pool pop + SKIP LOCKED
Optimistic locking	Assume no conflict; detect at commit using version column. Fast reads, retries on conflict.	High-read, low-write contention scenarios
Pessimistic locking	Lock resource immediately on access. Guaranteed no conflict but blocks others.	High-write contention scenarios
Deadlock	Two transactions each wait for a lock the other holds. Neither can proceed. DB detects and kills one.	Why SKIP LOCKED is better than FOR UPDATE alone
MVCC	Multi-Version Concurrency Control. Multiple versions of data exist simultaneously. Readers never block writers.	How PostgreSQL achieves high concurrency
CAS (Compare-and-Swap)	Atomic: "set value only if current value = expected." Foundation of lock-free data structures and etcd leader election.	Leader election mechanism
Idempotency	Operation can be applied multiple times with same result. Essential for retry logic.	Kafka producer, API design
Mutex	Mutual exclusion lock. Only one thread can hold it. Redis SETNX implements distributed mutex.	Thundering herd protection
Semaphore	Like a mutex but allows N threads simultaneously. Redis can implement with INCR/DECR.	Concurrency control, rate limiting
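
Optimistic locking and CAS in the table above share one mechanism: write only if the version you read is still current. A minimal in-process sketch, where `VersionedCell` is a made-up name and the internal lock only makes the compare-and-set step atomic within a single Python process:

```python
import threading
from typing import Any, Tuple


class VersionedCell:
    """A value guarded by a version counter (optimistic locking / CAS)."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._version = 0
        self._lock = threading.Lock()

    def read(self) -> Tuple[Any, int]:
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, expected_version: int, new_value: Any) -> bool:
        """Apply the write only if nobody else has written since our read."""
        with self._lock:
            if self._version != expected_version:
                return False          # conflict detected: caller re-reads and retries
            self._value = new_value
            self._version += 1
            return True
```

Readers never wait on writers' intent; a writer only finds out about a conflict at commit time, which is why this suits high-read, low-write contention.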

Caching

Term	One-Line Definition	Interview context
`SETNX`	Redis SET if Not eXists. Returns 1 if set, 0 if key already existed. Used for distributed mutex.	Thundering herd prevention
LRU	Least Recently Used. Evicts item not accessed for longest time. Recency-based.	Compare to LFU; Guava uses this
LFU	Least Frequently Used. Evicts item accessed fewest times. Frequency-based. Better for Zipf distributions.	Redis allkeys-lfu, Caffeine uses W-TinyLFU variant
ARC	Adaptive Replacement Cache. Balances LRU and LFU dynamically. Used in ZFS, some SSD controllers.	Mention as alternative to LRU/LFU
W-TinyLFU	Window Tiny LFU. Caffeine's algorithm. Count-Min Sketch estimates frequency. Near-optimal hit rate.	Why Caffeine beats Guava
Count-Min Sketch	Probabilistic frequency counter using multiple hash functions. Fixed-size memory regardless of key count; counts are approximate and may overestimate.	How W-TinyLFU estimates access frequency
Bloom filter	Probabilistic membership test. Zero false negatives, small false positive rate. Compact fixed-size bit array, O(k) lookups for k hash functions.	Cache penetration prevention
Thundering herd	Many concurrent cache misses on same key → mass DB requests → DB overwhelmed.	Problem; SETNX is the fix
Cache stampede	Same as thundering herd. Also called dogpile effect.	Know all three names
Cache penetration	Requests for nonexistent keys bypass cache every time. Fix: negative caching or Bloom filter.	Security + performance concern
Cache avalanche	Mass simultaneous TTL expiry → mass DB requests. Fix: TTL jitter.	Cold start scenario
Cache pollution	Low-frequency items evict high-frequency items. Fix: LFU eviction.	Scraper/bot traffic scenario
Write-through	Write to cache + DB synchronously. Strong consistency. Write latency penalty.	Compare cache invalidation strategies
Write-back	Write to cache only, async flush to DB. Fast writes, risk of data loss on cache failure.	High-write scenarios
Cache-aside	App manages cache: read cache → miss → DB → populate cache. Most common pattern.	Our URL shortener read pattern
TTL jitter	Random offset added to TTL to prevent simultaneous expiry: TTL = base + random(0, 20%).	Cache avalanche prevention
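
The TTL jitter formula from the table, TTL = base + random(0, 20%), as a sketch. The function name and the injectable `rng` parameter are assumptions that make the behaviour testable.

```python
import random


def jittered_ttl(base_seconds: float,
                 max_jitter_fraction: float = 0.20,
                 rng: random.Random = random.Random()) -> float:
    """Return base TTL plus a random offset of up to max_jitter_fraction of it."""
    return base_seconds + rng.uniform(0, base_seconds * max_jitter_fraction)
```

With a one-hour base, keys written in the same instant expire anywhere between 3,600s and 4,320s later, so their reloads are spread over a 12-minute window instead of landing at once.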

Distributed Systems

Term	One-Line Definition	Interview context
CAP theorem	During network Partition: choose Consistency or Availability. Cannot have both.	Justify AP choice for URL shortener
PACELC	Extends CAP: if Partition→A vs C; Else→Latency vs Consistency. More practical than CAP alone.	Staff-level consistency discussion
ACID	Atomicity, Consistency, Isolation, Durability. PostgreSQL guarantees. Strong but slow cross-region.	Why PostgreSQL for writes
BASE	Basically Available, Soft state, Eventually consistent. Cassandra's philosophy.	Why Cassandra for reads
Eventual consistency	All replicas converge to same value given time and no new updates.	URL redirect reads (ONE consistency)
Strong consistency	Every read returns the most recent write. Requires coordination = higher latency.	Write path requirement
Linearizability	Strictest consistency: operations appear instantaneous and sequential. Spanner provides this.	Alternative to eventual consistency (costly)
Read-your-own-writes	After writing, you always see your own write on subsequent reads. Violated by async replication.	Cross-region replication problem
Monotonic reads	Once you've seen data version N, you never see an older version N-1. Time travel prevention.	Consistency guarantee weaker than strong
Consistent hashing	Maps keys to nodes on a ring. Adding/removing a node moves minimal keys. Used by Cassandra, Redis Cluster.	How Cassandra distributes data
Virtual nodes (vnodes)	Each physical node owns multiple virtual positions on ring. Better load distribution.	Cassandra internals
Quorum	Majority (⌊N/2⌋ + 1) must agree. R + W > N means read and write sets overlap, giving strong consistency. Key formula.	Cassandra consistency levels
Raft	Consensus algorithm for leader election and log replication. Used by etcd, CockroachDB. Simpler than Paxos.	etcd leader election
Paxos	Original distributed consensus algorithm. Basis for Raft and ZAB. Proven correct but complex.	Historical context for Raft/ZAB
ZAB	ZooKeeper Atomic Broadcast. ZooKeeper's consensus protocol. Similar to Raft.	ZooKeeper internals
Split brain	Two nodes both believe they are primary. Causes unrecoverable data divergence.	Why fencing tokens are necessary
Fencing token	Monotonically increasing token from lock service. Storage rejects writes with old tokens.	Split brain prevention mechanism
Two-phase commit (2PC)	Distributed transaction: prepare phase + commit phase. Blocking if coordinator fails. Avoid at scale.	Why we don't use it (ZooKeeper range approach)
Saga pattern	Distributed transaction via sequence of local transactions with compensating rollbacks.	Alternative to 2PC for microservices
WAL	Write-Ahead Log. All changes logged before applying. Enables replication, recovery, point-in-time restore.	PostgreSQL replication mechanism
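
Two rows in this table are short enough to state as code: the quorum formula and the fencing-token check. A sketch with illustrative names; `FencedStore` is a toy stand-in for a storage layer that remembers the highest token it has accepted.

```python
from typing import Any, Dict


def quorum(n: int) -> int:
    """Majority of n replicas: floor(n/2) + 1."""
    return n // 2 + 1


def read_write_sets_overlap(r: int, w: int, n: int) -> bool:
    """R + W > N guarantees every read touches at least one up-to-date replica."""
    return r + w > n


class FencedStore:
    """Rejects writes that carry a token older than one already seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, Any] = {}

    def write(self, token: int, key: str, value: Any) -> bool:
        if token < self.highest_token:
            return False              # stale leader (zombie primary): reject
        self.highest_token = token
        self.data[key] = value
        return True
```

With N=3, reads at ONE and writes at QUORUM give R + W = 3, which does not exceed N. That is exactly why ONE reads are eventually consistent rather than strong.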

Networking & Infrastructure

Term	One-Line Definition	Interview context
Anycast	Same IP announced from multiple locations. BGP routes to nearest. Used by CDNs.	Faster failover than GeoDNS (no TTL wait)
GeoDNS	DNS returns different IPs based on requester's geographic location.	Region routing for URL shortener
GSLB	Global Server Load Balancer. Routes globally based on health, latency, geography.	Enterprise alternative to Route53
BGP	Border Gateway Protocol. Internet routing protocol. BGP withdrawal = region removed from routing.	Anycast failover mechanism
PoP	Point of Presence. CDN edge node in a city. Cloudflare: 300+ PoPs globally.	CDN geography discussion
Circuit breaker	Stops calling failing service. CLOSED→OPEN→HALF-OPEN states. Prevents cascade failure.	Pool empty, Redis down, service failure
Bulkhead pattern	Isolate failures via separate thread pools per downstream service. One slow service doesn't starve others.	Microservice resilience
Sidecar proxy	Service mesh component (Envoy). Handles retries, circuit breaking, mTLS without app code changes.	Istio/Linkerd architecture
mTLS	Mutual TLS. Both client and server authenticate with certificates. Service-to-service security.	Internal service security
Backpressure	Slow consumer signals producer to slow down. Prevents memory overflow and cascade failure.	Kafka consumer lag management

Reliability & Performance

Term	One-Line Definition	Interview context
p99 latency	99th percentile: 99% of requests complete faster than this value. More meaningful than average.	SLO definition
p99.9 latency	99.9th percentile: the tail. Often 10-100× worse than p99. Where real user pain lives.	Staff-level latency discussion
Zipf distribution	Power-law: small number of items get vast majority of traffic. Top 20% URLs = 80% traffic.	Justifies LFU caching over LRU
SLI	Service Level Indicator. What you measure: success rate, p99 latency.	Foundation of SLO
SLO	Service Level Objective. Internal target: p99 ≤ 50ms, availability ≥ 99.9%.	Engineering goal, not contractual
SLA	Service Level Agreement. Contractual promise. Usually SLO minus buffer. Breach = credits.	Customer-facing guarantee
Error budget	Allowed failure quota from SLO: 99.9% = 43.2 min/month downtime allowed.	SRE decision framework
RPO	Recovery Point Objective. Max acceptable data loss. Our target: < 1s (Kafka bounded).	DR planning
RTO	Recovery Time Objective. Max acceptable downtime. Our target: < 60s (GeoDNS failover).	DR planning
MTTR	Mean Time To Recovery. Average time to restore service after failure.	DR metrics
Canary deployment	Route small % of traffic to new version. 1%→5%→25%→100%. Automated SLO-gated promotion.	Safe recovery and deployment
Blue-green deployment	Two identical environments. Instant traffic switch. Instant rollback.	Zero-downtime deployment
Feature flag	Toggle functionality without deployment. Enables gradual rollout, A/B testing, kill switches.	Progressive feature rollout
Chaos engineering	Intentionally inject failures in production to find weaknesses before real incidents do.	How to prove the system actually works
Birthday paradox	When picking at random from M possible values, the first collision is expected after roughly √M picks, far sooner than M.	Why MD5 truncation fails for ID generation
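
Two numeric rows above are worth being able to derive on the spot. A sketch, with two assumptions called out: the error budget uses a 30-day month, and the collision estimate uses the standard approximation √(πM/2) for the expected number of random picks before the first collision.

```python
import math
from fractions import Fraction


def error_budget_minutes(slo_percent: str, days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO such as '99.9'."""
    allowed_fraction = (100 - Fraction(slo_percent)) / 100
    return float(allowed_fraction * days * 24 * 60)


def expected_picks_to_first_collision(space_size: int) -> float:
    """Birthday-bound approximation: sqrt(pi * M / 2)."""
    return math.sqrt(math.pi * space_size / 2)
```

`error_budget_minutes("99.9")` gives the 43.2 minutes quoted in the table. For a space of 62**6 codes (six base-62 characters, an assumption for illustration), the first random collision is expected after roughly 300,000 picks, not billions.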

Kafka Specific

Term	One-Line Definition	Interview context
acks=all	Producer waits for all in-sync replicas to acknowledge. Max durability.	DR — why data survives leader failure
min.insync.replicas	Minimum ISR count for produce to succeed. Set to 2 with RF=3 for safety floor.	Pair with acks=all
enable.idempotence	Producer assigns sequence numbers. Broker deduplicates retries. Exactly-once at producer level.	At-least-once vs exactly-once
Consumer group	Multiple consumers sharing partitions. Each partition consumed by exactly one member.	Parallel consumption, independent read progress
Consumer offset	Position of last read message. Committed to __consumer_offsets topic. Enables crash recovery.	Kafka replay for DR
Log compaction	Kafka retains only latest value per key. Enables event sourcing and efficient state rebuild.	URL state as a compacted log
DLQ	Dead Letter Queue. Failed messages after max retries sent here for inspection. Never block main consumer.	Consumer error handling
MirrorMaker 2	Kafka's cross-cluster replication. Mirrors topics between DCs. Enables DR and analytics sync.	Cross-region event replication
At-least-once delivery	Message delivered one or more times. Consumer must be idempotent.	Default Kafka guarantee
Exactly-once semantics	Each message takes effect exactly once. Requires an idempotent producer plus transactions on the consume-process-produce path.	Critical for financial workloads, costly for performance
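
At-least-once delivery only works if the consumer tolerates duplicates. A minimal sketch of an idempotent consumer, assuming each event carries a unique `event_id` (an assumption; the in-memory set would be a durable store in practice):

```python
from typing import Dict, Set


class IdempotentConsumer:
    """Counts clicks once per event, even if the event is redelivered."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.click_counts: Dict[str, int] = {}

    def handle(self, event_id: str, short_code: str) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.click_counts[short_code] = self.click_counts.get(short_code, 0) + 1
        return True
```

Redelivering the same event after a consumer crash leaves the count unchanged, which gives the effect of exactly-once without the cost of transactions.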

Security

Term	One-Line Definition	Interview context
JWT	JSON Web Token. Stateless bearer token with signed claims. Verified by signature, no DB lookup.	Stateless authentication at scale
RS256	RSA signature with SHA-256. Asymmetric — only IdP can sign, anyone can verify with public key.	Multi-service JWT verification
HS256	HMAC signature with SHA-256. Symmetric shared secret — any holder can forge tokens. Avoid for public APIs.	Why RS256 is preferred
OAuth2	Authorization framework. Delegates access. Flows: Auth Code (web), Client Credentials (service-to-service).	API authentication for URL shortener
SSRF	Server-Side Request Forgery. Attacker tricks server into making requests to internal services.	URL validation requirement
Token bucket	Rate limiting: bucket fills at constant rate. Request consumes a token. Allows bursts.	API rate limiting implementation
Sliding window log	Exact rate limiting: store request timestamps, count within window. Memory-intensive but precise.	Compare rate limit algorithms
CORS	Cross-Origin Resource Sharing. Browser policy for cross-domain requests. Controlled via headers.	Web API security
301 vs 302	301=permanent (browser caches forever). 302=temporary (browser always calls server). Use 302 for analytics.	Redirect type choice and reasoning
410 Gone	HTTP status for permanently removed resource. Browser and search engines remove from index. Use for expired URLs.	URL expiry handling, more informative than 404
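
The token bucket row describes a small algorithm: refill at a constant rate, spend one token per request, and cap the balance at the bucket size so bursts are bounded. A single-process sketch with time passed in explicitly. The class is illustrative; the Redis-backed version in the design would keep the same two fields per user.

```python
class TokenBucket:
    """Token bucket rate limiter with an injectable clock."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)     # start full so an initial burst is allowed
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket of capacity 3 refilling at 1 token/s lets three requests through instantly, rejects the fourth, and admits one more a second later.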

1-Page Cheat Sheet

Cover this the night before. If you can answer every item below without looking, you are ready.

📋 Interview opening script: "Before I design anything, let me establish scale. 100M DAU, 10M writes/day = 115 writes/sec, 100:1 read ratio = 11,500 reads/sec, 5-year storage ~20TB. The system is heavily read-biased and intercontinental, and I'll choose availability over consistency because a stale redirect is acceptable but a 503 is not."

Metric	Value	Derivation
Writes/sec	115 w/s	10M ÷ 86,400
Reads/sec	11,500 r/s	115 × 100
Read:Write	100:1	clicks >> creates
Total storage	~20 TB	RF × raw + index
Pool load factor	<70%	refill at 20%
Cache hit rate	99.5%	L1 + L2 + L3
DB reads	57 r/s	11,500 × 0.005
Failover time	~90s	30s + 60s TTL
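
Each number above should be re-derivable in seconds. A sketch of the derivations using integer arithmetic, so the results match the cheat sheet exactly. The function names are illustrative, and expressing the hit rate in per-mille (995 for 99.5%) is a choice made here to avoid float rounding.

```python
SECONDS_PER_DAY = 86_400


def writes_per_sec(writes_per_day: int) -> int:
    """Average write rate, rounded down."""
    return writes_per_day // SECONDS_PER_DAY


def reads_per_sec(writes_ps: int, read_write_ratio: int) -> int:
    """Read rate implied by the read:write ratio."""
    return writes_ps * read_write_ratio


def db_reads_per_sec(reads_ps: int, hit_rate_per_mille: int) -> float:
    """Reads that miss every cache tier and reach the database."""
    return reads_ps * (1000 - hit_rate_per_mille) / 1000


def failover_seconds(first_term_s: int, dns_ttl_s: int) -> int:
    """Failover time as the sum of the two terms in the cheat sheet."""
    return first_term_s + dns_ttl_s
```

`db_reads_per_sec(11_500, 995)` returns 57.5, which the cheat sheet rounds down to 57 r/s.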

Components — One-Line Each

Component	Technology	Why this choice
ID Generation	Pool (SKIP LOCKED) + Snowflake fallback	O(1) write path, no collisions, circuit breaker protected
L1 Cache	Caffeine (W-TinyLFU)	In-process, zero network, best hit rate for Zipf distribution
L2 Cache	Redis Cluster (allkeys-lfu)	Shared across instances, LFU for viral URLs, SETNX mutex
L3 Cache	CDN (CloudFront/Fastly)	Eliminates intercontinental latency for hot URLs
Write DB	PostgreSQL (primary)	ACID, UNIQUE constraint, WAL for replication
Read DB	Cassandra (ONE)	Linear scale, multi-region active-active, key-value optimised
Analytics	ClickHouse / BigQuery	Columnar, append-only, async from Kafka
Event log	Kafka (acks=all, RF=3)	DR replay + analytics fan-out + decoupling
Cross-region Kafka	MirrorMaker 2	DR: replay events after datacenter recovery
Leader election	etcd (Raft)	Split brain prevention, fencing token support
Traffic routing	GeoDNS (Route53) / Anycast	Route to nearest healthy region, 30-90s failover
Auth	JWT (RS256) + OAuth2	Stateless validation, no per-request Auth service call
Rate limiting	Token bucket (Redis)	Allows bursts, per-user, tier-aware
Redirect type	302 Found	Enables analytics, allows URL updates/expiry

The 8 Terms That Impress Interviewers Most

Say these naturally — don't force them in

W-TinyLFU — "Caffeine uses W-TinyLFU which maintains near-optimal hit rate via a Count-Min Sketch frequency estimator — far better than Guava's LRU for skewed access patterns."
SKIP LOCKED — "The pool pop uses FOR UPDATE SKIP LOCKED — multiple workers can pop concurrently without blocking each other, which is impossible with plain FOR UPDATE."
PACELC — "CAP only covers partition scenarios. PACELC is more useful: during partition we choose Availability; in normal operation we choose Latency over Consistency — ONE reads in Cassandra."
Fencing token — "To prevent split brain, every write includes the etcd lease version. Storage rejects writes with a lower version, killing zombie primaries."
LeveledCompactionStrategy — "For Cassandra reads I'd use LCS over the default STCS — it maintains sorted SSTables in levels, reducing read amplification for our 100:1 read-heavy workload."
Predictive depletion monitoring — "Rather than alerting when the pool hits zero — which is already broken — I'd alert when time-to-empty drops below 10 minutes based on the current consumption rate."
acks=all + min.insync.replicas — "Kafka producers use acks=all with min.insync.replicas=2. This bounds our RPO to the Kafka replication lag — typically under 1 second — enabling deterministic disaster recovery via replay."
410 Gone vs 404 — "Expired URLs return 410 Gone, not 404. 410 is semantically permanent — it tells browsers and search engines to remove the URL from their index. 404 implies the resource might come back."

✅ You are ready when: You can narrate the entire system design — from requirements to components to failure modes to observability — in 45 minutes without notes, and correctly answer any follow-up on any component without hesitation. Use this cookbook to practice out loud, not just to read.
