APIs API · 07 · 09

Rate limiting: code and snippet reading

Read real limiter snippets — a token-bucket refill, a Redis INCR race, a sliding-window counter, a sorted-set log — and pick the behaviour or the highest-leverage fix.

API Senior ◷ 14 min

Level

FoundationsJuniorMiddleSenior

A limiter is correct or it is theatre, and the difference lives in a few lines of refill math and Redis commands. Read each snippet, predict its behaviour under concurrency, and pick the fix a senior would make first.

Goal

Practise the loop you run when reviewing a limiter: trace the refill, spot the race, check the boundary math, and reach for the atomic, shared-counter fix.

Snippet 1 — the token-bucket refill

# Per-request, in the app process. tokens/ts are read from Redis with GET, written back with SET.
def allow(key, rate, capacity, cost=1):
    tokens, ts = read(key)              # last stored tokens + timestamp
    now = time.time()
    tokens = min(capacity, tokens + (now - ts) * rate)   # refill
    if tokens >= cost:
        tokens -= cost
        write(key, tokens, now)         # GET ... then SET (two round trips)
        return True
    write(key, tokens, now)
    return False

Quiz

The refill math is correct in isolation. What breaks when two requests for the same key arrive concurrently across two app nodes?

Snippet 2 — the Redis fixed-window counter

def allow(key, limit, window_s):
    n = redis.incr(key)        # atomic increment
    if n == 1:
        redis.expire(key, window_s)   # set TTL only on first hit
    return n <= limit

Quiz

INCR is atomic, so what is the latent failure in this fixed-window limiter?

Snippet 3 — the sliding-window counter

# limit = 100 / 60s. Two fixed-window counters: this minute and the previous one.
def estimate(prev_count, curr_count, elapsed_in_window_s, window_s=60):
    overlap = (window_s - elapsed_in_window_s) / window_s   # fraction of prev window still in view
    return curr_count + prev_count * overlap
# Example: 18s into the current minute, prev=80, curr=12

Quiz

For the example values (18s in, prev=80, curr=12, limit=100), what does estimate return and is the request admitted?

Snippet 4 — the sliding-window log in Redis

def allow(key, limit, window_s):
    now = time.time()
    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_s)   # drop entries older than the window
    pipe.zadd(key, {str(uuid4()): now})             # record this request
    pipe.zcard(key)                                  # count what remains
    pipe.expire(key, window_s)
    _, _, count, _ = pipe.execute()
    return count <= limit

Quiz

This log-based limiter is exact and atomic (MULTI/EXEC pipeline). What is the real cost you sign up for, and the subtle behaviour of this ordering?

Recap

Limiter bugs live in a few lines: app-side read-modify-write double-spends across nodes, so the check must run atomically inside Redis; INCR-then-EXPIRE can strand a TTL-less key forever; the sliding-window counter weights the previous window by its remaining overlap, not in full; and the sliding-window log is exact and atomic but pays one sorted-set entry per request. Trace the concurrency, check the boundary math, and push the whole decision into one atomic, shared step.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.