Rate Limiter System Design
Comprehensive analysis of rate limiting algorithms for 100M DAU, or roughly 1.15M RPS assuming ~1,000 requests per user per day (100M × 1,000 ÷ 86,400 s)
System Requirements & Estimates
Functional Requirements
- Reject requests exceeding rate limits with HTTP 429 status
- Support configurable rate limits per client/API
- Smooth traffic spikes to protect backend services
- Provide standard rate limit headers in responses (see the sketch below)
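A minimal sketch of building those headers, following the widely used X-RateLimit-* convention (header names vary by API; the function and its parameters here are illustrative). On a 429 response, a standard Retry-After header additionally tells clients how long to back off.

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build conventional X-RateLimit-* response headers."""
    return {
        "X-RateLimit-Limit": str(limit),          # max requests per window
        "X-RateLimit-Remaining": str(remaining),  # requests left in the window
        "X-RateLimit-Reset": str(reset_epoch),    # unix time the window resets
    }
```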
Non-Functional Requirements
- High availability (99.99% uptime)
- Low latency (<10ms for rate limit checks)
- Horizontal scalability
- Fault tolerance with graceful degradation (see the fail-open sketch below)
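One common way to satisfy the last two requirements together is to fail open: if the rate-limit store is unreachable, admit the request rather than take the API down. A minimal sketch, assuming a limiter object that exposes allow_request as in the sample implementation later in this document:

```python
import logging

import redis


def check_with_fail_open(limiter, client_id: str) -> bool:
    """Fail open: if Redis is unavailable, allow the request through.

    This trades strict enforcement for availability; fail closed instead
    if over-admission is costlier than rejecting legitimate traffic.
    """
    try:
        allowed, _remaining = limiter.allow_request(client_id)
        return allowed
    except redis.RedisError:
        logging.warning("rate limiter unavailable; failing open")
        return True
```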
Algorithm Comparison Matrix
| Algorithm | Burst Handling | Storage Usage | Implementation Complexity | Per-Check Cost | Best Use Case |
|---|---|---|---|---|---|
| Token Bucket | ✅ Allows up to bucket size | Low (~4-5GB) | Medium | O(1) | API rate limiting with burst allowance |
| Leaky Bucket | ❌ No bursts allowed | Low (~4-5GB) | Low | O(1) | Steady flow rate enforcement |
| Fixed Window | ⚠️ Edge case issues | Very Low (~2GB) | Very Low | O(1) | Simple rate limiting needs |
| Sliding Window Counter | ✅ Smooth distribution | Medium (~6GB) | Medium | O(1) | Balanced accuracy and efficiency |
| Sliding Window Log | ✅ Most accurate | High (~800GB) | High | O(log N) | High-precision rate limiting |
Algorithm Walkthroughs
Token Bucket Algorithm
Capacity: 10 tokens | Refill Rate: 3 tokens per interval
Each request consumes 1 token. Requests are rejected when bucket is empty.
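For intuition, a minimal single-process sketch of this behavior (the one-second refill interval is illustrative; the distributed Redis version appears later in this document):

```python
import time


class TokenBucket:
    """Minimal in-memory token bucket (single process only)."""

    def __init__(self, capacity: float = 10, refill_rate: float = 3):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False                        # bucket empty -> reject (429)
```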
Leaky Bucket Algorithm
Queued requests leak (drain) at a constant rate; arrivals that would overflow the bucket are rejected with 429.
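A minimal sketch of the meter variant of this idea (a queue-based variant would additionally smooth the timing of admitted requests; parameters are illustrative):

```python
import time


class LeakyBucket:
    """Minimal leaky bucket: the bucket drains at a constant rate and
    arrivals that would overflow it are rejected."""

    def __init__(self, capacity: float = 10, leak_rate: float = 3):
        self.capacity = capacity
        self.leak_rate = leak_rate          # requests drained per second
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket in proportion to elapsed time.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1                 # admit: request occupies one slot
            return True
        return False                        # overflow -> reject with 429
```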
Sliding Window Counter
The window slides continuously, counting requests within the time frame; the count is estimated by weighting the previous fixed window by its remaining overlap with the sliding window and adding the current window's count.
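A minimal sketch of that counter approximation (limit and window length are illustrative):

```python
import time


class SlidingWindowCounter:
    """Approximates a true sliding window by interpolating between the
    previous and current fixed windows."""

    def __init__(self, limit: int = 100, window: float = 60.0):
        self.limit = limit
        self.window = window                # window length in seconds
        self.curr_start = time.monotonic()
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # Roll fixed windows forward until `now` falls in the current one.
        while now - self.curr_start >= self.window:
            self.prev_count = self.curr_count
            self.curr_count = 0
            self.curr_start += self.window
        # Weight the previous window by how much of it still overlaps.
        overlap = 1.0 - (now - self.curr_start) / self.window
        estimated = self.prev_count * overlap + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```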
Data Store Architecture - Redis
Why Redis?
- In-memory performance with sub-millisecond latency
- Atomic operations for concurrent access
- Built-in TTL for automatic cleanup
- Primary-replica replication, with Redis Sentinel for automatic failover
- Lua scripting for complex atomic operations
Sample Implementation - Token Bucket
```python
import time
from typing import Tuple

import redis


class TokenBucketRateLimiter:
    def __init__(self, redis_client: redis.Redis,
                 bucket_capacity: int = 100, refill_rate: int = 10):
        self.redis = redis_client
        self.bucket_capacity = bucket_capacity  # maximum tokens per bucket
        self.refill_rate = refill_rate          # tokens added per second

    def allow_request(self, client_id: str) -> Tuple[bool, int]:
        """Check if a request is allowed using the token bucket algorithm.

        Returns (allowed, tokens_remaining).
        """
        key = f"rate_limit:{client_id}"
        # Lua script so the read-refill-consume cycle is atomic in Redis
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1]) or capacity
        local last_refill = tonumber(bucket[2]) or now

        -- Refill based on time elapsed, capped at bucket capacity
        -- (math.max guards against a caller clock that moved backwards)
        local elapsed = math.max(0, now - last_refill)
        local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

        if new_tokens >= 1 then
            -- Consume a token
            redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return {1, math.floor(new_tokens - 1)}
        else
            return {0, math.floor(new_tokens)}
        end
        """
        result = self.redis.eval(
            lua_script,
            1,                      # number of KEYS
            key,
            self.bucket_capacity,
            self.refill_rate,
            time.time(),
        )
        return bool(result[0]), int(result[1])
```
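A usage sketch (connection parameters are illustrative):

```python
r = redis.Redis(host="localhost", port=6379)
limiter = TokenBucketRateLimiter(r, bucket_capacity=100, refill_rate=10)

allowed, remaining = limiter.allow_request("client-42")
if not allowed:
    pass  # respond 429 with Retry-After and X-RateLimit-* headers
```

One caveat: `now` comes from the calling app server's clock, so skew across servers distorts refills; reading the clock inside the script via redis.call('TIME'), which recent Redis versions permit, is one mitigation (see the clock-skew point in the takeaways below).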
Partitioning Strategy
- Hash-based sharding: consistent hashing for even key distribution across Redis shards (see the sketch after this list)
- Key pattern: rate_limit:{client_id}:{timestamp_bucket} for window-based algorithms; the token bucket above keys on client_id alone
- Replication factor: 3 replicas for high availability
- Failover: Redis Sentinel for automatic master election
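A minimal sketch of consistent-hash shard selection (shard names and the vnode count are illustrative):

```python
import bisect
import hashlib
from typing import List


class ConsistentHashRing:
    """Minimal consistent-hash ring for picking a Redis shard per key.
    Virtual nodes (vnodes) smooth the distribution across shards."""

    def __init__(self, shards: List[str], vnodes: int = 128):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # First vnode clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self.ring[idx][1]


ring = ConsistentHashRing(["redis-a", "redis-b", "redis-c"])
ring.shard_for("rate_limit:client-42")  # -> e.g. "redis-b"
```

Because only keys adjacent to a changed vnode move, adding or removing a shard remaps roughly 1/N of the keyspace rather than all of it.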
Key Takeaways & Recommendations
💪 Strengths Demonstrated
- Deep understanding of algorithm trade-offs
- Comprehensive comparison across multiple dimensions
- Strong analytical skills in evaluating storage and latency
- Practical implementation considerations
📈 Areas for Improvement
- Consider distributed clock skew handling
- Explore cost trade-offs for different storage solutions
- Address edge cases in distributed systems
- Discuss graceful degradation strategies
🎯 Final Recommendation
Token Bucket emerges as the optimal choice for most production scenarios, offering:
- Best balance between burst handling and rate enforcement
- Reasonable storage requirements (~5GB for 100M users)
- O(1) latency with atomic Redis operations
- Production-proven at scale (used by AWS, Google Cloud)
For specific use cases requiring strict flow control, consider Leaky Bucket. For simpler needs with lower accuracy requirements, Fixed Window suffices.