Rate Limiter System Design
Comprehensive analysis of rate limiting algorithms for 100M DAU, or roughly 1.15M RPS assuming ~1,000 requests per user per day (100M × 1,000 ÷ 86,400 s)
System Requirements & Estimates
Functional Requirements
- Reject requests exceeding rate limits with HTTP 429 status
- Support configurable rate limits per client/API
- Smooth traffic spikes to protect backend services
- Provide standard rate limit headers in responses (see the sketch below)
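A minimal sketch of building those headers, following the widely used X-RateLimit-* convention (header names vary by API; the function and its parameters here are illustrative). On a 429 response, a standard Retry-After header additionally tells clients how long to back off.

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build conventional X-RateLimit-* response headers."""
    return {
        "X-RateLimit-Limit": str(limit),          # max requests per window
        "X-RateLimit-Remaining": str(remaining),  # requests left in the window
        "X-RateLimit-Reset": str(reset_epoch),    # unix time the window resets
    }
```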
Non-Functional Requirements
- High availability (99.99% uptime)
- Low latency (<10ms for rate limit checks)
- Horizontal scalability
- Fault tolerance with graceful degradation (see the fail-open sketch below)
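One common way to satisfy the last two requirements together is to fail open: if the rate-limit store is unreachable, admit the request rather than take the API down. A minimal sketch, assuming a limiter object that exposes allow_request as in the sample implementation later in this document:

```python
import logging

import redis


def check_with_fail_open(limiter, client_id: str) -> bool:
    """Fail open: if Redis is unavailable, allow the request through.

    This trades strict enforcement for availability; fail closed instead
    if over-admission is costlier than rejecting legitimate traffic.
    """
    try:
        allowed, _remaining = limiter.allow_request(client_id)
        return allowed
    except redis.RedisError:
        logging.warning("rate limiter unavailable; failing open")
        return True
```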
Algorithm Comparison Matrix
| Algorithm | Burst Handling | Storage Usage | Implementation Complexity | Per-Check Cost | Best Use Case |
|---|---|---|---|---|---|
| Token Bucket | ✅ Allows up to bucket size | Low (~4-5GB) | Medium | O(1) | API rate limiting with burst allowance |
| Leaky Bucket | ❌ No bursts allowed | Low (~4-5GB) | Low | O(1) | Steady flow rate enforcement |
| Fixed Window | ⚠️ Edge case issues | Very Low (~2GB) | Very Low | O(1) | Simple rate limiting needs |
| Sliding Window Counter | ✅ Smooth distribution | Medium (~6GB) | Medium | O(1) | Balanced accuracy and efficiency |
| Sliding Window Log | ✅ Most accurate | High (~800GB) | High | O(log N) | High-precision rate limiting |
Algorithm Walkthroughs
Token Bucket Algorithm
Capacity: 10 tokens | Refill Rate: 3 tokens per interval
Each request consumes 1 token. Requests are rejected when bucket is empty.
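For intuition, a minimal single-process sketch of this behavior (the one-second refill interval is illustrative; the distributed Redis version appears later in this document):

```python
import time


class TokenBucket:
    """Minimal in-memory token bucket (single process only)."""

    def __init__(self, capacity: float = 10, refill_rate: float = 3):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False                        # bucket empty -> reject (429)
```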
Leaky Bucket Algorithm
Queued requests leak (drain) at a constant rate; arrivals that would overflow the bucket are rejected with 429.
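A minimal sketch of the meter variant of this idea (a queue-based variant would additionally smooth the timing of admitted requests; parameters are illustrative):

```python
import time


class LeakyBucket:
    """Minimal leaky bucket: the bucket drains at a constant rate and
    arrivals that would overflow it are rejected."""

    def __init__(self, capacity: float = 10, leak_rate: float = 3):
        self.capacity = capacity
        self.leak_rate = leak_rate          # requests drained per second
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket in proportion to elapsed time.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1                 # admit: request occupies one slot
            return True
        return False                        # overflow -> reject with 429
```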
Sliding Window Counter
The window slides continuously, counting requests within the time frame; the count is estimated by weighting the previous fixed window by its remaining overlap with the sliding window and adding the current window's count.
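A minimal sketch of that counter approximation (limit and window length are illustrative):

```python
import time


class SlidingWindowCounter:
    """Approximates a true sliding window by interpolating between the
    previous and current fixed windows."""

    def __init__(self, limit: int = 100, window: float = 60.0):
        self.limit = limit
        self.window = window                # window length in seconds
        self.curr_start = time.monotonic()
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # Roll fixed windows forward until `now` falls in the current one.
        while now - self.curr_start >= self.window:
            self.prev_count = self.curr_count
            self.curr_count = 0
            self.curr_start += self.window
        # Weight the previous window by how much of it still overlaps.
        overlap = 1.0 - (now - self.curr_start) / self.window
        estimated = self.prev_count * overlap + self.curr_count
        if estimated < self.limit:
            self.curr_count += 1
            return True
        return False
```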
Data Store Architecture - Redis
Why Redis?
- In-memory performance with sub-millisecond latency
- Atomic operations for concurrent access
- Built-in TTL for automatic cleanup
- Primary-replica replication, with Redis Sentinel for automatic failover
- Lua scripting for complex atomic operations
Sample Implementation - Token Bucket
```python
import time
from typing import Tuple

import redis


class TokenBucketRateLimiter:
    def __init__(self, redis_client: redis.Redis,
                 bucket_capacity: int = 100, refill_rate: int = 10):
        self.redis = redis_client
        self.bucket_capacity = bucket_capacity  # maximum tokens per bucket
        self.refill_rate = refill_rate          # tokens added per second

    def allow_request(self, client_id: str) -> Tuple[bool, int]:
        """Check if a request is allowed using the token bucket algorithm.

        Returns (allowed, tokens_remaining).
        """
        key = f"rate_limit:{client_id}"
        # Lua script so the read-refill-consume cycle is atomic in Redis
        lua_script = """
        local key = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1]) or capacity
        local last_refill = tonumber(bucket[2]) or now

        -- Refill based on time elapsed, capped at bucket capacity
        -- (math.max guards against a caller clock that moved backwards)
        local elapsed = math.max(0, now - last_refill)
        local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

        if new_tokens >= 1 then
            -- Consume a token
            redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return {1, math.floor(new_tokens - 1)}
        else
            return {0, math.floor(new_tokens)}
        end
        """
        result = self.redis.eval(
            lua_script,
            1,                      # number of KEYS
            key,
            self.bucket_capacity,
            self.refill_rate,
            time.time(),
        )
        return bool(result[0]), int(result[1])
```
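A usage sketch (connection parameters are illustrative):

```python
r = redis.Redis(host="localhost", port=6379)
limiter = TokenBucketRateLimiter(r, bucket_capacity=100, refill_rate=10)

allowed, remaining = limiter.allow_request("client-42")
if not allowed:
    pass  # respond 429 with Retry-After and X-RateLimit-* headers
```

One caveat: `now` comes from the calling app server's clock, so skew across servers distorts refills; reading the clock inside the script via redis.call('TIME'), which recent Redis versions permit, is one mitigation (see the clock-skew point in the takeaways below).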
Partitioning Strategy
- Hash-based sharding: consistent hashing for even key distribution across Redis shards (see the sketch after this list)
- Key pattern: rate_limit:{client_id}:{timestamp_bucket} for window-based algorithms; the token bucket above keys on client_id alone
- Replication factor: 3 replicas for high availability
- Failover: Redis Sentinel for automatic master election
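A minimal sketch of consistent-hash shard selection (shard names and the vnode count are illustrative):

```python
import bisect
import hashlib
from typing import List


class ConsistentHashRing:
    """Minimal consistent-hash ring for picking a Redis shard per key.
    Virtual nodes (vnodes) smooth the distribution across shards."""

    def __init__(self, shards: List[str], vnodes: int = 128):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # First vnode clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self.ring[idx][1]


ring = ConsistentHashRing(["redis-a", "redis-b", "redis-c"])
ring.shard_for("rate_limit:client-42")  # -> e.g. "redis-b"
```

Because only keys adjacent to a changed vnode move, adding or removing a shard remaps roughly 1/N of the keyspace rather than all of it.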
Key Takeaways & Recommendations
💪 Strengths Demonstrated
- Deep understanding of algorithm trade-offs
- Comprehensive comparison across multiple dimensions
- Strong analytical skills in evaluating storage and latency
- Practical implementation considerations
📈 Areas for Improvement
- Consider distributed clock skew handling
- Explore cost trade-offs for different storage solutions
- Address edge cases in distributed systems
- Discuss graceful degradation strategies
🎯 Final Recommendation
Token Bucket emerges as the optimal choice for most production scenarios, offering:
- Best balance between burst handling and rate enforcement
- Reasonable storage requirements (~5GB for 100M users)
- O(1) latency with atomic Redis operations
- Production-proven at scale (used by AWS, Google Cloud)
For specific use cases requiring strict flow control, consider Leaky Bucket. For simpler needs with lower accuracy requirements, Fixed Window suffices.