Overview#
- An HTTP rate limiter can:
  - block requests that exceed defined thresholds
  - prevent resource starvation caused by Denial of Service (DoS) attacks
  - reduce costs by limiting excess requests (critically important for services that use paid third-party services)
  - prevent server overload from bots or malicious users
Scope#
Questions:
- q: What type of rate limiter?
- q: What type of rules are we writing (number of API requests, IP, user ID, other properties)?
- q: Do we need to consider distributed environments?
- q: Do we need to inform the user when they are throttled?
Requirements#
- Accurately rate limit excessive requests
- The rate limiter should not noticeably slow down HTTP requests
- Distributed rate limiting: shared across multiple servers or processes
- Show clear exceptions to users when their requests are throttled
- High fault tolerance
High-level Design#
Options#
- Client-side rate limiter
- Server-side rate limiter
- Middleware rate limiter (HTTP: layer 7)
- API Gateway rate limiter (An API gateway is a middleware that supports rate limiting)
Guidelines#
- Evaluate your tech stack: (language, caching, etc). Are you able to efficiently implement a rate limiter with your tech stack?
- If you implement your own, you have more control over which algorithm you want to use. You may have less choice with a third party rate limiter
- If you have a microservice architecture that already uses an API Gateway, you may add a rate limiter to it
- Building your own rate limiter can be time consuming. If you don’t have a compelling reason to build your own, a commercially available one may be a better option
Rate Limiting Algorithms#
Token bucket#
- Used by Amazon & Stripe
- Specified capacity and refill rate
- Each request expends one token
- Requests that arrive when the bucket is out of tokens overflow and are dropped
- Each API endpoint usually has its own bucket. If we want to rate limit by IP address, we would use a different bucket for each IP
- If the system has a global request limit, it makes sense to use one bucket for the whole system
| Pros | Cons |
| --- | --- |
| ✅ Easy algorithm | ❌ Size & refill rate can be hard to tune |
| ✅ Memory efficient | |
| ✅ Good for short bursts | |
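A minimal in-memory sketch of a token bucket (class and parameter names are illustrative, not from the source):

```python
import time

class TokenBucket:
    """Bucket with a fixed capacity, refilled continuously at refill_rate tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)        # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time; tokens past capacity overflow and are lost
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                 # each request expends one token
            return True
        return False                         # out of budget: drop the request
```

A per-endpoint or per-IP limiter would keep one such bucket per key, as described above.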
Leaking bucket#
- Used by Shopify
- Similar to token bucket, but requests processed at fixed rate
- Usually a FIFO queue
- Requests added as long as there is room in the queue
- Specified queue size and outflow rate
| Pros | Cons |
| --- | --- |
| ✅ Memory efficient due to the limited queue | ❌ Bursts of traffic can fill up the queue with old requests, blocking newer requests |
| ✅ Requests processed at a fixed rate | ❌ Size & outflow rate may be difficult to tune |
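A minimal sketch of a leaking bucket as a bounded FIFO queue (names are illustrative; the worker that actually handles dequeued requests is omitted):

```python
import collections
import time

class LeakingBucket:
    """Requests queue up to queue_size and leak out at a fixed outflow_rate (req/sec)."""

    def __init__(self, queue_size: int, outflow_rate: float):
        self.queue_size = queue_size
        self.outflow_rate = outflow_rate
        self.queue = collections.deque()     # FIFO queue of pending requests
        self.last_leak = time.monotonic()

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.queue_size:
            self.queue.append(request)       # queued; processed later at the fixed rate
            return True
        return False                         # queue full: drop the request

    def _leak(self):
        # Dequeue requests at the fixed outflow rate
        now = time.monotonic()
        to_process = int((now - self.last_leak) * self.outflow_rate)
        for _ in range(min(to_process, len(self.queue))):
            self.queue.popleft()             # hand off to the worker here
        if to_process:
            self.last_leak += to_process / self.outflow_rate
```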
Fixed window counter#
- Timeline divided into fixed-sized windows
- Each request increments the counter in the current window by one
- Requests are dropped once the limit for the current time window is reached
| Pros | Cons |
| --- | --- |
| ✅ Memory efficient | ❌ Traffic spikes near the window edges can allow up to twice the specified limit (if a request spike straddles two windows) |
| ✅ Easy to understand | |
| ✅ Resetting the quota each time period is good for some use cases | |
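A minimal sketch of a fixed window counter (names illustrative):

```python
import time

class FixedWindowCounter:
    """At most `limit` requests per fixed window of window_secs seconds."""

    def __init__(self, limit: int, window_secs: int):
        self.limit = limit
        self.window_secs = window_secs
        self.window = -1          # index of the current fixed window
        self.count = 0

    def allow(self) -> bool:
        window = int(time.time() // self.window_secs)
        if window != self.window:
            self.window = window  # entered a new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1       # each request increments the window's counter
            return True
        return False              # limit reached: drop until the window resets
```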
Sliding window log#
- Solves flaw in Fixed window counter that can allow too many requests near window edges
- Employ a sliding window that keeps track of request timestamps. (Timestamp data can be stored as a sorted set in Redis)
- Sliding window duration and request limit specified
- With each new request, remove outdated timestamps that fall before the sliding window
- Add the new timestamp with each request
- Allow the request if the log size (length) is less than or equal to the request limit, otherwise drop it
| Pros | Cons |
| --- | --- |
| ✅ Very accurate; the rate limit is strictly enforced | ❌ Uses a lot of memory (timestamps stored in memory) |
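A minimal in-memory sketch of a sliding window log (names illustrative; a production version would use a Redis sorted set as noted above):

```python
import collections
import time

class SlidingWindowLog:
    """Keep request timestamps; allow while the log holds no more than `limit`
    entries inside the last window_secs seconds."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self.log = collections.deque()   # timestamps, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Remove outdated timestamps that fall before the sliding window
        while self.log and self.log[0] <= now - self.window_secs:
            self.log.popleft()
        self.log.append(now)             # every request is logged, allowed or not
        return len(self.log) <= self.limit
```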
Sliding window counter#
- Hybrid of Fixed window counter and Sliding window log
- Window size and capacity specified
- Allow a request if requests in current window + requests in previous window * overlap percentage of the previous window is less than the request limit. Assuming requests are evenly distributed, this works very well
- Example: say the request limit is 6 per minute, there were 6 requests in the 10:00:00 window and none in the current window. At 10:01:20, the previous window overlaps the rolling window by 40/60, so the rolling count is 0 + 6 * (40/60) = 4, leaving room for 2 more requests
| Pros | Cons |
| --- | --- |
| ✅ Smooths out spikes because it’s based on a rolling average of the previous window | ❌ Only works where approximate rate limiting is sufficient (experiments by Cloudflare show it works well in practice) |
| ✅ Memory efficient | |
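A minimal sketch of a sliding window counter (names illustrative). With limit=6 and window_secs=60, a request arriving 20 seconds into an empty window that follows a full one sees a rolling count of 0 + 6 * (40/60) = 4, matching the example above:

```python
import time

class SlidingWindowCounter:
    """Weight the previous window's count by how much it overlaps the rolling window."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self.current_window = -1
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window = int(now // self.window_secs)
        if window != self.current_window:
            # Slide: current becomes previous (0 if whole windows were skipped)
            self.previous_count = (self.current_count
                                   if window == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the rolling window
        overlap = 1.0 - (now % self.window_secs) / self.window_secs
        rolling = self.current_count + self.previous_count * overlap
        if rolling < self.limit:
            self.current_count += 1
            return True
        return False
```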
High-level architecture#
Design deep dive#
Rate limiting rules#
Rules are usually written in config files and stored on disk. For example:
```yaml
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 10
...
domain: auth
descriptors:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5
```
Exceeding rate limit#
- If a user exceeds the rate limit, they should get an HTTP 429 (Too Many Requests) response. Rate-limited requests may be queued up for processing in the future if the use case calls for it
- Rate limiter headers:
  - X-Ratelimit-Remaining
  - X-Ratelimit-Limit
  - X-Ratelimit-Retry-After
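As a sketch, a throttled response could be assembled like this (the header names come from the list above; the helper and its wiring into a web framework are hypothetical):

```python
def throttled_response(limit: int, remaining: int, retry_after_secs: int):
    """Illustrative helper: build an HTTP 429 status and rate limiter headers."""
    headers = {
        "X-Ratelimit-Limit": str(limit),                   # allowed requests per window
        "X-Ratelimit-Remaining": str(remaining),           # requests left in this window
        "X-Ratelimit-Retry-After": str(retry_after_secs),  # seconds until the quota resets
    }
    return 429, headers  # 429: Too Many Requests
```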
Detailed design#
```mermaid
flowchart TD
    CLIENT[Client]
    RATE_LIMITER[Rate Limiter Middleware]
    DROPPED(Dropped Requests)
    MESSAGE_QUEUE[Message Queue]
    DROPPED--"HTTP 429"-->CLIENT
    MESSAGE_QUEUE--"HTTP 429"-->CLIENT
    subgraph Rate Limited API
        CACHED_RULES[Cached Rules]
        WORKERS[Workers]
        API_SERVERS[API Servers]
        CACHE[Redis Cache]
        RULES[Rate Limiting Rules Configs]
        RULES-->WORKERS
        WORKERS-->CACHED_RULES
        RATE_LIMITER<-->API_SERVERS
        RATE_LIMITER<-->CACHE
        RATE_LIMITER-."Option 1".->DROPPED
        RATE_LIMITER-."Option 2".->MESSAGE_QUEUE
        CACHED_RULES-->RATE_LIMITER
    end
```
Distributed rate limiting#
- Rate limiting on a single server is straightforward. Distributed rate limiting is plagued with the following challenges:
  - race conditions
  - synchronization issues
Mitigating race conditions#
- We could use sorted sets in Redis, updated via MULTI/EXEC or Lua scripts, to mitigate race conditions
- Each user has a sorted set that stores the timestamps of their requests
- MULTI/EXEC:
  - Use ZREMRANGEBYSCORE key min max to remove outdated scores
  - Get the remaining timestamps with ZRANGE key start stop
  - Add the current timestamp with ZADD
  - Set a TTL to save memory
  - Use MULTI to approximate atomicity
  - In this approach, blocked actions still count as actions: users need to stop sending requests in order for their request budgets to reset
- Lua scripts:
  - We could use Lua scripts if we want to perform all of the rate limiting logic inside of Redis, ensuring the code runs as anticipated in a multi-threaded environment. The logic around MULTI/EXEC commands runs in the client, outside of Redis, and may not function exactly as we would hope
Mitigating synchronization issues#
Monitoring#
- It is important to monitor the effectiveness of your rate limiting algorithm and rulesets
Source#
System Design Interview by Alex Xu