Overview#
- An HTTP rate limiter can:
  - block requests that exceed defined thresholds
  - prevent resource starvation caused by Denial of Service (DoS) attacks
  - reduce costs by limiting excess requests (critically important for services that use paid third-party services)
  - prevent server overload from bots or malicious users
Scope#
Questions:
- q: What type of rate limiter?
- q: What type of rules are we writing (number of API requests, IP, user ID, other properties)?
- q: Do we need to consider distributed environments?
- q: Do we need to inform the user when they are throttled?
Requirements#
- Accurately rate limit excessive requests
- The rate limiter should not noticeably slow down HTTP requests
- Distributed rate limiting: shared across multiple servers or processes
- Show clear exceptions to users when their requests are throttled
- High fault tolerance
High-level Design#
Options#
- Client-side rate limiter
- Server-side rate limiter
- Middleware rate limiter (HTTP: layer 7)
- API Gateway rate limiter (An API gateway is a middleware that supports rate limiting)
Guidelines#
- Evaluate your tech stack: (language, caching, etc). Are you able to efficiently implement a rate limiter with your tech stack?
- If you implement your own, you have more control over which algorithm you want to use. You may have less choice with a third party rate limiter
- If you have a microservice architecture that already uses an API Gateway, you may add a rate limiter to it
- Building your own rate limiter can be time consuming. If you don’t have a compelling reason to build your own, a commercially available one may be a better option
Rate Limiting Algorithms#
Token bucket#
- Used by Amazon & Stripe
- Specified capacity and refill rate
- Each request expends one token
- Requests that arrive when the bucket is out of tokens overflow and are dropped
- Each API endpoint usually has its own bucket. If we want to rate limit by IP address, we would use a different bucket for each IP
- If the system has a global request limit, it makes sense to use one bucket for the whole system
| Pros | Cons |
| --- | --- |
| ✅ Easy algorithm | ❌ Size & refill rate can be hard to tune |
| ✅ Memory efficient | |
| ✅ Good for short bursts | |
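A minimal in-memory sketch of a token bucket (class and parameter names are illustrative, not from the source):

```python
import time

class TokenBucket:
    """Bucket with a fixed capacity, refilled continuously at refill_rate tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)        # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time; tokens past capacity overflow and are lost
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                 # each request expends one token
            return True
        return False                         # out of budget: drop the request
```

A per-endpoint or per-IP limiter would keep one such bucket per key, as described above.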
Leaking bucket#
- Used by Shopify
- Similar to token bucket, but requests processed at fixed rate
- Usually a FIFO queue
- Requests added as long as there is room in the queue
- Specified queue size and outflow rate
| Pros | Cons |
| --- | --- |
| ✅ Memory efficient due to the limited queue | ❌ Bursts of traffic can fill up the queue with old requests, blocking newer requests |
| ✅ Requests processed at a fixed rate | ❌ Size & outflow rate may be difficult to tune |
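A minimal sketch of a leaking bucket as a bounded FIFO queue (names are illustrative; the worker that actually handles dequeued requests is omitted):

```python
import collections
import time

class LeakingBucket:
    """Requests queue up to queue_size and leak out at a fixed outflow_rate (req/sec)."""

    def __init__(self, queue_size: int, outflow_rate: float):
        self.queue_size = queue_size
        self.outflow_rate = outflow_rate
        self.queue = collections.deque()     # FIFO queue of pending requests
        self.last_leak = time.monotonic()

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.queue_size:
            self.queue.append(request)       # queued; processed later at the fixed rate
            return True
        return False                         # queue full: drop the request

    def _leak(self):
        # Dequeue requests at the fixed outflow rate
        now = time.monotonic()
        to_process = int((now - self.last_leak) * self.outflow_rate)
        for _ in range(min(to_process, len(self.queue))):
            self.queue.popleft()             # hand off to the worker here
        if to_process:
            self.last_leak += to_process / self.outflow_rate
```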
Fixed window counter#
- Timeline divided into fixed-sized windows
- Each request increments the counter in the current window by one
- Requests are dropped once the limit for the current time window is reached
| Pros | Cons |
| --- | --- |
| ✅ Memory efficient | ❌ Traffic spikes near the window edges can allow up to twice the specified limit (if a request spike straddles two windows) |
| ✅ Easy to understand | |
| ✅ Resetting the quota each time period is good for some use cases | |
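A minimal sketch of a fixed window counter (names illustrative):

```python
import time

class FixedWindowCounter:
    """At most `limit` requests per fixed window of window_secs seconds."""

    def __init__(self, limit: int, window_secs: int):
        self.limit = limit
        self.window_secs = window_secs
        self.window = -1          # index of the current fixed window
        self.count = 0

    def allow(self) -> bool:
        window = int(time.time() // self.window_secs)
        if window != self.window:
            self.window = window  # entered a new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1       # each request increments the window's counter
            return True
        return False              # limit reached: drop until the window resets
```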
Sliding window log#
- Solves flaw in Fixed window counter that can allow too many requests near window edges
- Employ a sliding window that keeps track of request timestamps. (Timestamp data can be stored as a sorted set in Redis)
- Sliding window duration and request limit specified
- With each new request, remove outdated timestamps that fall before the sliding window
- Add the new timestamp with each request
- Allow the request if the log size (length) is less than or equal to the request limit, otherwise drop it
| Pros | Cons |
| --- | --- |
| ✅ Very accurate; the rate limit is strictly enforced | ❌ Uses a lot of memory (timestamps stored in memory) |
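A minimal in-memory sketch of a sliding window log (names illustrative; a production version would use a Redis sorted set as noted above):

```python
import collections
import time

class SlidingWindowLog:
    """Keep request timestamps; allow while the log holds no more than `limit`
    entries inside the last window_secs seconds."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self.log = collections.deque()   # timestamps, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Remove outdated timestamps that fall before the sliding window
        while self.log and self.log[0] <= now - self.window_secs:
            self.log.popleft()
        self.log.append(now)             # every request is logged, allowed or not
        return len(self.log) <= self.limit
```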
Sliding window counter#
- Hybrid of Fixed window counter and Sliding window log
- Window size and capacity specified
- Allow a request if requests in current window + requests in previous window * overlap percentage of the previous window is less than the request limit. Assuming requests are evenly distributed, this works very well
- Example: say the request limit is 6 per minute, there were 6 requests in the 10:00:00 window and none in the current window. At 10:01:20, the previous window overlaps the rolling window by 40/60, so the rolling count is 0 + 6 * (40/60) = 4, leaving room for 2 more requests
| Pros | Cons |
| --- | --- |
| ✅ Smooths out spikes because it’s based on a rolling average of the previous window | ❌ Only works where approximate rate limiting is sufficient (experiments by Cloudflare show it works well in practice) |
| ✅ Memory efficient | |
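A minimal sketch of a sliding window counter (names illustrative). With limit=6 and window_secs=60, a request arriving 20 seconds into an empty window that follows a full one sees a rolling count of 0 + 6 * (40/60) = 4, matching the example above:

```python
import time

class SlidingWindowCounter:
    """Weight the previous window's count by how much it overlaps the rolling window."""

    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window_secs = window_secs
        self.current_window = -1
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window = int(now // self.window_secs)
        if window != self.current_window:
            # Slide: current becomes previous (0 if whole windows were skipped)
            self.previous_count = (self.current_count
                                   if window == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the rolling window
        overlap = 1.0 - (now % self.window_secs) / self.window_secs
        rolling = self.current_count + self.previous_count * overlap
        if rolling < self.limit:
            self.current_count += 1
            return True
        return False
```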
High-level architecture#
Design deep dive#
Rate limiting rules#
Rules are usually written in config files and stored on disk. For example:
```yaml
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 10
...
domain: auth
descriptors:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5
```
Exceeding rate limit#
- If a user exceeds the rate limit, they should get an HTTP 429 (Too Many Requests) response. Rate-limited requests may be queued up for processing in the future if the use case calls for it
- Rate limiter headers:
  - X-Ratelimit-Remaining
  - X-Ratelimit-Limit
  - X-Ratelimit-Retry-After
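As a sketch, a throttled response could be assembled like this (the header names come from the list above; the helper and its wiring into a web framework are hypothetical):

```python
def throttled_response(limit: int, remaining: int, retry_after_secs: int):
    """Illustrative helper: build an HTTP 429 status and rate limiter headers."""
    headers = {
        "X-Ratelimit-Limit": str(limit),                   # allowed requests per window
        "X-Ratelimit-Remaining": str(remaining),           # requests left in this window
        "X-Ratelimit-Retry-After": str(retry_after_secs),  # seconds until the quota resets
    }
    return 429, headers  # 429: Too Many Requests
```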
Detailed design#
```mermaid
flowchart TD
    CLIENT[Client]
    RATE_LIMITER[Rate Limiter Middleware]
    DROPPED(Dropped Requests)
    MESSAGE_QUEUE[Message Queue]
    DROPPED--"HTTP 429"-->CLIENT
    MESSAGE_QUEUE--"HTTP 429"-->CLIENT
    subgraph Rate Limited API
        CACHED_RULES[Cached Rules]
        WORKERS[Workers]
        API_SERVERS[API Servers]
        CACHE[Redis Cache]
        RULES[Rate Limiting Rules Configs]
        RULES-->WORKERS
        WORKERS-->CACHED_RULES
        RATE_LIMITER<-->API_SERVERS
        RATE_LIMITER<-->CACHE
        RATE_LIMITER-."Option 1".->DROPPED
        RATE_LIMITER-."Option 2".->MESSAGE_QUEUE
        CACHED_RULES-->RATE_LIMITER
    end
```
Distributed rate limiting#
- Rate limiting on a single server is straightforward. Distributed rate limiting is plagued with the following challenges:
  - race conditions
  - synchronization issues
Mitigating race conditions#
- We could use sorted sets in Redis, updated via MULTI/EXEC or Lua scripts, to mitigate race conditions
- Each user has a sorted set that stores the timestamps of their requests
- MULTI/EXEC:
  - Use ZREMRANGEBYSCORE key min max to remove outdated scores
  - Get the remaining timestamps with ZRANGE key start stop
  - Add the current timestamp with ZADD
  - Set a TTL to save memory
  - Use MULTI to approximate atomicity
  - In this approach, blocked actions still count as actions: users need to stop sending requests in order for their request budgets to reset
- Lua scripts:
  - We could use Lua scripts if we want to perform all of the rate limiting logic inside of Redis, ensuring the code runs as anticipated in a multi-threaded environment. The logic around MULTI/EXEC commands runs in the client, outside of Redis, and may not function exactly as we would hope
Mitigating synchronization issues#
Monitoring#
- It is important to monitor the effectiveness of your rate limiting algorithm and rulesets
Source#
System Design Interview by Alex Xu