• An HTTP rate rate limiter can:
    • block requests that exceed defined thresholds
    • prevent resource starvation caused by Denial of Service DoS attacks
    • reduce costs by limiting excess requests. (Critically important for services that use paid third party services)
    • prevent server overload from bots or malicious users



  • q: What type of rate limiter?
    • a: server side
  • q: What type of rules are we writing: (Number of API requests, IP, user id, other properties)?
    • a: flexible ruleset
  • q: Do we need to consider distributed environments?
    • a: yes
  • q: Do we need to inform the user when they are throttled?
    • a: yes


  • Accurately rate limit excessive requests
  • Rate limiter should not impede HTTP requests
  • Distributed rate limiting–shared across multiple servers of processes
  • Show clear exceptions when user requests fail
  • High fault tolerance

High-level Design


  • Client-side rate limiter
  • Server-side rate limiter
  • Middleware rate limiter (HTTP: layer 7)
  • API Gateway rate limiter (An API gateway is a middleware that supports rate limiting)


  • Evaluate your tech stack: (language, caching, etc). Are you able to efficiently implement a rate limiter with your tech stack?
  • If you implement your own, you have more control over which algorithm you want to use. You may have less choice with a third party rate limiter
  • If you have a microservice architecture that already uses an API Gateway, you may add a rate limiter to it
  • Building your own rate limiter can be time consuming. If you don’t have a compelling reason to build your own, a commercially available one may be a better option

Rate Limiting Algorithms

Token bucket

  • Used by Amazon & Stripe
  • Specified capacity and refill rate
  • Each request expends one token
  • Out of budget requests overflow and get dropped
  • Each API endpoint usually has its own bucket. If we want to rate limit by IP address, we would use a different bucket for each IP
  • If system has a global request limit, it makes sense to use one bucket for the whole system
✅ Easy algorithm❌ Size & refill rate can be hard to tune
✅ Memory Efficient
✅ Good for short bursts

Leaking bucket

  • Used by Shopify
  • Similar to token bucket, but requests processed at fixed rate
  • Usually a FIFO queue
  • Requests added as long as there is room in the queue
  • Specified queue size and outflow rate
✅ Memory Efficient due to limited queue❌ bursts of traffic can fill up the queue with old requests (blocking newer requests)
✅ Requests processed at a fixed rate❌ Size & outflow rate may be difficult to tune

Fixed window counter

  • Timeline divided into fixed-sized windows
  • Each requests increments counter in window by one
  • Requests dropped after limit for current time window reached
✅ Memory efficient❌ Traffic spikes near the window edges can allow up to twice the specified limit (if request spike straddles two windows)
✅ Easy to understand
✅ Resetting quota each time period is good for some use cases

Sliding window log

  • Solves flaw in Fixed window counter that can allow too many requests near window edges
  • Employ a sliding window that keeps track of request time stamps. (Timestamp data can be stored as a sorted set in Redis)
  • Sliding window duration and request limit specified
  • With each new request, remove outdated timestamps that fall before the sliding window
  • Add new timestamp with each request
  • Allow requests if log size (length) is less than to request limit, otherwise drop it
✅ Very accurate. Rate Limit strictly enforced❌ Uses a lot of memory (timestamps stored in memory)

Sliding window counter

  • Hybrid of Fixed window counter and Sliding window log
  • Window size and capacity specified
  • Allow if requests in current window + requests in previous window * overlapping percentage in previous window less than to request limit. Assuming that requests are even distributed, this works very well
    • Example. Let’s say that the request limit is 6 per second, there were 6 requests in the 10:00:00 time window and none in the current window. If the time is currently 10:01:20, there is room for 2 more requests
✅ Smooths out spikes because it’s based on a rolling average of the previous window❌ Only works where approximate rate limiting is sufficient. Experiments by Cloudflare show that this method works well in practice
✅ Memory efficient

High-level architecture

  • Employ a counter to keep track of requests sent from individual users, IP addresses, etc. Allow requests that don’t exceed limit
  • Use an in-memory database / cache like Redis. Disc access databases are too slow
  • We need INCR and EXPIRE functionality
    • INCR: increment counter by one
    • EXPIRE: set timeout to expire counter after a set period of time
    • When the client sends a request to the server, the rate limiter get the appropriate counter value from Redis and determines whether to allow the request or not. It then updates the counter in Redis
    flowchart LR A[Client]-->B[Rate Limiter] B-->C[API Servers] B<-->D[Redis]

Design deep dive

Rate limiting rules

Rules are usually written in cofig files and stored on disc.

domain: messaging
    key: message_type
    value: marketing
        unit: day
        requests_per_day: 10


domain: auth
descriptors: auth_type
value: login
    unit: minute
    requests_per_unit: 5

Exceeding rate limit

  • If a user exceeds the rate limit, they should get an HTTP 429 (too many requests) response. Rate limited requests may be queued up for processing in the future if the use case calls for it.
  • Rate limiter headers
    • X-Ratelimit-Remaining
    • X-Ratelimit-Limit
    • X-Ratelimit-Retry-After

Detailed design


Distributed rate limiting

  • Single server rate is straight forward. Distributed system rate limiting is plauged with the following challenges:
    • race coditions
    • sychronization issues

Mitigating race conditions

  • We could use sorted sets or LUA scripts in Redis to mitigate race conditions. (We could either use MULTI/EXEC or LUA scripts)
  • Each user has a sorted set that stores the timestamps of their requests
    • Use ZREMRANGEBYSCORE key min max to remove outdated scores
    • Get the ZRANGE key start stop
    • Add current timestamp with ZADD
    • Use TTL to save memory
    • Use MUTLI to approximate atomicity
    • In this approached, blocked actions still count as actions. Users need to stop sending requests in order for their request budgets to reset
  • LUA Scripts:
    • We could use LUA scripts if we wanted perform all of the rate limiting logic inside of redis to ensure that the code runs as anticipated in a multi-threaded environment. The MULTI/EXEC commands run outside of Redis and may not function exactly as we would hope

Mitigating synchronization issues

  • To support millions of users, we need to employ multiple rate limiters that are in sync with each other.
  • Sticky sessions are not a good solution because it can’t be scaled
  • Using a centralized Redis store is more scalable and flexible. This type of solution employs the eventual consistency model
    flowchart LR A[Client A]<-->B[Rate Limiter 1] C[Client B]<-->D[Rate Limiter 2] B<-->E[Redis] D<-->E


  • It is important to monitor the effectiveness or your rate limiting algorithm and rulesets


System Design Interview by Alex Xu