Every backend service eventually meets a client that doesn't know when to stop. A buggy retry loop, a scraper, or a well-meaning cron job firing a thousand requests in the same second is reason enough to add rate limiting.
This post covers three algorithms (token bucket, leaky bucket, and sliding window) and shows how to use the first two in Go without writing them from scratch, since Go has solid, well-tested libraries for both. Sliding window is discussed at the end, without a library walkthrough.
The Three Algorithms in One Minute
- Token bucket: A bucket holds tokens, added at a steady rate up to a fixed capacity. Each request takes a token. No token, no request. Allows short bursts. Useful when you want to enforce an average rate while tolerating short spikes.
- Leaky bucket: Requests go into a bucket that drains at a steady rate. If the bucket is full, new requests are dropped (or, in blocking implementations, made to wait). Enforces smooth output with no bursts.
- Sliding window: Counts requests inside a moving time window. The most accurate of the three for "N per window" limits, but also the most expensive to compute and store.
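To make the token bucket description concrete, here is a minimal sketch of the refill-then-spend logic using only the standard library. The names `tokenBucket`, `allowAt`, and `simulate` are made up for this illustration, the code is not safe for concurrent use, and it is not how any particular library implements the algorithm.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is an illustrative, single-goroutine token bucket.
type tokenBucket struct {
	rate     float64   // tokens added per second
	capacity float64   // maximum number of stored tokens
	tokens   float64   // tokens currently available
	last     time.Time // time of the previous refill
}

// allowAt refills the bucket for the time elapsed since the last call,
// caps it at capacity, then spends one token if one is available.
func (b *tokenBucket) allowAt(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate replays requests arriving at the given millisecond offsets
// against a bucket that starts full, and returns each decision.
func simulate(rate, capacity float64, offsetsMs []int) []bool {
	start := time.Unix(0, 0)
	b := &tokenBucket{rate: rate, capacity: capacity, tokens: capacity, last: start}
	out := make([]bool, 0, len(offsetsMs))
	for _, ms := range offsetsMs {
		out = append(out, b.allowAt(start.Add(time.Duration(ms)*time.Millisecond)))
	}
	return out
}

func main() {
	// 1 token/sec, capacity 2: a burst of two passes, the third is refused,
	// and one more token is available a second later.
	fmt.Println(simulate(1, 2, []int{0, 0, 0, 1000, 1000}))
}
```

The burst allowance comes entirely from starting with (and refilling up to) `capacity` tokens; the long-run average is set by `rate`.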
1. Token Bucket with golang.org/x/time/rate
The golang.org/x/time/rate package is the de facto standard for token bucket rate limiting in Go. It lives in an official Go subrepository (golang.org/x) and is maintained by the Go project.
go get golang.org/x/time/rate
The core type is rate.Limiter. Create one with rate.NewLimiter(r, b), where r is the refill rate in tokens per second and b is the burst size (the bucket capacity).
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	limiter := rate.NewLimiter(5, 10) // 5 tokens/sec, burst 10
	ctx := context.Background()

	for i := 0; i < 20; i++ {
		// Wait blocks until a token is available or ctx is done.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("request %d at %s\n", i, time.Now().Format("15:04:05.000"))
	}
}
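As a rough sanity check on the loop above: the bucket starts full, so the first b requests pass immediately and every further request has to wait for a refill. An idealized estimate of the delay the limiter adds is therefore (n - b) / r seconds. The helper `minSeconds` below is a hypothetical name for this arithmetic, it assumes r > 0, and it ignores the time the requests themselves take.

```go
package main

import "fmt"

// minSeconds estimates how long a full token bucket with refill rate r
// (tokens/sec) and burst b delays n back-to-back requests.
// It assumes r > 0 and that the requests themselves take no time.
func minSeconds(n int, r float64, b int) float64 {
	if n <= b {
		return 0 // everything fits in the initial burst
	}
	return float64(n-b) / r
}

func main() {
	// The example above: 20 requests, 5 tokens/sec, burst 10.
	fmt.Printf("%.1f s\n", minSeconds(20, 5, 10)) // prints "2.0 s"
}
```

So the program above should print its first ten lines almost at once and the remaining ten about 200 ms apart.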
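rate.Limiter has three main methods: Allow() is non-blocking and reports whether a token was available; Wait(ctx) blocks until a token is available or the context is done; Reserve() returns a Reservation that tells you how long to wait before acting.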
Drop-style HTTP Middleware with Allow()
// Requires imports: "encoding/json", "net/http", "golang.org/x/time/rate".
func rateLimitMiddleware(next http.Handler) http.Handler {
	// One limiter shared by every request that passes through this middleware.
	limiter := rate.NewLimiter(10, 20) // 10 req/sec, burst 20

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			// Out of tokens: reject immediately instead of queueing.
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusTooManyRequests)
			json.NewEncoder(w).Encode(map[string]string{
				"error": "rate limit exceeded, please retry later",
			})
			return
		}
		next.ServeHTTP(w, r)
	})
}
Per-client Limiting
A single global limiter is rarely what you want. Each client (IP, API key) should have its own bucket. The usual pattern is a map of limiters keyed by client identifier.
// Requires imports: "sync", "time", "golang.org/x/time/rate".

// clientLimiter pairs a limiter with the last time its client was seen.
type clientLimiter struct {
	limiter  *rate.Limiter
	lastSeen time.Time
}

// IPRateLimiter keeps one token bucket per client IP.
type IPRateLimiter struct {
	clients map[string]*clientLimiter
	mu      sync.Mutex
	rate    rate.Limit
	burst   int
}

func NewIPRateLimiter(r rate.Limit, b int) *IPRateLimiter {
	rl := &IPRateLimiter{
		clients: make(map[string]*clientLimiter),
		rate:    r,
		burst:   b,
	}
	go rl.cleanup() // janitor goroutine; runs for the life of the process
	return rl
}

// getLimiter returns the limiter for ip, creating it on first use.
func (rl *IPRateLimiter) getLimiter(ip string) *rate.Limiter {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	c, ok := rl.clients[ip]
	if !ok {
		lim := rate.NewLimiter(rl.rate, rl.burst)
		rl.clients[ip] = &clientLimiter{limiter: lim, lastSeen: time.Now()}
		return lim
	}
	c.lastSeen = time.Now()
	return c.limiter
}

// cleanup evicts clients that have been idle for more than three minutes.
func (rl *IPRateLimiter) cleanup() {
	for {
		time.Sleep(time.Minute)

		rl.mu.Lock()
		for ip, c := range rl.clients {
			if time.Since(c.lastSeen) > 3*time.Minute {
				delete(rl.clients, ip)
			}
		}
		rl.mu.Unlock()
	}
}
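Key details: the janitor goroutine evicts idle clients so the map doesn't grow without bound; use net.SplitHostPort to strip the port from r.RemoteAddr; and in production behind a load balancer, extract the client IP from X-Forwarded-For. Trust that header only when it is set by a proxy you control, since clients can forge it.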
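The key-extraction step could look roughly like the sketch below. `clientKey` and its `trustProxy` flag are hypothetical names for this post, and taking the left-most X-Forwarded-For entry assumes your load balancer overwrites or sanitizes that header; other proxy setups may need a different entry.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clientKey derives a limiter key from the connection's remote address and,
// when trustProxy is set, from the X-Forwarded-For header value.
func clientKey(remoteAddr, forwardedFor string, trustProxy bool) string {
	if trustProxy && forwardedFor != "" {
		// The header is a comma-separated list; this sketch assumes the
		// left-most entry is the original client.
		first, _, _ := strings.Cut(forwardedFor, ",")
		if ip := strings.TrimSpace(first); ip != "" {
			return ip
		}
	}
	// r.RemoteAddr has the form "host:port"; keep only the host.
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the value as-is
	}
	return host
}

func main() {
	fmt.Println(clientKey("203.0.113.7:54321", "", false))
	fmt.Println(clientKey("10.0.0.1:80", "198.51.100.4, 10.0.0.2", true))
}
```

In a handler, the result would feed the map lookup, for example `rl.getLimiter(clientKey(r.RemoteAddr, r.Header.Get("X-Forwarded-For"), true)).Allow()`.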
Wait-style: Throttling Outbound Calls
When you're the client hitting an upstream API with a 100 req/sec ceiling, use Wait.
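// Requires imports: "context", "sync", "time", "golang.org/x/time/rate".
// fetch is assumed to be defined elsewhere.
func fetchAll(ctx context.Context, urls []string) {
	// One event every 10ms is 100 req/sec; burst 1 forbids bunching.
	limiter := rate.NewLimiter(rate.Every(10*time.Millisecond), 1)

	var wg sync.WaitGroup
	for _, u := range urls {
		// Block here, before spawning, until the next slot is free.
		if err := limiter.Wait(ctx); err != nil {
			break // context cancelled or deadline exceeded
		}
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			fetch(u)
		}(u)
	}
	wg.Wait()
}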
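A burst of 1 means the first call returns immediately and the rest are spaced about 10 ms apart. Calling Wait before spawning the goroutine, rather than inside it, means goroutines are created only as tokens become available; calling it inside would still pace the requests, but it would start one goroutine per URL up front. rate.Every(d) converts an interval between events into a rate.Limit.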
2. Leaky Bucket with go.uber.org/ratelimit
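Uber's go.uber.org/ratelimit is a minimal leaky-bucket implementation. Its API is essentially one method, Take(), which blocks until the next request may proceed and returns the time at which it was released.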
go get go.uber.org/ratelimit
package main

import (
	"fmt"
	"time"

	"go.uber.org/ratelimit"
)

func main() {
	rl := ratelimit.New(100) // 100 ops/sec, evenly spaced

	prev := time.Now()
	for i := 0; i < 10; i++ {
		now := rl.Take() // blocks until the next slot
		fmt.Println(i, now.Sub(prev))
		prev = now
	}
}
Every iteration after the first prints roughly 10ms; the first prints a near-zero duration because the first Take() returns immediately. No bursts.
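The even spacing can be modeled in a few lines: each request is released at its arrival time or at the previous release plus one interval, whichever is later. The sketch below is a simplified model with no slack, not the library's actual code, and `spaceOut` is a name invented for this illustration. It holds early requests rather than dropping them, which mirrors the blocking behavior of Take().

```go
package main

import (
	"fmt"
	"time"
)

// spaceOut takes request arrival times (milliseconds from start, ascending)
// and returns the time at which each request would be released if releases
// must be at least 1/perSecond apart. Idle time is not carried over.
func spaceOut(arrivalsMs []int, perSecond int) []int {
	interval := time.Second / time.Duration(perSecond)
	out := make([]int, 0, len(arrivalsMs))

	var next time.Duration // earliest time the next request may be released
	for _, ms := range arrivalsMs {
		at := time.Duration(ms) * time.Millisecond
		if at < next {
			at = next // too early: hold the request until its slot
		}
		out = append(out, int(at/time.Millisecond))
		next = at + interval
	}
	return out
}

func main() {
	// Three requests at t=0, one at 25ms, one at 100ms, limited to 100/sec.
	fmt.Println(spaceOut([]int{0, 0, 0, 25, 100}, 100)) // prints "[0 10 20 30 100]"
}
```

Note that the request arriving at 100 ms gains nothing from the idle gap before it; the next release after it is still a full interval away.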
Slack: Controlled Burstiness
Slack lets a small number of unspent requests from idle periods be used in a later burst. It accumulates only while the limiter is idle and is capped at a fixed count, unlike a token bucket's continuous refill.
// Three alternative constructions; use one of them.
rl := ratelimit.New(100)                          // default: slack of up to 10 requests
rl := ratelimit.New(100, ratelimit.WithoutSlack)  // strict: no slack
rl := ratelimit.New(100, ratelimit.WithSlack(50)) // custom slack of 50 requests
Note: WithoutSlack is a variable, not a function—no parentheses.
Per-minute and Other Windows
Use Per:
rl := ratelimit.New(5, ratelimit.Per(time.Minute)) // 5 per minute
When to Pick Uber's Library
Pick it only when you specifically need leaky-bucket semantics. x/time/rate covers almost everything else and more (context support, Allow/Reserve, dynamic rates). But if all you need is evenly spaced output, ratelimit.New(n) is one line. Caveat: the library is stable but not actively iterated on.
3. Sliding Window: Exact "N per Minute"
Token bucket and leaky bucket enforce an average rate; neither guarantees "no more than N requests in any sliding window." A sliding window does, which makes it the most accurate of the three, but it is also the most expensive because it has to remember recent requests. This post does not walk through a sliding window library; that is left as a future topic. For a precise sliding window, you'd typically use Redis sorted sets or a sliding window counter built from time buckets.
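For a single process, the idea can be sketched as an in-memory sliding window log: keep the timestamps of admitted requests, drop those older than the window, and admit a new request only if fewer than N remain. The names `slidingWindow`, `allowAt`, and `simulateWindow` are made up for this sketch; it is not safe for concurrent use and it is not a substitute for a Redis-backed limiter when several instances share a quota.

```go
package main

import (
	"fmt"
	"time"
)

// slidingWindow admits at most limit requests in any window-long interval.
type slidingWindow struct {
	limit  int
	window time.Duration
	stamps []time.Time // timestamps of admitted requests, oldest first
}

// allowAt discards timestamps at or before now-window, then admits the
// request if fewer than limit admitted requests remain in the window.
func (s *slidingWindow) allowAt(now time.Time) bool {
	cutoff := now.Add(-s.window)
	i := 0
	for i < len(s.stamps) && !s.stamps[i].After(cutoff) {
		i++
	}
	s.stamps = s.stamps[i:]

	if len(s.stamps) >= s.limit {
		return false
	}
	s.stamps = append(s.stamps, now)
	return true
}

// simulateWindow replays requests at the given millisecond offsets and
// returns each decision.
func simulateWindow(limit, windowMs int, offsetsMs []int) []bool {
	start := time.Unix(0, 0)
	s := &slidingWindow{limit: limit, window: time.Duration(windowMs) * time.Millisecond}
	out := make([]bool, 0, len(offsetsMs))
	for _, ms := range offsetsMs {
		out = append(out, s.allowAt(start.Add(time.Duration(ms)*time.Millisecond)))
	}
	return out
}

func main() {
	// At most 2 requests in any 1-second window.
	fmt.Println(simulateWindow(2, 1000, []int{0, 500, 900, 1000, 1500, 1600}))
}
```

The cost mentioned above is visible here: memory grows with the number of admitted requests per window, whereas a token bucket needs only a counter and a timestamp.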
Summary
| Algorithm | Allows Bursts? | Output Shape | Typical Use |
|---|---|---|---|
| Token bucket | Yes | Bursty within limits | API endpoints, user quotas |
| Leaky bucket | No | Perfectly smooth | Outbound calls to strict upstream |
| Sliding window | Configurable | Accurate over time | "N per minute" billing-style limits |
Use golang.org/x/time/rate for most cases; reach for go.uber.org/ratelimit when you need strict spacing. For sliding window, consider Redis-based implementations.
Next steps: Implement per-client rate limiting in your HTTP service using the IPRateLimiter pattern above. If you need distributed rate limiting, that's a follow-up topic.



