Every backend service eventually meets a client that doesn't know when to stop: a buggy retry loop, a scraper, or a well-meaning cron job firing a thousand requests in the same second. Rate limiting is how you protect the service from all of them.

This post covers three common algorithms (token bucket, leaky bucket, and sliding window) and how to use them in Go without writing them from scratch. Go has solid, well-tested libraries for the first two; sliding window gets a shorter treatment at the end.

The Three Algorithms in One Minute

  • Token bucket: A bucket holds tokens that are added at a steady rate, up to a fixed capacity. Each request takes one token; if none is left, the request is rejected (or waits). Short bursts up to the capacity are allowed. Use it when you want to enforce an average rate while tolerating spikes.
  • Leaky bucket: Requests enter a bucket that drains at a steady rate. If the bucket is full, new requests are dropped. Output is smooth, with no bursts.
  • Sliding window: Counts requests inside a moving time window. It is the most accurate of the three, but the exact version has to remember every request timestamp in the window, so it costs more memory and CPU.

1. Token Bucket with golang.org/x/time/rate

The golang.org/x/time/rate package is the de facto standard for token bucket rate limiting in Go. It is part of the official golang.org/x subrepositories, which the Go team maintains.

go get golang.org/x/time/rate

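The core type is rate.Limiter. Create one with rate.NewLimiter(r, b), where r is the refill rate (a rate.Limit, in tokens per second) and b is the burst size (the bucket capacity). The bucket starts full, so the first b requests go through immediately.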

package main

import (
    "context"
    "fmt"
    "time"
    "golang.org/x/time/rate"
)

func main() {
    limiter := rate.NewLimiter(5, 10) // 5 tokens/sec, burst 10: first 10 pass at once, then ~200ms apart
    ctx := context.Background()
    for i := 0; i < 20; i++ {
        if err := limiter.Wait(ctx); err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Printf("request %d at %s\n", i, time.Now().Format("15:04:05.000"))
    }
}

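rate.Limiter offers three ways to consume a token:
  • Allow() is non-blocking and reports whether a token was available.
  • Wait(ctx) blocks until a token is available or ctx is done.
  • Reserve() returns a *rate.Reservation whose Delay() tells you how long to wait before acting.

Each has an N variant (AllowN, WaitN, ReserveN) that takes a token count. AllowN and ReserveN also take an explicit time.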

Drop-style HTTP Middleware with Allow()

package main

import (
    "encoding/json"
    "net/http"

    "golang.org/x/time/rate"
)

// rateLimitMiddleware shares one token bucket across all requests
// (a global limit, not per client) and answers 429 when it is empty.
func rateLimitMiddleware(next http.Handler) http.Handler {
    limiter := rate.NewLimiter(10, 20) // 10 req/sec, burst 20
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            json.NewEncoder(w).Encode(map[string]string{
                "error": "rate limit exceeded, please retry later",
            })
            return
        }
        next.ServeHTTP(w, r)
    })
}

Per-client Limiting

A single global limiter is rarely what you want, because one noisy client can use up the budget for everyone. Each client (IP address, API key) should get its own bucket. The usual pattern is a mutex-guarded map of limiters keyed by client identifier.

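package main

import (
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// clientLimiter pairs a client's bucket with the last time it was used,
// so idle entries can be evicted.
type clientLimiter struct {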
    limiter  *rate.Limiter
    lastSeen time.Time
}

type IPRateLimiter struct {
    clients map[string]*clientLimiter
    mu      sync.Mutex
    rate    rate.Limit
    burst   int
}

func NewIPRateLimiter(r rate.Limit, b int) *IPRateLimiter {
    rl := &IPRateLimiter{
        clients: make(map[string]*clientLimiter),
        rate:    r,
        burst:   b,
    }
    go rl.cleanup()
    return rl
}

func (rl *IPRateLimiter) getLimiter(ip string) *rate.Limiter {
    rl.mu.Lock()
    defer rl.mu.Unlock()
    c, ok := rl.clients[ip]
    if !ok {
        lim := rate.NewLimiter(rl.rate, rl.burst)
        rl.clients[ip] = &clientLimiter{limiter: lim, lastSeen: time.Now()}
        return lim
    }
    c.lastSeen = time.Now()
    return c.limiter
}

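// cleanup runs for the life of the process, evicting clients that have
// been idle for more than 3 minutes so the map doesn't grow without bound.
func (rl *IPRateLimiter) cleanup() {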
    for {
        time.Sleep(time.Minute)
        rl.mu.Lock()
        for ip, c := range rl.clients {
            if time.Since(c.lastSeen) > 3*time.Minute {
                delete(rl.clients, ip)
            }
        }
        rl.mu.Unlock()
    }
}

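Key details:
  • The janitor goroutine (cleanup) evicts idle clients, so memory stays bounded.
  • Use net.SplitHostPort to strip the port from r.RemoteAddr. Otherwise each source port gets its own bucket.
  • In production behind a load balancer, take the client IP from X-Forwarded-For. Only trust that header when your own proxy sets it, because clients can send any value they like.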

Wait-style: Throttling Outbound Calls

When you're the client hitting an upstream API with a 100 req/sec ceiling, use Wait.

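package main

import (
    "context"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// fetchAll starts one fetch per URL, at most 100 starts per second.
// fetch(url string) is assumed to be defined elsewhere.
func fetchAll(ctx context.Context, urls []string) {
    limiter := rate.NewLimiter(rate.Every(10*time.Millisecond), 1) // 1 token every 10ms = 100/sec
    var wg sync.WaitGroup
    for _, u := range urls {
        // Block until a token is free; give up if ctx ends first.
        if err := limiter.Wait(ctx); err != nil {
            break
        }
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            fetch(u)
        }(u)
    }
    wg.Wait() // wait for fetches that were already started
}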

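A burst of 1 means the first call returns immediately and the rest are spaced about 10ms apart. rate.Every(d) converts an interval into a rate.Limit, so rate.Every(10*time.Millisecond) is 100 tokens per second.

Call Wait in the loop, before spawning the goroutine. Calling it inside each goroutine would still space out the fetches, but it would spawn one goroutine per URL up front, all blocked in Wait. Note that this limits how often fetches start, not how many run at once.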

2. Leaky Bucket with go.uber.org/ratelimit

Uber's go.uber.org/ratelimit is a minimal leaky-bucket implementation. Its API is essentially one method, Take(), which blocks until the next request is allowed and returns a time.Time.

go get go.uber.org/ratelimit

package main

import (
    "fmt"
    "time"
    "go.uber.org/ratelimit"
)

func main() {
    rl := ratelimit.New(100) // 100 ops/sec, evenly spaced
    prev := time.Now()
    for i := 0; i < 10; i++ {
        now := rl.Take()
        fmt.Println(i, now.Sub(prev))
        prev = now
    }
}

Every iteration after the first prints ~10ms. No bursts.

Slack: Controlled Burstiness

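Slack lets the limiter bank unused request slots while traffic runs below the configured rate, then spend them later as a short burst. The bank is capped (10 slots by default), so sustained throughput still never exceeds the configured rate. In that sense it acts like a small token bucket layered on top of the leaky bucket.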

rlDefault := ratelimit.New(100)                        // default: slack of 10
rlStrict := ratelimit.New(100, ratelimit.WithoutSlack) // strict spacing, no slack
rlCustom := ratelimit.New(100, ratelimit.WithSlack(50)) // custom slack of 50

Note that WithoutSlack is a package-level variable of type ratelimit.Option, not a function, so it takes no parentheses.

Per-minute and Other Windows

Use the Per option to express the rate over a period other than one second:

rl := ratelimit.New(5, ratelimit.Per(time.Minute)) // 5 per minute: one request every 12s

When to Pick Uber's Library

Pick it only when you specifically need leaky-bucket semantics. x/time/rate covers almost everything else and offers more: a context-aware Wait, non-blocking Allow and Reserve, and runtime changes via SetLimit and SetBurst. But if you need evenly spaced output, ratelimit.New(n) is a one-liner. Caveat: the library is stable but sees little active development.

3. Sliding Window: Exact "N per Minute"

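Token bucket and leaky bucket enforce an average rate. They don't guarantee "no more than N requests in any window of length T." For example, a token bucket with burst b and rate r can admit b requests at once and then r*T more over the next T, so a single 60-second window can see up to b + r*60 requests.

Sliding window gives the exact guarantee at the cost of more state. A production implementation is left for a future post. For precise limits, you'd typically use one of two approaches:
  • a sliding window log, for example a Redis sorted set of request timestamps
  • a cheaper, approximate sliding window counter built from fixed time buckets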

Summary

Algorithm        Allows Bursts?   Output Shape           Typical Use
Token bucket     Yes              Bursty within limits   API endpoints, user quotas
Leaky bucket     No               Perfectly smooth       Outbound calls to strict upstream
Sliding window   Configurable     Accurate over time     "N per minute" billing-style limits

Use golang.org/x/time/rate for most cases, and reach for go.uber.org/ratelimit when you need strict spacing. For sliding window limits, especially ones shared across several service instances, consider a Redis-based implementation.

Next steps: Implement per-client rate limiting in your HTTP service using the IPRateLimiter pattern above. If you need distributed rate limiting, that's a follow-up topic.