Go Rate Limiting: Token Bucket, Leaky Bucket, Sliding Window

Sooner or later, every backend service meets a client that doesn't know when to stop: a buggy retry loop, a scraper, or a well-meaning cron job firing a thousand requests in the same second. Rate limiting is how you protect the service from all of them.

This post covers three algorithms (token bucket, leaky bucket, and sliding window) and how to use them in Go without writing them from scratch. Go has solid, well-tested libraries for this.

The Three Algorithms in One Minute

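Token bucket: A bucket holds tokens, which are added at a steady rate up to a fixed capacity. Each request takes one token; no token, no request. Because a full bucket can be spent at once, short bursts are allowed. Use it when you want to enforce an average rate while tolerating spikes.
Leaky bucket: Requests go into a bucket that drains at a steady rate. When the bucket is full, new requests are dropped. The output is smooth, with no bursts.
Sliding window: Counts requests inside a moving time window. It is the most accurate of the three, but also the most expensive, because it must track when recent requests happened (or per-interval counts) instead of a single token count.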

1. Token Bucket with golang.org/x/time/rate

The golang.org/x/time/rate package is the de facto standard for token bucket rate limiting in Go. It lives in the official golang.org/x subrepositories, which are maintained by the Go team.

go get golang.org/x/time/rate

The core type is rate.Limiter. Create one with rate.NewLimiter(r, b), where r is the refill rate in tokens per second (a rate.Limit) and b is the burst size, i.e. the bucket capacity.

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    limiter := rate.NewLimiter(5, 10) // 5 tokens/sec, burst 10
    ctx := context.Background()
    for i := 0; i < 20; i++ {
        // Wait blocks until a token is available or ctx is done.
        if err := limiter.Wait(ctx); err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Printf("request %d at %s\n", i, time.Now().Format("15:04:05.000"))
    }
}

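A Limiter offers three ways to consume tokens: Allow() is non-blocking and reports whether a token was available (taking it if so); Wait(ctx) blocks until a token is available or the context is done; Reserve() books a token and returns a *rate.Reservation that tells you how long to wait before acting. In the program above, the first 10 requests print immediately because a new limiter starts with a full bucket; the remaining 10 follow roughly 200ms apart (5 per second).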

Drop-style HTTP Middleware with Allow()

import (
    "encoding/json"
    "net/http"

    "golang.org/x/time/rate"
)

// rateLimitMiddleware rejects requests with 429 once the bucket is empty.
// One limiter is shared by every client.
func rateLimitMiddleware(next http.Handler) http.Handler {
    limiter := rate.NewLimiter(10, 20) // 10 req/sec, burst 20
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            // Headers must be set before WriteHeader.
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusTooManyRequests)
            json.NewEncoder(w).Encode(map[string]string{
                "error": "rate limit exceeded, please retry later",
            })
            return
        }
        next.ServeHTTP(w, r)
    })
}

Per-client Limiting

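A single global limiter is rarely what you want: one noisy client can drain the shared bucket and lock everyone else out. Give each client (by IP or API key) its own bucket. The usual pattern is a map of limiters keyed by client identifier, guarded by a mutex.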

import (
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// clientLimiter pairs a client's limiter with the last time it was used.
type clientLimiter struct {
    limiter  *rate.Limiter
    lastSeen time.Time
}

// IPRateLimiter hands out one token bucket per client IP.
type IPRateLimiter struct {
    clients map[string]*clientLimiter
    mu      sync.Mutex
    rate    rate.Limit
    burst   int
}

func NewIPRateLimiter(r rate.Limit, b int) *IPRateLimiter {
    rl := &IPRateLimiter{
        clients: make(map[string]*clientLimiter),
        rate:    r,
        burst:   b,
    }
    go rl.cleanup()
    return rl
}

// getLimiter returns the client's limiter, creating it on first use.
func (rl *IPRateLimiter) getLimiter(ip string) *rate.Limiter {
    rl.mu.Lock()
    defer rl.mu.Unlock()
    c, ok := rl.clients[ip]
    if !ok {
        lim := rate.NewLimiter(rl.rate, rl.burst)
        rl.clients[ip] = &clientLimiter{limiter: lim, lastSeen: time.Now()}
        return lim
    }
    c.lastSeen = time.Now()
    return c.limiter
}

// cleanup runs for the life of the process, evicting clients idle for more
// than 3 minutes so the map does not grow without bound.
func (rl *IPRateLimiter) cleanup() {
    ticker := time.NewTicker(time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        rl.mu.Lock()
        for ip, c := range rl.clients {
            if time.Since(c.lastSeen) > 3*time.Minute {
                delete(rl.clients, ip)
            }
        }
        rl.mu.Unlock()
    }
}

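A few details matter here. The janitor goroutine evicts idle clients, so memory stays bounded. Key the map on the IP alone: r.RemoteAddr is "host:port", so strip the port with net.SplitHostPort, or every new connection from the same client gets a fresh bucket. And behind a load balancer or reverse proxy, RemoteAddr is the proxy's address, so take the client IP from X-Forwarded-For instead, but only when a proxy you control sets that header, because clients can forge it.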

Wait-style: Throttling Outbound Calls

When you're the client hitting an upstream API with a 100 req/sec ceiling, use Wait.

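import (
    "context"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// fetchAll starts at most one request every 10ms (100 req/sec).
// fetch is your own request function; it is not shown here.
func fetchAll(ctx context.Context, urls []string) {
    limiter := rate.NewLimiter(rate.Every(10*time.Millisecond), 1)
    var wg sync.WaitGroup
    for _, u := range urls {
        // Wait in the loop so goroutines are started at the limited rate.
        if err := limiter.Wait(ctx); err != nil {
            break // ctx canceled, or its deadline is too close
        }
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            fetch(u)
        }(u)
    }
    wg.Wait()
}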

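With a burst of 1, the first Wait returns immediately and each later call returns about 10ms after the previous one, i.e. 100 requests per second. rate.Every(d) converts an interval d into a rate.Limit (events per second). Calling Wait in the loop, before spawning each goroutine, throttles how fast goroutines are started. If you called Wait inside the goroutine instead, the requests would still be rate limited (a Limiter is safe for concurrent use), but the loop would spawn a goroutine for every URL up front, all of them parked in Wait.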

2. Leaky Bucket with go.uber.org/ratelimit

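Uber's go.uber.org/ratelimit is a minimal leaky-bucket implementation. Its API is essentially one method, Take(), which blocks until the next request is allowed and returns the time at which it was allowed.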

go get go.uber.org/ratelimit

package main

import (
    "fmt"
    "time"

    "go.uber.org/ratelimit"
)

func main() {
    rl := ratelimit.New(100) // 100 ops/sec, evenly spaced
    prev := time.Now()
    for i := 0; i < 10; i++ {
        now := rl.Take() // blocks until the next slot
        fmt.Println(i, now.Sub(prev))
        prev = now
    }
}

Every iteration after the first prints roughly 10ms: requests are evenly spaced, with no burst.

Slack: Controlled Burstiness

Slack lets the limiter save up a small number of unused slots during idle periods and spend them as a short burst later. It is not a continuously refilling bucket like a token bucket; it caps how far the limiter lets callers catch up after falling behind the steady pace.

// Alternative configurations (pick one):
rl := ratelimit.New(100)                          // default: slack of 10
rl := ratelimit.New(100, ratelimit.WithoutSlack)  // strict spacing, no slack
rl := ratelimit.New(100, ratelimit.WithSlack(50)) // custom slack of 50

Note: WithoutSlack is a variable, not a function, so it takes no parentheses.

Per-minute and Other Windows

Use the Per option to set a different time unit:

rl := ratelimit.New(5, ratelimit.Per(time.Minute)) // 5 per minute

When to Pick Uber's Library

Pick it only when you specifically need leaky-bucket semantics. x/time/rate covers nearly every other case and has more features: context support, Allow and Reserve, and rates you can change at runtime with SetLimit and SetBurst. But if you need evenly spaced output, ratelimit.New(n) does it in one line. One caveat: the library is stable but not actively developed.

3. Sliding Window: Exact "N per Minute"

Token bucket and leaky bucket enforce an average rate; neither guarantees "no more than N requests in any sliding window." A sliding window does, at the cost of more memory and computation. For precise sliding-window limits you would typically use Redis sorted sets (one member per request, scored by timestamp) or a sliding window counter built from time buckets. A full treatment is left for a future post.

Summary

Algorithm	Allows Bursts?	Output Shape	Typical Use
Token bucket	Yes	Bursty within limits	API endpoints, user quotas
Leaky bucket	No	Perfectly smooth	Outbound calls to strict upstream
Sliding window	Configurable	Accurate over time	"N per minute" billing-style limits

Use golang.org/x/time/rate for most cases; reach for go.uber.org/ratelimit when you need strict spacing. For sliding window, consider Redis-based implementations.

Next steps: Implement per-client rate limiting in your HTTP service using the IPRateLimiter pattern above. If you need distributed rate limiting, that's a follow-up topic.
