The Go Function That Kept Dying
A Go function with a tight timeout kept getting cancelled in production. It ran fine in dev, CI, and integration tests. In production, it would blow past the timeout and die with context deadline exceeded. Worse, the state machine library it used would crash and hang when its context got cancelled.
Every CPU graph looked fine. Users reported low CPU utilization. It took weeks to find the cause.
Why Average CPU Isn’t Enough
The instinct to check average CPU is taught by every tool, every vendor, every dashboard. But latency does not scale linearly with CPU utilization. Under the M/M/1 queueing model, the jump from 80% to 81% adds roughly 20× more delay than the jump from 10% to 11%. At 80% utilization, a 10 ms request takes ~50 ms end to end (queueing plus service). At 95%, it takes ~200 ms.
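The arithmetic behind those numbers can be sketched in a few lines of Go. This assumes the textbook M/M/1 result that mean time in system equals service time divided by (1 - utilization); the function name responseTime is made up for this illustration.

```go
package main

import "fmt"

// responseTime returns the mean time in system (queueing plus service)
// for an M/M/1 queue: service / (1 - utilization).
// utilization must be in the range [0, 1).
func responseTime(serviceMs, utilization float64) float64 {
	return serviceMs / (1 - utilization)
}

func main() {
	const serviceMs = 10.0 // the 10 ms request from the text

	for _, u := range []float64{0.10, 0.50, 0.80, 0.95} {
		fmt.Printf("utilization %.0f%%: ~%.0f ms\n", u*100, responseTime(serviceMs, u))
	}

	// Cost of one extra percentage point at low vs. high utilization.
	low := responseTime(serviceMs, 0.11) - responseTime(serviceMs, 0.10)
	high := responseTime(serviceMs, 0.81) - responseTime(serviceMs, 0.80)
	fmt.Printf("80%%->81%% costs about %.0fx more than 10%%->11%%\n", high/low)
}
```

Real services are rarely a clean M/M/1 queue, so treat the output as a shape, not a prediction: the curve bends sharply upward well before 100%.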
But in this case, CPU wasn't even high. The problem was a Linux kernel feature: CFS throttling.
How CFS Throttling Actually Works
When you set resources.limits.cpu: 2000m on a container, you don't limit it to two CPU cores. You give it a time budget: 200 ms of CPU time per 100 ms scheduling period (the CFS period, default 100 ms). The container can use that budget across all host cores.
Consider an HTTP service with a 2000m limit on a 4-core node. A bursty request can consume the entire 200 ms budget across all 4 cores in 50 ms of wall-clock time. The next request arriving within that period is throttled: it must wait out the remaining 50 ms until the next period begins. If the load pattern is burst-idle-idle-idle-burst, average CPU looks healthy (e.g., 800m out of a 2000m limit = 40%), but p99 latency skyrockets.
This is exactly what happened to the Go function. A different goroutine had burst through the budget, starving the function until its context timed out.
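The budget arithmetic from the HTTP example can be sketched as below. It is a deliberately simplified model, assuming the burst starts at the beginning of a period and keeps every core busy; throttleWindow is a made-up helper name, not a kernel or Kubernetes API.

```go
package main

import "fmt"

// throttleWindow models one CFS period. It returns how long a fully
// parallel burst can run before the quota is spent, and how long the
// cgroup then sits throttled until the period ends.
// Simplifying assumptions: the burst starts at the period boundary and
// uses all `cores` continuously.
func throttleWindow(quotaMs, periodMs float64, cores int) (runMs, throttledMs float64) {
	runMs = quotaMs / float64(cores)
	if runMs >= periodMs {
		// The budget outlasts the period, so nothing is throttled.
		return periodMs, 0
	}
	return runMs, periodMs - runMs
}

func main() {
	// limits.cpu: 2000m means 200 ms of CPU time per 100 ms period.
	run, throttled := throttleWindow(200, 100, 4)
	fmt.Printf("4 cores: runs %.0f ms, then throttled %.0f ms\n", run, throttled)

	run, throttled = throttleWindow(200, 100, 2)
	fmt.Printf("2 cores: runs %.0f ms, then throttled %.0f ms\n", run, throttled)
}
```

With 4 busy cores the container runs for 50 ms and is frozen for the other 50 ms; with only 2 busy cores the same quota lasts the whole period.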
How to Spot Throttling
The metric has been there all along in /sys/fs/cgroup/cpu.stat. Run:
kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat
Output:
usage_usec 49823715
user_usec 41205893
system_usec 8617822
nr_periods 300
nr_throttled 30
throttled_usec 6142188
nr_bursts 0
burst_usec 0
If nr_throttled and throttled_usec are climbing, you have throttling. In the sample above, 30 of 300 periods (10%) were throttled.
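Because the counters are cumulative, what matters is the change between two readings. The sketch below parses the cpu.stat text and computes the fraction of throttled periods between two snapshots. The helper names parseCPUStat and throttledRatio are made up for this illustration, and the sample is the output shown above; a real check would read /sys/fs/cgroup/cpu.stat on an interval.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is the cpu.stat output quoted in the article.
const sample = `usage_usec 49823715
user_usec 41205893
system_usec 8617822
nr_periods 300
nr_throttled 30
throttled_usec 6142188
nr_bursts 0
burst_usec 0`

// parseCPUStat parses the "key value" lines of a cgroup v2 cpu.stat file.
func parseCPUStat(text string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", sc.Text(), err)
		}
		stats[fields[0]] = v
	}
	return stats, sc.Err()
}

// throttledRatio returns the fraction of CFS periods that were throttled
// between two snapshots. It assumes the counters did not reset in between.
func throttledRatio(prev, cur map[string]uint64) float64 {
	periods := cur["nr_periods"] - prev["nr_periods"]
	if periods == 0 {
		return 0
	}
	return float64(cur["nr_throttled"]-prev["nr_throttled"]) / float64(periods)
}

func main() {
	cur, err := parseCPUStat(sample)
	if err != nil {
		panic(err)
	}
	// An empty previous snapshot stands in for "since container start".
	ratio := throttledRatio(map[string]uint64{}, cur)
	fmt.Printf("throttled in %.1f%% of periods\n", ratio*100)
}
```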
Kube-prometheus ships a CPUThrottlingHigh alert, but many installations disable it because it fires too often. For containers on dedicated cores, cpu.stat is your check. The same directory also exposes cpu.pressure, a kernel PSI (Pressure Stall Information) signal that catches saturation even when the container stays under its quota.
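A PSI file holds lines of the form "some avg10=... avg60=... avg300=... total=...", where the avg fields are percentages of recent wall-clock time spent stalled. A minimal parsing sketch follows; parsePressureLine is a made-up helper name and the sample line contains invented values, not a real measurement.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePressureLine parses one PSI line, e.g.
// "some avg10=12.50 avg60=4.20 avg300=1.10 total=9876543".
// It returns the line kind ("some" or "full") and the key/value pairs.
func parsePressureLine(line string) (string, map[string]float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", nil, fmt.Errorf("unexpected PSI line: %q", line)
	}
	vals := make(map[string]float64)
	for _, f := range fields[1:] {
		key, raw, ok := strings.Cut(f, "=")
		if !ok {
			return "", nil, fmt.Errorf("unexpected PSI field: %q", f)
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return "", nil, fmt.Errorf("parse %q: %w", f, err)
		}
		vals[key] = v
	}
	return fields[0], vals, nil
}

func main() {
	// Invented sample values; real data comes from /sys/fs/cgroup/cpu.pressure.
	kind, vals, err := parsePressureLine("some avg10=12.50 avg60=4.20 avg300=1.10 total=9876543")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: tasks stalled on CPU for %.2f%% of the last 10 s\n", kind, vals["avg10"])
}
```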
For VMs in a hypervisor, cgroup files don't show hypervisor-level throttling. Instead, look for steal time (%st in top). Steal time occurs when the hypervisor gives your vCPU's slot to another tenant.
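The %st figure in top is derived from the aggregate "cpu" line of /proc/stat, where steal is the eighth counter after the label. The sketch below computes the steal share between two snapshots. The helper names are made up, and both snapshot lines are invented values chosen for easy arithmetic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPULine parses the aggregate "cpu" line of /proc/stat and returns
// the sum of the first eight counters (user, nice, system, idle, iowait,
// irq, softirq, steal) plus the steal counter on its own.
// Guest time is left out because the kernel already counts it in user.
func parseCPULine(line string) (uint64, uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 9 || fields[0] != "cpu" {
		return 0, 0, fmt.Errorf("unexpected cpu line: %q", line)
	}
	var total, steal uint64
	for i, f := range fields[1:9] {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return 0, 0, fmt.Errorf("parse %q: %w", f, err)
		}
		total += v
		if i == 7 { // eighth counter is steal
			steal = v
		}
	}
	return total, steal, nil
}

// stealPercent returns steal time as a percentage of all CPU time
// that elapsed between two snapshots of the "cpu" line.
func stealPercent(prevLine, curLine string) (float64, error) {
	prevTotal, prevSteal, err := parseCPULine(prevLine)
	if err != nil {
		return 0, err
	}
	curTotal, curSteal, err := parseCPULine(curLine)
	if err != nil {
		return 0, err
	}
	if curTotal <= prevTotal {
		return 0, nil
	}
	return 100 * float64(curSteal-prevSteal) / float64(curTotal-prevTotal), nil
}

func main() {
	// Invented snapshots; real ones come from the first line of /proc/stat.
	prev := "cpu 1000 0 500 8000 100 0 50 350"
	cur := "cpu 1300 0 600 8550 100 0 50 400"
	pct, err := stealPercent(prev, cur)
	if err != nil {
		panic(err)
	}
	fmt.Printf("steal: %.1f%%\n", pct)
}
```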
The Real Fix: Application-Level Starvation Detection
The immediate workaround is monitoring cpu.stat. But the longer-term answer is application-side detection: the application asks if a millisecond is still a millisecond. If not, it has been starved of CPU.
Redpanda calls this a "reactor stall." CockroachDB built a feedback controller around Go's /sched/latencies:seconds histogram, treating p99 latency above 1 ms as the trigger to shed background work.
Go 1.25 made GOMAXPROCS cgroup-aware by default, so the runtime no longer runs more threads in parallel than the cgroup's CPU limit allows. This reduces the chance of starvation but doesn't help when sibling processes in the same container burn the shared budget. Application-side detection remains the universal answer.
What to Watch Instead of Average CPU
- cgroup throttling: nr_throttled and throttled_usec in cpu.stat
- kernel PSI: cpu.pressure for cgroup-native saturation
- hypervisor steal time: %st in top
- application-level starvation signals: e.g., Go's /sched/latencies:seconds
Use these together to see what the average CPU graph hides.
Why This Matters
In large enterprises, IT departments often point at a 40% CPU graph and deny requests for more resources. They follow compliance guides that require resource limits. But the graph lies. The next time you're told "CPU is fine," check cpu.stat. If throttling is climbing, you have your answer.
Send them this article.


