# Epoll vs. io_uring: Why Linux Async I/O Changed Forever

## The Syscall Tax That Epoll Can't Escape

Epoll notifies your app when I/O is possible. You still must call read() or write() to actually move data. In the simplest event loop that's two syscalls per I/O event (epoll_wait + read/write), plus a one-time epoll_ctl registration per descriptor. Each syscall forces a transition between user and kernel mode, and that fixed per-call cost adds up quickly when you are handling thousands of connections.

io_uring flips the model: it notifies you when I/O is done. The kernel and your app share a memory region holding two ring buffers, a submission queue and a completion queue. You post an operation to the submission queue, the kernel processes it and writes the result to the completion queue. Instead of a syscall pair per I/O, you get one io_uring_enter() call per batch or, with IORING_SETUP_SQPOLL, close to none during steady state.
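To make the ring idea concrete, here is a toy single-producer, single-consumer ring in plain C11. It is only a sketch of the head/tail indexing that lets two parties exchange entries without a call per entry. The names (`toy_ring`, `toy_ring_push`, `toy_ring_pop`) are made up for illustration, and the real io_uring queues use a kernel-defined layout with richer entry structs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* must be a power of two */
#define RING_MASK (RING_ENTRIES - 1u)
static_assert((RING_ENTRIES & RING_MASK) == 0, "ring size must be a power of two");

/* Toy ring: the producer only writes tail, the consumer only writes head. */
struct toy_ring {
    _Atomic uint32_t head;              /* next slot to consume */
    _Atomic uint32_t tail;              /* next slot to fill */
    uint64_t slots[RING_ENTRIES];
};

void toy_ring_init(struct toy_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full. */
bool toy_ring_push(struct toy_ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_ENTRIES)
        return false;
    r->slots[tail & RING_MASK] = value;
    /* Release: the slot contents become visible before the new tail does. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool toy_ring_pop(struct toy_ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & RING_MASK];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

In this picture your app plays the producer on the submission queue and the consumer on the completion queue, with the kernel on the other end of each.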

## Code Comparison: Epoll vs. io_uring

### Epoll (readiness model)

    #include <sys/epoll.h>
    #include <unistd.h>

    int main(void)
    {
        int epoll_fd = epoll_create1(0);  // one-time setup
        struct epoll_event ev = {.events = EPOLLIN, .data.fd = STDIN_FILENO};
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, STDIN_FILENO, &ev);  // one-time registration

        struct epoll_event events[1];
        epoll_wait(epoll_fd, events, 1, -1);  // syscall #1: block until stdin is readable
        char buf[1024];
        read(STDIN_FILENO, buf, sizeof(buf)); // syscall #2: actually move the data
        return 0;
    }

Setup costs two syscalls (epoll_create1 and epoll_ctl). After that, each I/O operation costs two more: epoll_wait and read.
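To see the two-per-operation cost in a loop rather than a single shot, here is a small sketch that feeds one-byte messages through a pipe and counts the consumer's calls. The helper name `count_epoll_loop_syscalls` is hypothetical, and the pipe stands in for a real socket. The count is kept by hand in the code, not measured by the kernel.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: pushes n_msgs one-byte messages through a pipe and
 * consumes each with an epoll_wait + read pair. Returns the number of
 * syscalls the consumer made in steady state (setup, teardown, and the
 * producer's write are not counted), or -1 if setup fails. */
int count_epoll_loop_syscalls(int n_msgs)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int syscalls = -1;
    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        struct epoll_event ev = {.events = EPOLLIN, .data.fd = fds[0]};
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
            syscalls = 0;
            for (int i = 0; i < n_msgs; i++) {
                char byte = 'x';
                if (write(fds[1], &byte, 1) != 1)   /* producer side */
                    break;

                struct epoll_event ready;
                if (epoll_wait(epfd, &ready, 1, -1) != 1)   /* readiness */
                    break;
                syscalls++;

                char buf[16];
                if (read(ready.data.fd, buf, sizeof(buf)) != 1)  /* data */
                    break;
                syscalls++;
            }
        }
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return syscalls;
}
```

Calling `count_epoll_loop_syscalls(5)` should report 10 when nothing fails: one epoll_wait and one read per message.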

### io_uring (completion model)

    #include <liburing.h>
    #include <unistd.h>

    int main(void)
    {
        struct io_uring ring;
        io_uring_queue_init(32, &ring, 0);  // one-time setup of both rings

        char buf[1024];
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, STDIN_FILENO, buf, sizeof(buf), 0);
        io_uring_submit(&ring);  // one syscall for submission

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);  // one syscall for completion
        // cqe->res contains bytes read (or a negative errno)
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
    }

No per-descriptor registration is needed: the ring is set up once. `io_uring_submit()` and `io_uring_wait_cqe()` may each call `io_uring_enter()` once, but one call can submit a batch of operations and reap many completions. With SQPOLL, even the submission call disappears during steady state, and completions can be read straight from the shared completion queue.

## When Io_uring Shines

- **Zero-copy I/O**: Register buffers with `io_uring_register_buffers()` so the kernel pins and maps them once instead of on every operation, then use them through the fixed-buffer operations such as `IORING_OP_READ_FIXED`. For network sends, use `IORING_OP_SEND_ZC` (kernel 6.0+) to skip copying the buffer into kernel space entirely.
- **Batch processing**: One `io_uring_enter()` can submit dozens of reads and collect their results, while an epoll loop still needs a separate `read()` or `write()` call for every ready descriptor.
- **Lower latency**: The completion model removes the separate wait-for-readiness step and reduces user/kernel transitions.

## The SQPOLL Caveat

`IORING_SETUP_SQPOLL` starts a kernel thread that polls the submission queue. When idle, it goes to sleep after `sq_thread_idle` milliseconds, but until then it busy-polls and burns CPU even with an empty queue. It is not free: use it only if you have sustained I/O.

## Why You Should Care

io_uring landed in Linux 5.1 (2019). If your servers run that kernel or newer, there's little reason to use epoll for new projects. The TinyGate rewrite showed a dramatic performance boost; it still did not beat nginx/haproxy, but the architectural advantages are clear. For from-scratch projects, io_uring is the way to go.

## What to Do Now

1. Check your kernel version: `uname -r`. If >= 5.1, you can use io_uring.
2. Install liburing (`liburing-dev` on Debian/Ubuntu, `liburing-devel` on Fedora).
3. Rewrite your I/O loop using the completion model. Start with `io_uring_queue_init()` and replace epoll_wait/read pairs with submission/completion batches.
4. For maximum performance, register buffers and use `IORING_OP_SEND_ZC` for network sends (kernel 6.0+).
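If you would rather do the version check from step 1 at program start, a sketch along these lines works. The helper names `kernel_at_least` and `running_kernel_at_least` are hypothetical. Note that a new enough version number does not by itself guarantee io_uring is enabled on a given system, so treat a failing `io_uring_queue_init()` as the final word.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: compares a release string such as
 * "5.15.0-91-generic" against a minimum major.minor version.
 * Returns 1 if it is new enough, 0 if not, -1 if it cannot be parsed. */
int kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Same check against the running kernel, using uname(2). */
int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return kernel_at_least(u.release, want_major, want_minor);
}
```

For example, `running_kernel_at_least(5, 1)` covers basic io_uring support and `running_kernel_at_least(6, 0)` covers `IORING_OP_SEND_ZC`.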
