LLMs Expose Cracks in 20-Year-Old Web Architecture

The Old Assumption Is Breaking

For twenty years, web architecture has relied on a simple division: stateless compute servers and stateful databases. Load balancers send any request to any server. The database is the single source of truth. This works fine for short, deterministic requests. LLMs and agents shatter that assumption.

Three Ways LLMs Break the Model

Long-running work: An agent performing a 10-minute task isn't a request-response cycle. It's a long-running async process. HTTP wasn't designed for that.
Stateful compute: An agent accumulates context over multiple turns and tool calls. That context isn't database state; it's the agent's memory. Moving it to a database adds latency and complexity.
Bi-directional interaction: Users want to watch the agent think, interrupt, and redirect. That's a conversation with a process, not a query to a stateless API.

Durable Execution: Half a Solution

Frameworks like Temporal, Inngest, and Restate make processes durable and resilient. They solve the execution part. But the process is still reached through the same stateless request path, as if nothing long-lived were running behind it. The client can't talk directly to the running process. So everyone falls back to polling.

The Routing Problem

HTTP + load balancer + stateless server can't route to a specific process. Any server may receive the request, so the only shared place it can reach is the database. When a client needs to communicate with a durable execution workflow, the only option is to poll a query endpoint that reads from that database. This reintroduces all the problems of polling: a trade-off between latency and request volume, database load, wasted requests, and poor UX for streaming.
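The latency trade-off can be made concrete with a small simulation. This is a sketch under stated assumptions: the function name `polling_cost`, the update times, and the poll intervals are all illustrative and do not come from the original article.

```python
def polling_cost(update_times, poll_interval, task_duration):
    """Simulate a client polling a status endpoint at a fixed interval.

    update_times: seconds (integers) at which the workflow stored a new update.
    Returns (total_polls, wasted_polls, max_delay), where a wasted poll is one
    that found nothing new and max_delay is the longest time an update waited
    before the client saw it.
    """
    updates = sorted(update_times)
    total_polls = 0
    wasted_polls = 0
    max_delay = 0
    seen = 0  # number of updates the client has already observed

    for t in range(poll_interval, task_duration + 1, poll_interval):
        total_polls += 1
        available = sum(1 for u in updates if u <= t)
        if available == seen:
            wasted_polls += 1  # the database was read, nothing had changed
        else:
            # The oldest unseen update has waited the longest.
            max_delay = max(max_delay, t - updates[seen])
            seen = available

    return total_polls, wasted_polls, max_delay


# Hypothetical 10-minute task that produces three updates.
updates = [5, 130, 400]
print(polling_cost(updates, poll_interval=2, task_duration=600))   # (300, 297, 1)
print(polling_cost(updates, poll_interval=60, task_duration=600))  # (10, 7, 55)
```

Polling every 2 seconds keeps the client almost current but spends 297 of 300 requests on nothing. Polling every 60 seconds cuts the load and leaves an update unseen for up to 55 seconds.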

> "Polling treats your database as a message bus. Which is what folks did before actual message buses existed."

The Missing Primitive: A Routable Transport Name

We need a transport that addresses a process, not a server. The ideal primitive: a durable, addressable channel that both client and process connect to by name. Pub/sub channels fit this role.

WebSockets Fall Short

WebSockets create a direct connection between client and server. But the connection itself is the only address. If it drops, the address is lost. You can't reconnect to the same process because you don't have a stable identifier, and a new connection may land on any server behind the load balancer.

Pub/Sub Channels as the Solution

Pub/sub inverts ownership. Neither the process nor the client is addressable; the channel is. Both connect to the channel by name. If the channel is durable, meaning it retains messages, you can disconnect and reconnect without losing data or the ability to route to the same process.
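A minimal in-memory sketch of the idea, assuming a channel that keeps an append-only log so a reconnecting client can resume from an offset. The `Broker` and `Channel` classes are hypothetical stand-ins written for illustration, not the API of any real pub/sub system.

```python
class Channel:
    """A named channel that retains every message it has been given."""

    def __init__(self, name):
        self.name = name
        self.log = []  # retained messages; list index doubles as the offset

    def publish(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the message just stored

    def read_from(self, offset):
        """Return every retained message at or after `offset`."""
        return self.log[offset:]


class Broker:
    """Hands out channels by name; the same name always yields the same channel."""

    def __init__(self):
        self._channels = {}

    def channel(self, name):
        if name not in self._channels:
            self._channels[name] = Channel(name)
        return self._channels[name]


broker = Broker()

# The process and the client each look the channel up by name.
process_side = broker.channel("workflow-42")
client_side = broker.channel("workflow-42")

process_side.publish("step 1 done")
seen = client_side.read_from(0)
next_offset = len(seen)

# The client disconnects; the process keeps publishing.
process_side.publish("step 2 done")
process_side.publish("step 3 done")

# On reconnect, the client needs only the name and its last offset.
missed = broker.channel("workflow-42").read_from(next_offset)
print(missed)  # ['step 2 done', 'step 3 done']
```

Neither side holds a reference to the other. The name plus an offset is all the client has to remember, which is what makes the reconnect possible.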

Concrete Example: Temporal + Pub/Sub

Consider a Temporal workflow that generates a report. The workflow ID is known to both the client and the workflow. Instead of polling a database, both connect to a pub/sub channel named after the workflow ID. The workflow publishes progress updates; the client listens and can send interrupt signals. If the client disconnects, it reconnects to the same channel and picks up where it left off. No polling, no wasted tokens.

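# Pseudo-code for a client connecting to a durable channel.
# connect_to_pubsub is a hypothetical helper, not a real library call.
def on_update(update):
    print(update)  # handle each progress update from the workflow

channel = connect_to_pubsub(f"workflow-{workflow_id}")
channel.subscribe(on_update)
channel.send({"action": "cancel"})  # interrupt the workflow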
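For a version of this interaction that actually runs, here is a rough sketch that uses two `asyncio.Queue` objects as a stand-in for the two directions of one named channel. It does not use the Temporal SDK or a real pub/sub client; `report_workflow`, `client`, and the message shapes are assumptions made for illustration.

```python
import asyncio


async def report_workflow(updates, control, steps=5):
    """Stand-in for a long-running workflow: publishes progress, honours cancel."""
    for step in range(1, steps + 1):
        # Look for a control message without blocking the work loop.
        try:
            message = control.get_nowait()
        except asyncio.QueueEmpty:
            message = None
        if message == {"action": "cancel"}:
            await updates.put({"status": "cancelled", "step": step - 1})
            return
        await asyncio.sleep(0)  # placeholder for a slice of real work
        await updates.put({"status": "progress", "step": step})
    await updates.put({"status": "done", "step": steps})


async def client(updates, control, cancel_after):
    """Listens for updates and sends a cancel once `cancel_after` is reached."""
    received = []
    while True:
        update = await updates.get()
        received.append(update)
        if update["status"] in ("done", "cancelled"):
            return received
        if update["step"] == cancel_after:
            await control.put({"action": "cancel"})


async def main():
    updates, control = asyncio.Queue(), asyncio.Queue()
    workflow = asyncio.create_task(report_workflow(updates, control))
    received = await client(updates, control, cancel_after=2)
    await workflow
    return received


received = asyncio.run(main())
print(received[-1]["status"])
```

The client never asks a database whether anything changed. It waits on the channel, and the cancel travels back over the same channel to the running process.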

Why LLMs Make This Urgent

LLM responses are non-deterministic and expensive. If a client disconnects during an LLM call, retrying wastes tokens and may produce a different response. Polling every token through a database is absurdly costly. LLMs expose the limitations of the old architecture because the trade-offs become painfully visible.
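The cost of a retry is simple arithmetic. The sketch below assumes that a dropped connection cancels the generation and the call is repeated from scratch, whereas a resumable channel lets the original generation finish while the client replays what it missed. The function name and the token counts are hypothetical.

```python
def output_tokens_billed(response_tokens, disconnect_at, resumable):
    """Output tokens paid for when the client drops after `disconnect_at` tokens."""
    if resumable:
        # The generation keeps running; the client catches up on reconnect.
        return response_tokens
    # The partial attempt is thrown away and the whole response is generated again.
    return disconnect_at + response_tokens


# A 1,000-token response where the client drops after 600 tokens.
print(output_tokens_billed(1000, 600, resumable=False))  # 1600
print(output_tokens_billed(1000, 600, resumable=True))   # 1000
```

In the retry case the extra 600 tokens buy nothing, and because the model is non-deterministic the second response may not match the part the user already saw.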

> "By being non-deterministic and expensive, LLMs make the limitations of our current architecture more visible, and make the trade-offs of HTTP + stateless server + loadbalancer + database more painful."

The New Stack: Durable Execution + Pub/Sub + Stateless HTTP

The author proposes a three-layer architecture:

Durable execution for resilience and long-running processes.
Pub/sub channels for routing and bi-directional communication.
Stateless HTTP for traditional request-response where appropriate.

Each layer handles its own job. The routing primitive (pub/sub) is the missing piece that makes the whole thing work.

What This Means for Developers

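If you're building agentic applications, you need to rethink your architecture. The stateless web isn't wrong for everything, but it's wrong for stateful, interactive, long-running processes. Start experimenting with durable execution frameworks and pub/sub systems (like Redis, NATS, or Apache Kafka) to build a routing layer for your agents. Check the delivery guarantees first: plain Redis Pub/Sub and core NATS do not retain messages for disconnected subscribers, so reconnect-and-resume needs a persistent option such as Redis Streams, NATS JetStream, or Kafka. The industry is moving this way, so don't wait for a standard to emerge.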

