Lakebase: Postgres with Parquet on S3 for LTAP Workloads

Monolithic Postgres: The Root of All Evil

Most databases today are monoliths: one machine runs both the engine and storage. Postgres, MySQL, and Oracle all share this design. They use a Write Ahead Log (WAL) for fast commits and data files for fast reads. That coupling causes several problems:

Data loss from misconfigured flushes or node failure.
Scaling reads requires cloning the entire database.
High availability needs a full physical standby.
Analytics contend with transactional traffic on the same hardware.

Databricks, the company behind Apache Spark, decided to rearchitect how Postgres handles storage. The result is Lakebase, a serverless Postgres that externalizes the WAL and data files into independent services.

Lakebase Architecture: Stateless Compute, Durable Storage

Lakebase splits the monolith into three components:

Stateless Postgres compute – can start, stop, and scale instantly.
SafeKeeper – a distributed service that replaces the WAL on local disk. Commits are durable via Paxos replication across a quorum of nodes, not a single disk flush.
PageServer – a distributed service that stores data files in cloud object storage (e.g., S3) as Parquet. It asynchronously applies WAL from SafeKeeper to materialize pages.
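The phrase "asynchronously applies WAL to materialize pages" can be illustrated with a toy replay loop. This is a minimal sketch, not Lakebase's implementation: `WalRecord`, `ToyPageServer`, and the key/value page layout are hypothetical names and simplifications chosen for illustration, and real Postgres WAL records are far richer than this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, simplified log record (not the real Postgres WAL format)."""
    lsn: int       # log sequence number, strictly increasing
    page_id: int   # page the record modifies
    key: str
    value: str


@dataclass
class ToyPageServer:
    """Toy stand-in for PageServer: replays WAL records into page images."""
    pages: dict = field(default_factory=dict)  # page_id -> {key: value}
    applied_lsn: int = 0                       # highest LSN applied so far

    def apply(self, records):
        # Replay in LSN order and skip anything already applied,
        # so receiving the same records twice is harmless.
        for rec in sorted(records, key=lambda r: r.lsn):
            if rec.lsn <= self.applied_lsn:
                continue
            self.pages.setdefault(rec.page_id, {})[rec.key] = rec.value
            self.applied_lsn = rec.lsn

    def get_page(self, page_id):
        # Return a copy so callers cannot mutate the materialized page.
        return dict(self.pages.get(page_id, {}))
```

Because replay is keyed on the LSN, the page images can lag behind the log and catch up later without changing the result, which is what lets commits complete before pages are materialized.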

Scaling Writes: SafeKeeper

Instead of flushing WAL to local disk, Lakebase replicates log records across SafeKeeper nodes using Paxos. This eliminates the risk of silent data loss from misconfigured flushes. The network hop is no worse than synchronous replication in a traditional Postgres setup. Databricks claims 5X higher write throughput and 2X lower read latency compared to monolithic Postgres with synchronous replication.
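The quorum rule behind that durability claim can be sketched in a few lines. This is an assumption-laden toy, not Paxos itself: it shows only the majority-acknowledgement condition and omits leader election, proposal numbers, and recovery. `ToySafeKeeper`, `quorum_size`, and `commit` are hypothetical names.

```python
def quorum_size(n_nodes):
    """Smallest majority of n_nodes (2 of 3, 3 of 5, ...)."""
    return n_nodes // 2 + 1


class ToySafeKeeper:
    """Hypothetical log replica that either accepts or drops an append."""

    def __init__(self, up=True):
        self.up = up
        self.log = []

    def append(self, record):
        if not self.up:
            return False  # an unreachable node never acknowledges
        self.log.append(record)
        return True


def commit(record, nodes):
    # The commit is reported durable only if a majority acknowledged it.
    acks = sum(1 for node in nodes if node.append(record))
    return acks >= quorum_size(len(nodes))
```

With three replicas, a commit survives one unavailable node but is rejected when two are down, which is the trade a quorum makes in place of trusting a single disk flush.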

Scaling Reads: PageServer

Data files live in PageServer, which persists pages in low-cost object storage. When a compute node needs a page, it checks its local buffer pool first, then its local disk cache, and goes to PageServer only on a miss. Because local memory and disk can be sized the same as on a monolith, cache hit rates should be comparable, so read latency is indistinguishable from a monolith for most operations.
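The lookup order described above can be written as a small tiered read. This is a sketch under stated assumptions: `ToyCompute` is a hypothetical class, plain dictionaries stand in for the buffer pool, the disk cache, and the remote PageServer, and eviction is left out entirely.

```python
class ToyCompute:
    """Hypothetical compute node with two local cache tiers and a remote store."""

    def __init__(self, page_server):
        self.buffer_pool = {}           # in-memory tier
        self.disk_cache = {}            # local-disk tier
        self.page_server = page_server  # dict standing in for the remote service
        self.remote_reads = 0           # counts round trips to the remote tier

    def read_page(self, page_id):
        """Return (page, tier) where tier names the layer that served the read."""
        if page_id in self.buffer_pool:
            return self.buffer_pool[page_id], "buffer_pool"
        if page_id in self.disk_cache:
            page = self.disk_cache[page_id]
            self.buffer_pool[page_id] = page  # promote to memory
            return page, "disk_cache"
        page = self.page_server[page_id]      # miss in both local tiers
        self.remote_reads += 1
        self.disk_cache[page_id] = page
        self.buffer_pool[page_id] = page
        return page, "page_server"
```

Only the first read of a page pays the remote round trip; repeat reads are served locally, which is why the hit rate of the local tiers, not the location of the durable copy, dominates read latency.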

What This Unlocks

Unlimited storage – data lives in object storage, not a provisioned disk.
Serverless, elastic compute – scale to zero when idle.
Durable writes – Paxos replication means a committed write survives node failures as long as a quorum of SafeKeeper nodes remains.
Simpler HA – failover doesn't require promoting a physical clone.
Instant branching – cloning a database is a metadata operation, taking seconds instead of hours.
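To see why branching can be a metadata operation, consider a copy-on-write sketch in which a branch records only its parent and the parent's log position at fork time. This is an illustrative assumption about how such a scheme can work, not a description of Lakebase internals; `ToyBranch` and its fields are hypothetical.

```python
class ToyBranch:
    """Hypothetical copy-on-write branch: forks share history, writes diverge."""

    def __init__(self, parent=None, parent_lsn=0):
        self.parent = parent
        self.parent_lsn = parent_lsn  # parent's LSN at the moment of the fork
        self.lsn = parent_lsn         # this branch's own latest LSN
        self.versions = {}            # page_id -> append-only list of (lsn, data)

    def write(self, page_id, data):
        self.lsn += 1
        self.versions.setdefault(page_id, []).append((self.lsn, data))

    def branch(self):
        # Constant-time: only a pointer and an LSN are recorded, no pages copied.
        return ToyBranch(parent=self, parent_lsn=self.lsn)

    def read(self, page_id, at_lsn=None):
        limit = self.lsn if at_lsn is None else at_lsn
        # Newest version on this branch that is visible at `limit`.
        for lsn, data in reversed(self.versions.get(page_id, [])):
            if lsn <= limit:
                return data
        if self.parent is not None:
            # Fall back to the parent as it looked when this branch was forked.
            return self.parent.read(page_id, at_lsn=self.parent_lsn)
        raise KeyError(page_id)
```

In this model the cost of creating a branch does not depend on database size, and later writes on either side stay invisible to the other because reads through the parent are capped at the fork LSN.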

LTAP: One Copy for Transactions and Analytics

The real innovation is LTAP (Low-latency Transactions and Analytics). Because Lakebase stores operational data in open formats (Parquet) on commodity object storage, other engines can read it directly. No need for CDC pipelines or mirroring. You run OLTP and OLAP on the same data, in real time.

This is a direct response to the industry's struggle with HTAP (Hybrid Transactional/Analytical Processing) systems that required separate copies or complex synchronization. Lakebase's approach is simpler: store once, query with any engine.

Benchmarks and Practical Details

Databricks shared specific numbers:

5X higher write throughput vs. Postgres with synchronous replication.
2X lower read latency due to aggressive caching and PageServer optimizations.
Instant branching – a 1 TB database can be cloned in under 2 seconds.

The system is currently available on both Databricks and Neon (which uses the same architecture). It supports the Postgres wire protocol, SQL, drivers, and extensions.

How to Use Lakebase

You don't need to migrate your entire stack. Lakebase is compatible with existing Postgres clients. Connect via psql:

psql "host=your-lakebase-instance.databricks.com port=5432 dbname=mydb user=admin"

Then run your usual SQL. For analytics, you can query the same data directly from S3 with an engine such as Spark or Presto. In Spark SQL, for example:

SELECT * FROM parquet.`s3://your-bucket/lakebase/mydb/table`;

No ETL, no CDC. Just one copy of the data.

Why This Matters

For developers running Postgres in production, this architecture addresses the long-standing tension between transactional and analytical workloads. You no longer need to maintain a separate analytics replica or pay for CDC tools, which removes the storage and compute cost of a second copy. Instant branching also makes creating a development copy of a database about as quick as creating a Git branch.

Next Steps

Try Lakebase on Databricks or Neon today. If you're tired of managing read replicas and CDC pipelines, this is the architecture you've been waiting for.
