What is Streambed?

Streambed is an open-source Change Data Capture (CDC) engine that streams PostgreSQL WAL (Write-Ahead Log) changes to Apache Iceberg tables stored on S3-compatible object storage. It writes Parquet files, commits Iceberg metadata, and includes a built-in query server that speaks the Postgres wire protocol — so you can connect with psql or any Postgres client.

The project, created by viggy28, targets teams that want to offload analytical queries from their production Postgres database without changing their application code. No ETL pipelines, no Spark clusters — just Postgres and S3.

How It Works

Streambed connects to Postgres as a logical replication subscriber. It decodes WAL messages (inserts, updates, deletes), buffers rows per table, and periodically flushes them as Parquet files to S3 with Iceberg metadata commits. Updates and deletes use copy-on-write merging against existing Parquet data.

Postgres WAL ──▶ Decode ──▶ Buffer ──▶ Parquet ──▶ S3 ──▶ Iceberg Commit
                                                   │
                                         DuckDB ◀──┘ (query server)

A query server exposes Iceberg tables over the Postgres wire protocol using embedded DuckDB, so you can query with psql or any Postgres client.
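
The copy-on-write merge mentioned above for updates and deletes can be pictured with a small in-memory sketch. This is not Streambed's actual code: the Change type, the MergeCopyOnWrite function, and the use of a map in place of a Parquet file are all assumptions made for illustration. The point it shows is that the existing data is never modified; a new snapshot is built with the buffered changes applied in order.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is the kind of change decoded from the WAL.
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Change is a hypothetical, simplified decoded WAL message keyed by primary key.
type Change struct {
	Op  Op
	Key int
	Row string // stand-in for the full row payload
}

// MergeCopyOnWrite builds a new snapshot from an existing one plus buffered
// changes. The existing map (standing in for a Parquet file) is left untouched.
func MergeCopyOnWrite(existing map[int]string, buffered []Change) map[int]string {
	next := make(map[int]string, len(existing)+len(buffered))
	for k, v := range existing {
		next[k] = v
	}
	for _, c := range buffered {
		switch c.Op {
		case Insert, Update:
			next[c.Key] = c.Row
		case Delete:
			delete(next, c.Key)
		}
	}
	return next
}

// SortedKeys returns the keys in ascending order for deterministic printing.
func SortedKeys(m map[int]string) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	existing := map[int]string{1: "alice", 2: "bob"}
	buffered := []Change{
		{Op: Update, Key: 2, Row: "bobby"},
		{Op: Delete, Key: 1},
		{Op: Insert, Key: 3, Row: "carol"},
	}
	next := MergeCopyOnWrite(existing, buffered)
	for _, k := range SortedKeys(next) {
		fmt.Println(k, next[k]) // prints "2 bobby" then "3 carol"
	}
}
```

Because the old snapshot survives the merge, a reader that started on it keeps seeing consistent data until the new snapshot is committed, which is the usual motivation for copy-on-write in table formats like Iceberg.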

Quick Start

# Start Postgres + MinIO locally
docker compose up -d

# Build (requires Go 1.22+ and CGO)
go build -o streambed ./cmd/streambed

# Start syncing + query server on :5433
./streambed sync \
  --source-url="postgres://postgres:test@localhost:5432/postgres" \
  --s3-bucket="streambed" \
  --s3-endpoint="http://localhost:9000" \
  --s3-prefix="test" \
  --query-addr=:5433

# Query your Postgres tables via Iceberg
psql -h localhost -p 5433 -U postgres -d postgres

Every flag can also be set through an environment variable with the STREAMBED_ prefix (e.g., --source-url becomes STREAMBED_SOURCE_URL).
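
As a rough illustration of that naming pattern, the helper below derives a variable name from a flag name. Only STREAMBED_SOURCE_URL is confirmed by the text; the EnvName function is hypothetical, and the other names it produces are inferred from the pattern, so check them against the project before relying on them.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvName derives an environment variable name from a flag name, assuming
// the pattern implied by --source-url -> STREAMBED_SOURCE_URL:
// drop leading dashes, turn remaining dashes into underscores, uppercase,
// and add the STREAMBED_ prefix.
func EnvName(flag string) string {
	name := strings.TrimLeft(flag, "-")
	name = strings.ReplaceAll(name, "-", "_")
	return "STREAMBED_" + strings.ToUpper(name)
}

func main() {
	// Flags taken from the Quick Start command above.
	flags := []string{"--source-url", "--s3-bucket", "--s3-endpoint", "--s3-prefix", "--query-addr"}
	for _, f := range flags {
		fmt.Printf("%-14s -> %s\n", f, EnvName(f))
	}
}
```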

Commands

Command                                   What it does
streambed sync                            Main daemon. Streams WAL, writes Iceberg, optionally serves queries.
streambed resync --table=public.users     One-shot backfill via COPY under a consistent snapshot.
streambed query                           Standalone query server (no sync). Points at existing Iceberg tables.
streambed cleanup --table=public.users    Deletes S3 objects and state for a table. Useful before resync.

Technical Details

  • Language: Go 1.22+, with CGO dependencies for go-duckdb and go-sqlite3.
  • Integration tests: Run against Postgres (port 5434) and MinIO (port 9002) via Docker Compose. Use ./scripts/test-integration.sh.
  • Performance: The README shows a side-by-side comparison on a pgbench dataset with 1M accounts and 500K history rows, with Postgres on the left and Streambed on the right. Exact numbers are not published; the claim is that analytical queries run faster on the Iceberg side.

Why It Matters

Many applications run both transactional (OLTP) and analytical (OLAP) workloads on the same Postgres instance, and heavy analytical queries then degrade production performance. Streambed is a lightweight alternative to multi-component pipelines such as Debezium + Kafka + Spark. It is a single Go binary that speaks the Postgres wire protocol, so you can keep using your existing tools.

Developer Insights

  1. Use streambed resync for initial backfill of existing tables under a consistent snapshot, then let sync handle ongoing changes.
  2. The query server uses embedded DuckDB, which is built for analytical scans over columnar files, but it is read-only: there is no write support.
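  3. For production, plan for Parquet file accumulation: copy-on-write merges write new files and can leave superseded ones behind. If you use S3 lifecycle rules or bucket versioning, make sure they never expire files that the current Iceberg metadata still references.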
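
The backfill order from the first point, combined with the optional cleanup step from the Commands table, can be sketched as a small plan builder. The BackfillPlan function is hypothetical and only assembles argument lists; it does not run anything. It also assumes that connection and S3 settings come from STREAMBED_ environment variables, which is why only --table appears here.

```go
package main

import (
	"fmt"
	"strings"
)

// BackfillPlan lists the streambed invocations for re-seeding one table in
// the order the text suggests: optional cleanup, then resync, then sync.
// It only builds argument lists; nothing is executed.
func BackfillPlan(table string, cleanFirst bool) [][]string {
	var plan [][]string
	if cleanFirst {
		plan = append(plan, []string{"streambed", "cleanup", "--table=" + table})
	}
	plan = append(plan,
		[]string{"streambed", "resync", "--table=" + table},
		[]string{"streambed", "sync"},
	)
	return plan
}

func main() {
	for i, argv := range BackfillPlan("public.users", true) {
		fmt.Printf("%d. %s\n", i+1, strings.Join(argv, " "))
	}
}
```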

Tags

postgres, iceberg, cdc, s3, parquet, duckdb, open-source, developer-tools
