What is Streambed?
Streambed is an open-source Change Data Capture (CDC) engine that streams PostgreSQL WAL (Write-Ahead Log) changes to Apache Iceberg tables stored on S3-compatible object storage. It writes Parquet files, commits Iceberg metadata, and includes a built-in query server that speaks the Postgres wire protocol — so you can connect with psql or any Postgres client.
The project, created by viggy28, targets teams that want to offload analytical queries from their production Postgres database without changing their application code. No ETL pipelines, no Spark clusters — just Postgres and S3.
How It Works
Streambed connects to Postgres as a logical replication subscriber. It decodes WAL messages (inserts, updates, deletes), buffers rows per table, and periodically flushes them as Parquet files to S3 with Iceberg metadata commits. Updates and deletes use copy-on-write merging against existing Parquet data.
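Copy-on-write merging works like this: the engine does not edit a Parquet file in place. It reads the file's rows, applies the buffered changes, and writes a replacement file. The sketch below is minimal and assumes that rows are keyed by a primary-key string and that changes are applied in WAL order. The types `Row` and `Change` and the function `mergeCopyOnWrite` are hypothetical names for illustration, not Streambed's API.

```go
package main

import "fmt"

// Op is the kind of change decoded from the WAL (hypothetical type).
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Row is a simplified table row: a primary key plus column values.
type Row struct {
	Key    string
	Values map[string]string
}

// Change is one decoded WAL event; for Delete only Row.Key is used.
type Change struct {
	Op  Op
	Row Row
}

// mergeCopyOnWrite applies buffered changes (in WAL order) to the rows of an
// existing Parquet file and returns the rows for a replacement file. The
// input slice is not modified. Keys in existing are assumed to be unique.
func mergeCopyOnWrite(existing []Row, changes []Change) []Row {
	latest := make(map[string]*Row, len(existing)) // nil value means deleted
	order := make([]string, 0, len(existing))      // output order of keys
	for i := range existing {
		r := existing[i]
		latest[r.Key] = &r
		order = append(order, r.Key)
	}
	for _, c := range changes {
		switch c.Op {
		case Insert, Update:
			r := c.Row
			if _, seen := latest[r.Key]; !seen {
				order = append(order, r.Key) // a new key goes at the end
			}
			latest[r.Key] = &r
		case Delete:
			if _, seen := latest[c.Row.Key]; seen {
				latest[c.Row.Key] = nil // keep the slot so a re-insert keeps its position
			}
		}
	}
	out := make([]Row, 0, len(order))
	for _, k := range order {
		if r := latest[k]; r != nil {
			out = append(out, *r)
		}
	}
	return out
}

func main() {
	existing := []Row{
		{Key: "1", Values: map[string]string{"name": "alice"}},
		{Key: "2", Values: map[string]string{"name": "bob"}},
	}
	changes := []Change{
		{Op: Update, Row: Row{Key: "1", Values: map[string]string{"name": "alicia"}}},
		{Op: Delete, Row: Row{Key: "2"}},
		{Op: Insert, Row: Row{Key: "3", Values: map[string]string{"name": "carol"}}},
	}
	for _, r := range mergeCopyOnWrite(existing, changes) {
		fmt.Println(r.Key, r.Values["name"])
	}
}
```

Running it prints `1 alicia` and then `3 carol`. Row 1 is rewritten, row 2 is dropped, and row 3 is appended. A deleted key keeps its slot in `order`, so if the same key is deleted and then re-inserted, the row stays in its original position.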
```text
Postgres WAL ──▶ Decode ──▶ Buffer ──▶ Parquet ──▶ S3 ──▶ Iceberg Commit
                                                   │
                                         DuckDB ◀──┘ (query server)
```
The query server uses embedded DuckDB to read the Iceberg tables and exposes them over the Postgres wire protocol.
Quick Start
```bash
# Start Postgres + MinIO locally
docker compose up -d

# Build (requires Go 1.22+ and CGO)
CGO_ENABLED=1 go build -o streambed ./cmd/streambed

# Start syncing + query server on :5433
./streambed sync \
  --source-url="postgres://postgres:test@localhost:5432/postgres" \
  --s3-bucket="streambed" \
  --s3-endpoint="http://localhost:9000" \
  --s3-prefix="test" \
  --query-addr=:5433

# Query your Postgres tables via Iceberg
psql -h localhost -p 5433 -U postgres -d postgres
```
Every flag can also be set through an environment variable with the `STREAMBED_` prefix (e.g., `STREAMBED_SOURCE_URL` for `--source-url`).
Commands
| Command | What it does |
|---|---|
| `streambed sync` | Main daemon. Streams WAL, writes Iceberg, optionally serves queries. |
| `streambed resync --table=public.users` | One-shot backfill via `COPY` under a consistent snapshot. |
| `streambed query` | Standalone query server (no sync). Points at existing Iceberg tables. |
| `streambed cleanup --table=public.users` | Deletes S3 objects and state for a table. Useful before a resync. |
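`resync` backfills with `COPY` under a consistent snapshot. One standard Postgres way to get such a snapshot is to create the logical replication slot with `EXPORT_SNAPSHOT` and import the returned snapshot name into the transaction that runs `COPY`. This approach is assumed here, not confirmed from Streambed's source. The sketch only builds the SQL. `backfillStatements`, `quoteIdent`, and `quoteLiteral` are hypothetical helpers, and the snapshot name in `main` is an example value.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent quotes a SQL identifier, doubling any embedded double quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLiteral quotes a SQL string literal, doubling any embedded single quotes.
func quoteLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// backfillStatements lists the SQL a snapshot-consistent backfill of one table
// could run. snapshot is the name returned when a logical replication slot is
// created with EXPORT_SNAPSHOT. table is "schema.table", split on the first dot.
func backfillStatements(table, snapshot string) []string {
	first, rest, ok := strings.Cut(table, ".")
	qualified := quoteIdent(first)
	if ok {
		qualified += "." + quoteIdent(rest)
	}
	return []string{
		// SET TRANSACTION SNAPSHOT must run before any query in the
		// transaction, and the transaction must be REPEATABLE READ or SERIALIZABLE.
		"BEGIN ISOLATION LEVEL REPEATABLE READ",
		"SET TRANSACTION SNAPSHOT " + quoteLiteral(snapshot),
		"COPY " + qualified + " TO STDOUT",
		"COMMIT",
	}
}

func main() {
	for _, stmt := range backfillStatements("public.users", "00000003-0000001B-1") {
		fmt.Println(stmt + ";")
	}
}
```

The imported snapshot corresponds to the point where the slot starts streaming. In this setup, rows copied by the backfill and changes later decoded from the slot are intended to meet without overlap or gap. An exported snapshot stays valid only until the replication connection that created it runs another command or disconnects.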
Technical Details
- Language: Go 1.22+, with CGO dependencies for `go-duckdb` and `go-sqlite3`.
- Integration tests: Run against Postgres (port 5434) and MinIO (port 9002) via Docker Compose. Use `./scripts/test-integration.sh`.
- Performance: The README shows a side-by-side `pgbench` comparison with 1M accounts and 500K history rows (Postgres on the left, Streambed on the right). Exact numbers are not published; the claim is that analytical queries run faster on the Iceberg side.
Why It Matters
Many applications run both OLTP and OLAP workloads on the same Postgres database, so heavy analytical queries can degrade transactional performance. Streambed is a lightweight alternative to heavier ETL stacks such as Debezium + Kafka + Spark. It ships as a single Go binary that speaks the Postgres wire protocol, so you can keep using your existing tools.
Developer Insights
- Use `streambed resync` for the initial backfill of existing tables under a consistent snapshot, then let `sync` handle ongoing changes.
- The query server uses embedded DuckDB: expect strong analytical performance on Iceberg files, but no write support.
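- For production, ensure your S3 bucket has lifecycle policies to manage Parquet file accumulation. If versioning is enabled, add rules that expire noncurrent versions, because versioning on its own retains overwritten and deleted objects.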
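The resync-then-sync handoff needs a boundary so that changes already present in the backfill are not applied twice. A common approach compares each transaction's commit LSN with the LSN at which the backfill snapshot was taken. This approach is assumed here, not confirmed as Streambed's. `LSN`, `ParseLSN`, and `needsApply` are hypothetical names. The `high/low` hex text format is the standard Postgres `pg_lsn` representation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LSN is a Postgres log sequence number: a 64-bit WAL position whose text
// form is two hex numbers, "high32/low32" (for example 16/B374D848).
type LSN uint64

// ParseLSN parses the textual pg_lsn form.
func ParseLSN(s string) (LSN, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return LSN(h<<32 | l), nil
}

// String formats the LSN back into the "high32/low32" hex form.
func (x LSN) String() string {
	return fmt.Sprintf("%X/%X", uint64(x)>>32, uint64(x)&0xFFFFFFFF)
}

// needsApply reports whether a transaction committed at commit must still be
// applied by the streaming sync, given a backfill taken at snapshot. In this
// sketch, anything at or before the snapshot is treated as already backfilled.
func needsApply(commit, snapshot LSN) bool {
	return commit > snapshot
}

func main() {
	snapshot, err := ParseLSN("0/16B3748") // e.g. the slot's consistent point
	if err != nil {
		panic(err)
	}
	for _, s := range []string{"0/16B3700", "0/16B3748", "0/16B3800"} {
		commit, err := ParseLSN(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(commit, needsApply(commit, snapshot))
	}
}
```

It prints `0/16B3700 false`, `0/16B3748 false`, and `0/16B3800 true`. Whether a commit exactly at the snapshot LSN should be skipped depends on how that LSN was captured, so the `>` comparison is one of the sketch's assumptions.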
Tags
postgres, iceberg, cdc, s3, parquet, duckdb, open-source, developer-tools
Category
developer-tools
Quality Score
75