Skip to content

electric-sql/ds-bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

252 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ds-bench — a benchmarking system for durable streams

A reproducible, single-node benchmark harness for durable-stream servers: declarative workload suites, a Kubernetes client fleet, and exact cross-fleet HDR-percentile merging, runnable on a local kind cluster or on GKE. Workloads are server-agnostic and run against any supported implementation.

Currently supported implementations: durable-streams (Rust), the Node.js reference server (@durable-streams/server), ursula, and S2 (s2lite).

Results from a run across them: results/REPORT.md.

How it works

A suite is a JSON file (suites/*.json) that declares the workload, the systems and configs, and the sweep. scripts/bench brings up a cluster, deploys each server fresh, drives a Kubernetes client fleet, merges per-pod HDR histograms into fleet-wide percentiles, records per-cell results under results/<suite>/, and tears down. Reports regenerate from local results, no cluster required.

Each of the four workloads is its own declarative suite:

  • Write throughputappend/s at saturation, plus tail latency and pod memorysuites/run-{durable,ursula,s2,node}.json. Drives concurrent appends across many streams while the client fleet ramps a per-cardinality pod ladder; once server throughput stops climbing it pins the load and confirms the peak append rate, the latency at that peak, and the server's peak pod memory. This saturation walk finds the server's ceiling rather than assuming a pod count. The memory figure is the pod cgroup working set (anon plus active page cache), so a resident cache and an OS-paging design are compared on equal terms across every implementation.
  • Sustained loadlatency and server-memory stability over timesuites/sustained.json. Holds a fixed, modest append rate across a set of streams for a long window and watches whether latency and the server's resident memory stay flat, surfacing slow drift or leaks that a short burst would miss.
  • Catch-up / reconnectper-client catch-up latency and body sizesuites/catchup-{durable,ursula,s2}.json. Pre-populates a stream, then has many clients reconnect and replay it from the beginning all at once, recording how long each client takes to catch up, how large its response is, and the aggregate replay throughput. Each implementation replays through whatever native read path it offers, whether a snapshot plus the tail since that snapshot or a full scan of the log.
  • SSE fan-outper-event delivery latency and memory vs subscriber countscripts/run-sse.sh. Has a single writer publish to one stream while a growing number of subscribers stream it, measuring the per-event end-to-end delivery latency as the fan-out widens.

Prerequisites

  • kubectl, python3 (3.x, stdlib only), Docker.
  • Local: kind.
  • Remote: gcloud authenticated; an Artifact Registry repo. Override PROJECT (defaults to gcloud config get-value project), AR_LOCATION (default europe-west1), AR_REPO (default ds-bench), ZONE, and the machine types (SERVER_MACHINE, CLIENT_MACHINE) for your environment.
  • The durable-streams server source checked out alongside this repo, only if you build its image yourself. ursula and S2 use upstream-published images (ghcr.io/tonbo-io/ursula, ghcr.io/s2-streamstore/s2), so there is no source to vendor.

Quick start (local kind)

DS_TARGET=local scripts/cluster-up.sh     # kind cluster + MinIO + metrics ConfigMap
DS_TARGET=local scripts/build-images.sh   # build server + ds-bench images, load into kind

# `*-local` suites use small ladders/counts that fit a single kind node:
DS_TARGET=local scripts/bench suites/write-throughput-local.json run     # run a workload
DS_TARGET=local scripts/bench suites/catchup-local.json run              # another workload
scripts/bench suites/write-throughput-local.json report                  # (re)generate its report

DS_TARGET=local scripts/cluster-down.sh   # tear down

DS_TARGET=local runs everything against the kind cluster, with no cloud and no teardown of kind. The full suites/*.json are sized for a multi-node GKE cluster; for kind, use the *-local suites (or copy one and shrink stream_counts / pod_ladder / clients). run writes results/<suite>/, and re-running resumes, skipping finished cells.

Remote (GKE)

PROJECT=my-project scripts/build-images.sh        # Cloud Build → Artifact Registry
PROJECT=my-project scripts/bench suites/run-durable.json run   # one system
PROJECT=my-project scripts/run-matrix.sh                       # all systems, ≤3 GKE clusters in parallel

scripts/cluster-up.sh (invoked by bench) creates a server node pool (one node, server CPU-pinned) and a Spot client pool. See scripts/target-env.sh for all overridable env (registry, zone, machine types, pull policy). Remote clusters are billable: they tear down on clean completion, and scripts/teardown-watchdog.sh is a deadline safety net.

Supported implementations

A workload is server-agnostic: the same suite runs against any supported implementation, chosen by the suite's modes and the server image that gets deployed. Every server points at the same single-node MinIO, and only the system under test is running while it is measured.

  • durable-streams (Rust) — the server this harness was built alongside. Runs WAL-backed (--durability wal, a sharded committer, with or without the resident tail cache) or without a WAL (--durability memory); suites/run-durable.json runs the wal / wal-tailcache / memory variants side by side.
  • Node.js reference server (@durable-streams/server) — the protocol's reference implementation. It shares the wire protocol, so it reuses the durable API style and runs as mode node (in-memory storage). Being TypeScript rather than a compiled binary, its image is built from the ../durable-streams monorepo (pnpm workspace, started under Node) — see dockerfiles/durable-node.Dockerfile; build-images.sh builds it by default (BUILD_NODE=0 to skip).
  • ursula — a single-node Raft server. The storage backend is chosen at deploy time via URSULA_WAL (memory keeps the log in RAM; disk, the default, writes a WAL and fsyncs on every commit), so suites/run-ursula.json covers both.
  • S2 (s2lite) — object-store-backed.

Adding another implementation comes down to a deployment manifest, a ds-bench API style for its wire protocol, and a few addressing lines in deploy_mode and reset_state in scripts/lib-bench.sh.

Results & reproducing

  • Run the write-throughput suites individually (scripts/bench suites/run-<system>.json run) or all at once with scripts/run-matrix.sh (≤3 GKE clusters in parallel).
  • Raw data: each run writes its per-cell data under results/<suite>/ — the cells.json result-and-resume store, the merged HDR histograms, and the sidecar samples.csv.
  • Published dataset: the report and curated data for the run in this repo live in results/.
  • Regenerate reports (purely from local files, no cluster): scripts/bench suites/<suite>.json report for most workloads, python3 scripts/catchup_report.py suites/catchup-*.json for catch-up, and scripts/run-sse.sh for SSE.

Tests

Framework logic is unit-tested, no cluster required:

cd scripts && for t in *_test.py; do python3 "$t"; done
for t in scripts/*_test.sh; do bash "$t"; done

These cover the suite loader, the per-cell result stores, the saturation classifier, the catch-up and sustained runners, and the report renderers.

Fairness & disclosure

These benchmarks target single-node deployments. Within that scope, every run is kept on equal footing — one node per server, identical workload parameters, a shared single-node MinIO, fresh data each run, and only the system under test running while it is measured — and all numbers are generated by ds-bench on equal hardware, not reused from any implementation's published results.

Future work

Extend the harness to replicated and other deployment topologies; the current workloads and suites assume a single node.

Acknowledgements

The benchmark methodology is based on ursula's published benchmark (ursula.tonbo.io/benchmark); ds-bench is derived from its ursula-bench (Apache-2.0).

About

A benchmark suite for comparing durable stream server implementations based on https://ursula.tonbo.io/benchmark

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors