DuckPond

DuckPond is a query-native filesystem for time-series data, built on Apache Arrow, DataFusion, and Delta Lake. Every filesystem object can be queried with SQL, and SQL queries create new filesystem objects that appear as native files and directories.

Built by the Caspar Water System.

What DuckPond Does

DuckPond gives you a transactional filesystem where files are first-class data. It has two operating modes:

Pond mode ($POND): A persistent, transactional filesystem backed by Delta Lake. Every write is atomic. Files can be raw bytes, queryable Parquet tables, or multi-version time-series. The pond replicates to S3-compatible storage for backup and cross-machine access.
Host mode (host+): A read-only view of the local filesystem where the same query and factory tools work directly on host files -- no pond initialization required.

Both modes use the same URL scheme system and the same SQL engine. A CSV file on your local disk and a time-series inside a remote pond backup are queried with the same pond cat --sql syntax.

Replication and Cross-Pond Import

Every pond can push incremental backups to S3-compatible storage (MinIO, AWS S3). From any other machine, you can:

Discover remote ponds: pond run host+remote:///config.yaml list-ponds
Browse backup contents: pond run host+remote:///config.yaml show
Import a subtree into a local pond for querying

The host+remote:// pattern lets you point at a YAML config file on your local disk and interact with a remote backup without initializing a pond first. This makes it easy to inspect what a remote machine has collected before deciding what to pull down.

Quick Start

# Build
make build

# Run unit tests
make test

# Initialize a pond and try it out
export POND=/tmp/mypond
pond init
pond mkdir /data
echo "hello" | pond copy - /data/greeting.txt
pond list /**
pond cat /data/greeting.txt

# Query a local CSV with SQL (no pond needed)
pond cat host+csv:///tmp/data.csv --format=table --sql "SELECT * FROM source"

# Run a factory from a local config file (no pond needed)
pond run host+remote:///path/to/backup-config.yaml list-ponds

Developer Guide

Prerequisites

Rust stable toolchain (see rust-toolchain.toml)
Docker (for integration tests and site deployment)
Node.js >= 22 (for browser tests and vendor download)

Daily Workflow

make build          # Build pond binary (debug)
make test           # Run all unit tests
make integration    # Build Docker test image + run integration tests
make check          # fmt + clippy + test (CI equivalent)

Run make with no arguments to see all available targets.

One-Time Setup

Download JavaScript vendor dependencies for offline site generation:

make vendor         # Downloads DuckDB-WASM, Observable Plot, D3

This populates crates/sitegen/vendor/dist/ (gitignored, ~35MB). After this, pond run sitegen build produces sites that work without network access.

Repository Structure

crates/
  tinyfs/       Pure filesystem abstractions (FS, WD, Node, path resolution)
  tlogfs/       Delta Lake persistence (OpLog, transactions, DataFusion)
  steward/      Transaction orchestration, control table, factory execution
  provider/     URL-based data access, factory registry, table providers
  cmd/          CLI commands (pond init/list/cat/copy/run/...)
  sitegen/      Static site generator (factory)
  remote/       S3 backup & replication (factory)
  hydrovu/      HydroVu API collector (factory)
  utilities/    Shared helpers (glob, chunked files, perf tracing)

scripts/        Shared deployment scripts
testsuite/      Integration tests (Docker-based)
  tests/        Individual test scripts (NNN-description.sh)
  browser/      Puppeteer browser validation tests

docs/           Architecture and design documentation
water/          Water monitoring demo site
septic/         Septic system demo site
noyo/           Noyo Harbor demo site

Architecture

See docs/duckpond-overview.md for the full architecture description. Key layers (bottom to top):

Layer	Crate	Role
Filesystem	`tinyfs`	Pure abstractions: FS, WD, Node, path resolution
Persistence	`tlogfs`	Delta Lake storage, OpLog, DataFusion integration
Orchestration	`steward`	Transactions, control table, factory lifecycle
Data Access	`provider`	URL schemes, factory registry, table providers
CLI	`cmd`	User-facing commands

CLI Reference

See docs/cli-reference.md for the complete command reference. Common commands:

pond init                           # Create a new pond
pond list '/**'                     # List all entries
pond cat /path/to/file              # Read a file
pond cat --sql "SELECT * FROM source WHERE ..." /path  # Query a table
pond copy host:///local/file /pond/path                # Import a file
pond copy host+series:///data.parquet /pond/series      # Import time-series
pond mkdir /dir                     # Create a directory
pond mknod <factory> /path --config-path config.yaml   # Install a factory
pond run /path/to/factory <command>                     # Execute a factory
pond log                            # Transaction history

Host mode (no pond required):

pond cat host+csv:///tmp/data.csv --format=table       # Query a local CSV
pond run host+remote:///config.yaml list-ponds          # Browse S3 backups
pond run host+sitegen:///site.yaml build ./dist         # Generate a site

Integration Tests

Tests live in testsuite/tests/ as numbered shell scripts. Each test runs in a fresh Docker container with the pond binary:

make test-image                     # Build the test Docker image
make integration                    # Run all tests (skips browser tests)
make integration-all                # Run all tests including browser

# Run a single test
cd testsuite && ./run-test.sh 201

# Run interactively (explore in container)
cd testsuite && ./run-test.sh --interactive

Demo Sites

Each demo site (water/, septic/, noyo/) rsyncs data from its remote machine and runs everything locally:

# First time: configure your site
cp water/deploy.env.example water/deploy.env
# Edit deploy.env with your remote host and S3 credentials

# Site workflow (all run locally)
cd water
./setup-local.sh          # rsync data + init pond + install factories
./run-local.sh            # rsync new data + ingest
./generate-local.sh       # build static site + preview
./update-local.sh         # after editing YAML/templates

Credentials are kept in deploy.env (gitignored) — never in the YAML configs checked into the repository. Remote machines use container images built by GitHub Actions.

Documentation

Document	Contents
CLI Reference	Complete command syntax and examples
Architecture Overview	System design and crate map
System Patterns	Transaction model, factories, providers
Sitegen Design	Static site generator architecture
Cross-Pond Import	Foreign pond import status
Large File Storage	Content-addressed storage for large files
Releasing	Release process and supply chain security

License

Apache-2.0 — see LICENSES/ for details.

Name		Name	Last commit message	Last commit date
Latest commit History 121 Commits
.github		.github
LICENSES		LICENSES
crates		crates
cross		cross
docs		docs
linux		linux
noyo		noyo
original		original
septic		septic
testsuite		testsuite
water		water
.dockerignore		.dockerignore
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
RELEASING.md		RELEASING.md
REUSE.toml		REUSE.toml
SECURITY.md		SECURITY.md
caspar_duckpond.jpg		caspar_duckpond.jpg
catparquet.sh		catparquet.sh
clippy.toml		clippy.toml
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DuckPond

What DuckPond Does

Replication and Cross-Pond Import

Quick Start

Developer Guide

Prerequisites

Daily Workflow

One-Time Setup

Repository Structure

Architecture

CLI Reference

Integration Tests

Demo Sites

Documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DuckPond

What DuckPond Does

Replication and Cross-Pond Import

Quick Start

Developer Guide

Prerequisites

Daily Workflow

One-Time Setup

Repository Structure

Architecture

CLI Reference

Integration Tests

Demo Sites

Documentation

License

About

Resources

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages