Core: Replay existing WAL/Parquet files on startup from foreign data dir #130
Description
Problem
When allsource-core starts with DATA_DIR pointing at a directory containing existing Parquet files and WAL logs (written by an embedded Rust allsource-core client), it initializes fresh with 0 events instead of discovering and replaying the existing data.
```
docker run --rm --platform linux/amd64 \
  -v "/path/to/app/allsource:/data" \
  -e DATA_DIR=/data \
  -p 3900:3900 \
  ghcr.io/all-source-os/allsource-core:latest
```

Health endpoint shows:
```json
{"total_events": 0, "tenant_events": 0}
```

This happens even though the data dir contains 400+ Parquet files and an active WAL.
Expected behavior
Core should discover and replay existing Parquet + WAL data on startup, making all events queryable via the HTTP/WS API. This enables using Core as a read-only query layer over data written by embedded clients (e.g., Tauri desktop apps using allsource-core crate directly).
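A minimal sketch of the discovery half of this, assuming Parquet files end in `.parquet` and WAL segments end in `.wal` anywhere under `DATA_DIR` (the real allsource-core on-disk layout and naming may differ). It only finds and orders files: Parquet first, as the already-flushed history, then the WAL as the unflushed tail. That order assumes the WAL holds only events not yet flushed to Parquet; otherwise replay would also need to deduplicate by event ID. The names `discover`, `DiscoveredData`, and `replay_order` are hypothetical, and the actual decoding would go through Core's own Parquet and WAL readers.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Files found under a data directory, grouped by kind (hypothetical type).
#[derive(Debug, Default)]
pub struct DiscoveredData {
    pub parquet_files: Vec<PathBuf>,
    pub wal_files: Vec<PathBuf>,
}

/// Recursively scan `dir` for Parquet files and WAL segments.
/// ASSUMPTION: WAL segments use a `.wal` extension; the real layout may differ.
pub fn discover(dir: &Path) -> io::Result<DiscoveredData> {
    let mut found = DiscoveredData::default();
    let mut stack = vec![dir.to_path_buf()];
    while let Some(current) = stack.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            // DirEntry::file_type does not follow symlinks, so a symlinked
            // directory is not descended into and a symlink loop cannot hang the scan.
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                stack.push(path);
                continue;
            }
            match path.extension().and_then(|e| e.to_str()) {
                Some("parquet") => found.parquet_files.push(path),
                Some("wal") => found.wal_files.push(path),
                _ => {} // ignore anything else in the data dir
            }
        }
    }
    // Sort so that replay order is deterministic across runs.
    found.parquet_files.sort();
    found.wal_files.sort();
    Ok(found)
}

/// Replay order: flushed Parquet files first, then the WAL tail.
pub fn replay_order(data: &DiscoveredData) -> Vec<&Path> {
    data.parquet_files
        .iter()
        .chain(data.wal_files.iter())
        .map(|p| p.as_path())
        .collect()
}
```

The returned list would then be fed, file by file, into whatever ingestion path Core already uses for its own data, so that "foreign" files end up in the same in-memory indexes as natively written ones.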
Context
Applications like Longhand embed allsource-core as a Rust crate and write events directly to WAL/Parquet. For debugging and MCP integration, we want to point the Core container at this data to query it — but Core doesn't pick up "foreign" data that wasn't written through its own bootstrap/tenant system.
Possible approaches
- Auto-discover tenant data: scan `storage/` for Parquet files not associated with a system stream and create implicit tenant mappings
- `--import-dir` flag: explicit CLI flag to replay a foreign data directory into Core's store
- Read-only mode: `CORE_READ_ONLY=true` that skips bootstrap and just opens existing WAL + Parquet for queries
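A hedged sketch of how Core might choose between its current behaviour and the last two options at startup. `--import-dir` and `CORE_READ_ONLY` are the names proposed in this issue, not existing options; `StartupMode` and `resolve_mode` are hypothetical, and rejecting the combination of both settings is this sketch's own design choice.

```rust
use std::path::PathBuf;

/// Hypothetical startup modes matching the proposals above.
#[derive(Debug, PartialEq, Eq)]
pub enum StartupMode {
    /// Current behaviour: run bootstrap with Core's own tenant system.
    Normal,
    /// Proposed `--import-dir <path>`: replay a foreign data dir into Core's store.
    Import(PathBuf),
    /// Proposed `CORE_READ_ONLY=true`: skip bootstrap, open existing WAL + Parquet for queries.
    ReadOnly,
}

/// Resolve the mode from CLI args (program name excluded) and the raw value
/// of the proposed CORE_READ_ONLY variable, if set.
pub fn resolve_mode(args: &[String], read_only_env: Option<&str>) -> Result<StartupMode, String> {
    let mut import_dir = None;
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "--import-dir" {
            // Space-separated form: the next argument is the path.
            let dir = iter.next().ok_or("--import-dir requires a path")?;
            import_dir = Some(PathBuf::from(dir));
        } else if let Some(dir) = arg.strip_prefix("--import-dir=") {
            // `--import-dir=/path` form.
            import_dir = Some(PathBuf::from(dir));
        }
        // Other arguments are ignored in this sketch.
    }

    // Accept "true"/"1" case-insensitively; anything else means not read-only.
    let read_only = matches!(
        read_only_env.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("true" | "1")
    );

    match (import_dir, read_only) {
        (Some(_), true) => Err("--import-dir and CORE_READ_ONLY=true are mutually exclusive".to_string()),
        (Some(dir), false) => Ok(StartupMode::Import(dir)),
        (None, true) => Ok(StartupMode::ReadOnly),
        (None, false) => Ok(StartupMode::Normal),
    }
}
```

Keeping resolution a pure function of the argument list and the variable's value, rather than reading `std::env` inside it, makes the precedence rules easy to unit-test; the caller would pass `std::env::args().skip(1)` and `std::env::var("CORE_READ_ONLY").ok()`.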