Core: Replay existing WAL/Parquet files on startup from foreign data dir #130

@decebal

Problem

When allsource-core starts with DATA_DIR pointing at a directory containing existing Parquet files and WAL logs (written by an embedded Rust allsource-core client), it initializes fresh with 0 events instead of discovering and replaying the existing data.

docker run --rm --platform linux/amd64 \
  -v "/path/to/app/allsource:/data" \
  -e DATA_DIR=/data \
  -p 3900:3900 \
  ghcr.io/all-source-os/allsource-core:latest

Health endpoint shows:

{"total_events": 0, "tenant_events": 0}

Despite the data dir containing 400+ parquet files and an active WAL.

Expected behavior

Core should discover and replay existing Parquet + WAL data on startup, making all events queryable via the HTTP/WS API. This enables using Core as a read-only query layer over data written by embedded clients (e.g., Tauri desktop apps using allsource-core crate directly).

Context

Applications like Longhand embed allsource-core as a Rust crate and write events directly to WAL/Parquet. For debugging and MCP integration, we want to point the Core container at this data to query it — but Core doesn't pick up "foreign" data that wasn't written through its own bootstrap/tenant system.

Possible approaches

  1. Auto-discover tenant data: scan storage/ for parquet files not associated with a system stream and create implicit tenant mappings
  2. --import-dir flag: explicit CLI flag to replay a foreign data directory into Core's store
  3. Read-only mode: CORE_READ_ONLY=true that skips bootstrap and just opens existing WAL + Parquet for queries
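
All three approaches begin with the same step: finding out which Parquet and WAL files are present under DATA_DIR before deciding how to replay them. Here is a minimal sketch of that discovery step using only the Rust standard library. It is not taken from allsource-core. The `Discovered` struct, the `discover` function, and the `.wal` file extension are assumptions made for illustration, and the real on-disk layout may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Files found under a data directory, grouped by kind.
/// Hypothetical type for illustration; not part of allsource-core.
#[derive(Debug, Default)]
pub struct Discovered {
    pub parquet: Vec<PathBuf>,
    pub wal: Vec<PathBuf>,
}

/// Walk `dir` recursively and collect `.parquet` and `.wal` files.
/// The `.wal` extension is an assumption about how WAL segments are named.
/// Symlinks are skipped because `DirEntry::file_type` does not follow them.
pub fn discover(dir: &Path) -> io::Result<Discovered> {
    let mut found = Discovered::default();
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut stack = vec![dir.to_path_buf()];

    while let Some(current) = stack.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;

            if file_type.is_dir() {
                stack.push(path);
            } else if file_type.is_file() {
                match path.extension().and_then(|ext| ext.to_str()) {
                    Some("parquet") => found.parquet.push(path),
                    Some("wal") => found.wal.push(path),
                    _ => {} // ignore anything else
                }
            }
        }
    }

    // Sort for a deterministic order; a real replay would more likely order by
    // sequence number or timestamp recorded in the files themselves.
    found.parquet.sort();
    found.wal.sort();
    Ok(found)
}
```

Calling `discover(Path::new("/data"))` at startup and logging the two counts would at least show whether Core can see the mounted files at all. That separates a volume-mount or path problem from missing replay logic.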
