feat(pipeline): persist DAG snapshot as JSON at run start (PLT-1161) #127

Open
kurodo3[bot] wants to merge 5 commits into dev from plt-1161/dag-snapshot-at-run-start


kurodo3 bot commented Apr 3, 2026

Summary

  • Extends Pipeline.save() with run_id and snapshot_time keyword-only parameters, populating the previously-null fields in the JSON output (backward compatible — default to None)
  • Adds Pipeline._write_dag_snapshot(run_id, snapshot_time) which derives the canonical path {db_root}/{pipeline_name}/dag_snapshot.json from the scoped pipeline database and writes a level="standard" snapshot; silently skips for non-local (cloud, in-memory) databases
  • Updates Pipeline.run() to generate a run_id (UUID4) and snapshot_time (ISO UTC) at run start, call _write_dag_snapshot() before any node executes, and thread run_id through to all orchestrator calls so observer.on_run_start() receives the same ID as the snapshot
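The run-start sequence described above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: `new_run_metadata`, `write_dag_snapshot`, `start_run`, the `dag` dict, and the `on_run_start` callback are all hypothetical stand-ins, and the real `Pipeline._write_dag_snapshot()` derives the path from the scoped pipeline database rather than taking `db_root` as an argument.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple


def new_run_metadata() -> Tuple[str, str]:
    """Return (run_id, snapshot_time): a UUID4 string and an ISO 8601 UTC timestamp."""
    run_id = str(uuid.uuid4())
    snapshot_time = datetime.now(timezone.utc).isoformat()
    return run_id, snapshot_time


def write_dag_snapshot(
    db_root: Optional[str],
    pipeline_name: str,
    dag: Dict[str, Any],
    run_id: str,
    snapshot_time: str,
) -> Optional[Path]:
    """Write {db_root}/{pipeline_name}/dag_snapshot.json; skip when there is no local root."""
    if db_root is None:
        # Assumption: a missing root stands in for cloud or in-memory databases.
        return None
    path = Path(db_root) / pipeline_name / "dag_snapshot.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {**dag, "run_id": run_id, "snapshot_time": snapshot_time}
    # A fixed file name means each run overwrites the previous snapshot.
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def start_run(
    db_root: Optional[str],
    pipeline_name: str,
    dag: Dict[str, Any],
    on_run_start: Callable[[str], None],
) -> str:
    """Persist the snapshot first, then hand the same run_id to the observer callback."""
    run_id, snapshot_time = new_run_metadata()
    write_dag_snapshot(db_root, pipeline_name, dag, run_id, snapshot_time)
    on_run_start(run_id)
    return run_id
```

Writing the file before notifying the observer or executing any node is what lets the snapshot survive a crash later in the run.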

What this enables

Portolan and other log consumers can read dag_snapshot.json from a predictable path to reconstruct the exact DAG structure (nodes, edges, types, content hashes) for any run — even if the run crashed mid-execution.
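For a consumer, reading the snapshot might look like the sketch below. The helper name `load_dag_snapshot` and its `db_root` argument are assumptions made for illustration; only the path layout and the `run_id` and `snapshot_time` fields come from the description above, and the rest of the JSON schema is not shown in this PR.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_dag_snapshot(db_root: str, pipeline_name: str) -> Optional[Dict[str, Any]]:
    """Load {db_root}/{pipeline_name}/dag_snapshot.json, or return None if it is absent."""
    path = Path(db_root) / pipeline_name / "dag_snapshot.json"
    if not path.is_file():
        # No snapshot, e.g. the pipeline never ran or used a non-local database.
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A caller would then compare `snapshot["run_id"]` with the run ID found in the logs to confirm that the snapshot belongs to the run under inspection, since a later run overwrites the file.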

Test plan

  • TestSaveRunIdAndSnapshotTime: save() populates the new fields; null by default (backward compatible)
  • TestWriteDagSnapshot: _write_dag_snapshot() creates the file at the correct path with correct content; returns None for in-memory/cloud databases
  • TestRunWritesDagSnapshot: run() writes snapshot before node execution; run_id in snapshot matches observer; second run overwrites with new run_id; full node-type coverage (source, operator, function)

All 19 new tests pass. Full test suite: 3136 passed, 56 skipped.

Closes PLT-1161


codecov bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/orcapod/pipeline/graph.py | 89.28% | 3 Missing ⚠️ |

