[execute] Spawned block-processing tasks swallow panics, deadlocking the entire pipeline #295

@keanji-x

Summary

A panic inside Core::process() (spawned via fire-and-forget tokio::spawn at lib.rs:269-273) is silently consumed because the JoinHandle is dropped without being awaited. The panicking task never calls notify() on any of the four barriers (execute_block_barrier, merklize_barrier, seal_barrier, make_canonical_barrier), causing all subsequent blocks to hang forever.
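The failure mode can be sketched without the real pipeline. The example below is a minimal model, not the code in lib.rs: it uses std::thread and an mpsc channel in place of tokio tasks and the actual barrier type, and the name observe_detached_panic is hypothetical. The mechanism it shows is the same: the handle is dropped, the worker panics before signalling, and the waiter sees only a timeout.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical model: spawn a detached worker that should signal a
/// "barrier" (here an mpsc channel), then wait for that signal.
fn observe_detached_panic(fail: bool, timeout: Duration) -> Result<u64, RecvTimeoutError> {
    let (notify_tx, notify_rx) = mpsc::channel::<u64>();
    let worker_tx = notify_tx.clone();

    // Fire-and-forget: the JoinHandle is dropped, so nobody ever learns
    // whether the worker finished or panicked.
    drop(thread::spawn(move || {
        if fail {
            panic!("simulated invariant failure inside process()");
        }
        worker_tx.send(1).expect("receiver should still be alive");
    }));

    // The pipeline still owns the barrier (modelled by `notify_tx`), so the
    // channel is not disconnected when the worker dies. The waiter just
    // never receives a value.
    let result = notify_rx.recv_timeout(timeout);
    drop(notify_tx);
    result
}

fn main() {
    let healthy = observe_detached_panic(false, Duration::from_secs(1));
    println!("healthy worker: {:?}", healthy);

    let panicked = observe_detached_panic(true, Duration::from_millis(100));
    println!("panicking worker: {:?}", panicked);
}
```

In this model the panic appears only on stderr. The waiter cannot tell a slow block from a dead one, which matches the hang described above.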

Related sub-issues

  1. Bare .unwrap() on barrier waits (L476, L478, L496, L522): If a prior block panicked, these unwraps cascade into further panics, amplifying the deadlock across all in-flight blocks.
  2. seal_barrier not closed on shutdown (L250-254): When the ordered-block channel closes, run() closes three barriers but omits seal_barrier, leaving any task waiting on it permanently hung.
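Both sub-issues can be illustrated with a toy barrier. This is a sketch under assumptions: ToyBarrier, WaitError, and Barriers::close_all are hypothetical stand-ins built on std primitives, not the crate's real barrier API. The points are that wait() returns a Result the caller can handle rather than unwrap, and that shutdown closes all four barriers, including seal.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Debug, PartialEq)]
enum WaitError {
    Closed,
}

/// Hypothetical single-value barrier: a waiter blocks until a value is
/// published or the barrier is closed.
#[derive(Default)]
struct ToyBarrier {
    state: Mutex<(Option<u64>, bool)>, // (published value, closed flag)
    cv: Condvar,
}

impl ToyBarrier {
    fn notify(&self, value: u64) {
        self.state.lock().unwrap().0 = Some(value);
        self.cv.notify_all();
    }

    fn close(&self) {
        self.state.lock().unwrap().1 = true;
        self.cv.notify_all();
    }

    /// Returns an error instead of panicking, so callers can stop cleanly.
    fn wait(&self) -> Result<u64, WaitError> {
        let mut s = self.state.lock().unwrap();
        loop {
            if let Some(v) = s.0 {
                return Ok(v);
            }
            if s.1 {
                return Err(WaitError::Closed);
            }
            s = self.cv.wait(s).unwrap();
        }
    }
}

#[derive(Default)]
struct Barriers {
    execute_block: ToyBarrier,
    merklize: ToyBarrier,
    seal: ToyBarrier,
    make_canonical: ToyBarrier,
}

impl Barriers {
    /// Closing every barrier in one place makes it harder to omit one.
    fn close_all(&self) {
        for b in [&self.execute_block, &self.merklize, &self.seal, &self.make_canonical] {
            b.close();
        }
    }
}

fn seal_waiter_result_after_shutdown() -> Result<u64, WaitError> {
    let barriers = Arc::new(Barriers::default());
    let waiter = {
        let b = Arc::clone(&barriers);
        thread::spawn(move || b.seal.wait())
    };
    barriers.close_all();
    waiter.join().expect("waiter thread panicked")
}

fn main() {
    let b = ToyBarrier::default();
    b.notify(42);
    println!("notified barrier: {:?}", b.wait());
    println!("seal waiter after shutdown: {:?}", seal_waiter_result_after_shutdown());
}
```

If close_all skipped seal, the waiter thread in this sketch would block indefinitely, which is the hang described in sub-issue 2.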

Reproduction

  1. Trigger any assert!/assert_eq! failure inside process() (e.g., epoch mismatch at L401, execute_height invariant at L461).
  2. Observe that no subsequent blocks are processed — they all hang on execute_block_barrier.wait_timeout.

Impact

  • Severity: Critical
  • Complete pipeline halt with no recovery path other than node restart.
  • Multiple assert!/assert_eq! calls sit on production paths that are not gated by #[cfg(debug_assertions)] (L401, L459, L461, L700, L778, L945), so the failure can be triggered in release builds.

Suggested investigation areas

  • Await the spawned tasks' JoinHandles, or manage them with a tokio JoinSet, so that panics are propagated instead of dropped.
  • Convert production-path assert! to graceful error handling.
  • Add seal_barrier.close() to the shutdown path.
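The first two suggestions could look roughly like the sketch below. It is an illustration only: std::thread stands in for tokio so that the example has no dependencies, and Block, BlockError, process, and run_all are hypothetical names. In tokio, the analogous signal is the Err(JoinError) returned when a JoinHandle is awaited or when JoinSet::join_next yields a finished task.

```rust
use std::thread;

#[derive(Debug, PartialEq)]
enum BlockError {
    EpochMismatch { expected: u64, got: u64 },
    TaskPanicked { block: u64 },
}

struct Block {
    number: u64,
    epoch: u64,
}

/// An invariant that used to be an assert_eq! becomes a returned error.
fn process(block: &Block, current_epoch: u64) -> Result<u64, BlockError> {
    if block.epoch != current_epoch {
        return Err(BlockError::EpochMismatch {
            expected: current_epoch,
            got: block.epoch,
        });
    }
    // Stand-in for a remaining bug that still panics.
    if block.number == u64::MAX {
        panic!("simulated bug in block processing");
    }
    Ok(block.number)
}

/// Keeps every handle and joins it, so a panic surfaces as an error value
/// the pipeline can act on (for example by closing its barriers).
fn run_all(blocks: Vec<Block>, current_epoch: u64) -> Vec<Result<u64, BlockError>> {
    let handles: Vec<_> = blocks
        .into_iter()
        .map(|b| {
            let number = b.number;
            (number, thread::spawn(move || process(&b, current_epoch)))
        })
        .collect();

    handles
        .into_iter()
        .map(|(number, handle)| match handle.join() {
            Ok(result) => result,
            Err(_) => Err(BlockError::TaskPanicked { block: number }),
        })
        .collect()
}

fn main() {
    let blocks = vec![
        Block { number: 1, epoch: 7 },
        Block { number: 2, epoch: 8 },
        Block { number: u64::MAX, epoch: 7 },
    ];
    for outcome in run_all(blocks, 7) {
        println!("{:?}", outcome);
    }
}
```

Whichever mechanism is chosen, the caller should end up with a value it can match on, so that a failed block leads to an explicit shutdown or retry instead of a silent stall.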

Files

  • crates/pipe-exec-layer-ext-v2/execute/src/lib.rs
