fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload #2
Open

livepeer-tessa wants to merge 1 commit into main
Conversation
fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload

Fixes #723 — streamdiffusion-sdxl CUDA OOM on pipeline reload.

Root causes:

1. `cleanup_gpu_memory()` called `Engine.__del__()` manually, which is unreliable. CPython may defer destructor invocation, leaving TensorRT execution contexts and `ICudaEngine` objects alive (and their GPU memory pinned) past the call.
2. `Engine.__del__()` did `del self.engine` without first setting `self.context = None`, violating TRT's required teardown order (the context must be destroyed before the engine).
3. No VRAM guard before reload — OOM occurred mid-load with no early diagnostic.

Changes:

- utilities.py / `Engine.__del__`: set `self.context = None`, then `self.engine = None`, before the `del` statements so the C++ destructors fire in the correct order.
- wrapper.py / `_destroy_trt_engine` (new static helper): explicit per-attribute nullification of `Engine.context` and `Engine.engine`, plus buffer freeing; replaces the fragile manual `__del__()` call.
- wrapper.py / `cleanup_gpu_memory` (rewrite): uses `_destroy_trt_engine` on every TRT wrapper (UNet, VAE encoder/decoder, ControlNet pool); calls `gc.collect()` between context and engine deletion; reports non-PyTorch residual VRAM so operators can detect incomplete TRT teardown.
- wrapper.py / `_load_model`: adds a VRAM pre-flight check after cleanup — raises `RuntimeError` with an actionable message if free VRAM < 2 GB, preventing the process from entering a slow OOM crash loop (and hitting the 3-restart limit).

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
Signed-off-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>
Summary
Fixes #723 — `streamdiffusion-sdxl` workers enter ERROR state after pipeline param updates due to CUDA OOM.

Root Causes
Two bugs conspired to leave ~22 GB of GPU memory pinned after `cleanup_gpu_memory()`:

1. **Incorrect teardown order in `Engine.__del__` (utilities.py).** TensorRT requires the execution context to be destroyed before the engine. The old code did `del self.engine` / `del self.context` (Python reference deletion, no ordering guarantee). Setting `self.context = None` first forces the C++ `IExecutionContext` destructor to run before `ICudaEngine` is released.

2. **Manual `__del__()` call is unreliable (wrapper.py).** `cleanup_gpu_memory()` called `unet_engine.engine.__del__()` explicitly. Python's destructor protocol doesn't guarantee immediate native teardown when called this way — the object can still be alive in the GC graph, and TRT CUDA memory stays pinned.

Changes
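Why the assignment order matters can be demonstrated with a plain-Python mock (no TensorRT needed). CPython's reference counting runs a destructor the moment the last reference to an object drops, so assigning `None` to `context` before `engine` deterministically finalizes the native objects in the required order — whereas bare `del` statements on the instance attributes give no such guarantee once other references exist.

```python
teardown_log = []


class _Native:
    """Stand-in for a TRT native object; records when its destructor runs."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        teardown_log.append(self.name)


class Engine:
    def __init__(self):
        self.engine = _Native("engine")
        self.context = _Native("context")

    def __del__(self):
        # TRT rule: context must be destroyed before engine. Assigning None
        # drops the last reference, so CPython finalizes each immediately.
        self.context = None
        self.engine = None


e = Engine()
del e
# teardown_log is now ["context", "engine"] — context destroyed first
```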
`src/streamdiffusion/acceleration/tensorrt/utilities.py`

- `Engine.__del__`: set `self.context = None`, then `self.engine = None`, before the `del` statements to ensure the C++ destructors fire in the correct order.

`src/streamdiffusion/wrapper.py`

- `_destroy_trt_engine`: explicitly nullifies `context` → `engine` → frees buffers on any `Engine` wrapper. Replaces the fragile manual `__del__()` call.
- `cleanup_gpu_memory` rewrite: uses `_destroy_trt_engine` on UNet, VAE encoder/decoder, and ControlNet engines; calls `gc.collect()` between stages; logs non-PyTorch residual VRAM so operators can spot incomplete teardown.
- `_load_model`: after cleanup, checks `torch.cuda.mem_get_info()`. If free VRAM < 2 GB, raises `RuntimeError` with an actionable message instead of letting the process OOM mid-load and exhaust the 3-restart budget.

Testing
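The pre-flight check and the residual-VRAM report can be sketched like this. The function names (`assert_vram_headroom`, `non_torch_residual_bytes`) are made up for illustration; `get_mem_info` is injected, mirroring `torch.cuda.mem_get_info()`'s `(free_bytes, total_bytes)` return, so the logic is testable off-GPU. In `wrapper.py` it would be invoked with the real `torch.cuda` callables.

```python
MIN_FREE_BYTES = 2 * 1024**3  # the PR's 2 GB floor


def assert_vram_headroom(get_mem_info, min_free=MIN_FREE_BYTES):
    """Raise early, with an actionable message, instead of OOMing mid-load.

    get_mem_info mirrors torch.cuda.mem_get_info: () -> (free, total) bytes.
    Real call site (assumed): assert_vram_headroom(torch.cuda.mem_get_info).
    """
    free, total = get_mem_info()
    if free < min_free:
        raise RuntimeError(
            f"Insufficient free VRAM for TRT engine load: "
            f"{free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB "
            f"(need >= {min_free / 1024**3:.0f} GiB). TRT teardown may be "
            "incomplete; restart the worker instead of retrying the load."
        )
    return free


def non_torch_residual_bytes(get_mem_info, torch_reserved_bytes):
    """VRAM in use that PyTorch does not account for (e.g. leaked TRT objects).

    torch_reserved_bytes would come from torch.cuda.memory_reserved().
    A large residual after cleanup signals incomplete TRT teardown.
    """
    free, total = get_mem_info()
    return (total - free) - torch_reserved_bytes
```

Failing fast here turns a slow mid-load OOM crash loop into a single immediate, diagnosable error before the 3-restart budget is touched.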
Cannot be tested locally without a 24 GB GPU + TRT engine build. Verified via log inspection and code review against the crash trace in #723.
Checklist:

- Touched Python files parse (`ast.parse`)
- No `except: pass` that would hide new failures