fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload #2
Open

livepeer-tessa wants to merge 1 commit into main
Conversation
fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload

Fixes #723 — streamdiffusion-sdxl CUDA OOM on pipeline reload.

Root causes:

1. `cleanup_gpu_memory()` called `Engine.__del__()` manually, which is unreliable. CPython may defer destructor invocation, leaving TensorRT execution contexts and `ICudaEngine` objects alive (and their GPU memory pinned) past the call.
2. `Engine.__del__()` did `del self.engine` without first setting `self.context = None`, violating TRT's required teardown order (the context must be destroyed before the engine).
3. No VRAM guard before reload — OOM occurred mid-load with no early diagnostic.

Changes:

- utilities.py / `Engine.__del__`: set `self.context = None`, then `self.engine = None`, before the `del` statements so the C++ destructors fire in the correct order.
- wrapper.py / `_destroy_trt_engine` (new static helper): explicit per-attribute nullification of `Engine.context` and `Engine.engine`, plus buffer freeing; replaces the fragile manual `__del__()` call.
- wrapper.py / `cleanup_gpu_memory` (rewrite): uses `_destroy_trt_engine` on every TRT wrapper (UNet, VAE encoder/decoder, ControlNet pool); calls `gc.collect()` between context and engine deletion; reports non-PyTorch residual VRAM so operators can detect incomplete TRT teardown.
- wrapper.py / `_load_model`: adds a VRAM pre-flight check after cleanup — raises `RuntimeError` with an actionable message if free VRAM < 2 GB, preventing the process from entering a slow OOM crash loop (and hitting the 3-restart limit).

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
Signed-off-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>
Summary
Fixes #723 — `streamdiffusion-sdxl` workers enter ERROR state after pipeline param updates due to CUDA OOM.

Root Causes
Two bugs conspired to leave ~22 GB of GPU memory pinned after `cleanup_gpu_memory()`:

1. **Incorrect teardown order in `Engine.__del__` (utilities.py).** TensorRT requires the execution context to be destroyed before the engine. The old code did `del self.engine` / `del self.context` (Python reference deletion, no ordering guarantee). Setting `self.context = None` first forces the C++ `IExecutionContext` destructor to run before `ICudaEngine` is released.

2. **Manual `__del__()` call is unreliable (wrapper.py).** `cleanup_gpu_memory()` called `unet_engine.engine.__del__()` explicitly. Python's destructor protocol doesn't guarantee immediate native teardown when called this way — the object can still be alive in the GC graph, and TRT CUDA memory stays pinned.

Changes
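Why the assignment order matters can be demonstrated with a plain-Python mock (no TensorRT needed). CPython's reference counting runs a destructor the moment the last reference to an object drops, so assigning `None` to `context` before `engine` deterministically finalizes the native objects in the required order — whereas bare `del` statements on the instance attributes give no such guarantee once other references exist.

```python
teardown_log = []


class _Native:
    """Stand-in for a TRT native object; records when its destructor runs."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        teardown_log.append(self.name)


class Engine:
    def __init__(self):
        self.engine = _Native("engine")
        self.context = _Native("context")

    def __del__(self):
        # TRT rule: context must be destroyed before engine. Assigning None
        # drops the last reference, so CPython finalizes each immediately.
        self.context = None
        self.engine = None


e = Engine()
del e
# teardown_log is now ["context", "engine"] — context destroyed first
```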
`src/streamdiffusion/acceleration/tensorrt/utilities.py`

- `Engine.__del__`: set `self.context = None`, then `self.engine = None`, before the `del` statements to ensure the C++ destructors fire in the correct order.

`src/streamdiffusion/wrapper.py`

- `_destroy_trt_engine`: explicitly nullifies `context` → `engine` → frees buffers on any `Engine` wrapper. Replaces the fragile manual `__del__()` call.
- `cleanup_gpu_memory` rewrite: uses `_destroy_trt_engine` on UNet, VAE encoder/decoder, and ControlNet engines; calls `gc.collect()` between stages; logs non-PyTorch residual VRAM so operators can spot incomplete teardown.
- `_load_model`: after cleanup, checks `torch.cuda.mem_get_info()`. If free VRAM < 2 GB, raises `RuntimeError` with an actionable message instead of letting the process OOM mid-load and exhaust the 3-restart budget.

Testing
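The pre-flight check and the residual-VRAM report can be sketched like this. The function names (`assert_vram_headroom`, `non_torch_residual_bytes`) are made up for illustration; `get_mem_info` is injected, mirroring `torch.cuda.mem_get_info()`'s `(free_bytes, total_bytes)` return, so the logic is testable off-GPU. In `wrapper.py` it would be invoked with the real `torch.cuda` callables.

```python
MIN_FREE_BYTES = 2 * 1024**3  # the PR's 2 GB floor


def assert_vram_headroom(get_mem_info, min_free=MIN_FREE_BYTES):
    """Raise early, with an actionable message, instead of OOMing mid-load.

    get_mem_info mirrors torch.cuda.mem_get_info: () -> (free, total) bytes.
    Real call site (assumed): assert_vram_headroom(torch.cuda.mem_get_info).
    """
    free, total = get_mem_info()
    if free < min_free:
        raise RuntimeError(
            f"Insufficient free VRAM for TRT engine load: "
            f"{free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB "
            f"(need >= {min_free / 1024**3:.0f} GiB). TRT teardown may be "
            "incomplete; restart the worker instead of retrying the load."
        )
    return free


def non_torch_residual_bytes(get_mem_info, torch_reserved_bytes):
    """VRAM in use that PyTorch does not account for (e.g. leaked TRT objects).

    torch_reserved_bytes would come from torch.cuda.memory_reserved().
    A large residual after cleanup signals incomplete TRT teardown.
    """
    free, total = get_mem_info()
    return (total - free) - torch_reserved_bytes
```

Failing fast here turns a slow mid-load OOM crash loop into a single immediate, diagnosable error before the 3-restart budget is touched.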
Cannot be tested locally without a 24 GB GPU + TRT engine build. Verified via log inspection and code review against the crash trace in #723.
Checklist:

- Touched Python files parse (`ast.parse`)
- No `except: pass` that would hide new failures