- Template cnn.cu over f32/f16 types (accumulate in f32 for precision)
- Route f16 Conv ops to CUDA in transform.rs
- Dispatch to conv{N}d_f16_generic in ConvGeneric
- Add conv_f16 test suite (ConvProblemF16 wrapper)
- Register conv_f16 tests in test-cuda suite
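The "template over types, accumulate in a wider type" idea can be illustrated on the CPU with a minimal Rust sketch. This is not the actual cnn.cu kernel. Stable Rust has no f16 type, so f32 inputs accumulating in f64 stand in for f16 inputs accumulating in f32, and f64 accumulating in itself stands in for f32 accumulating in f32. `ConvElem` and `conv1d` are hypothetical names.

```rust
use std::ops::{Add, Mul};

/// Element types the hypothetical kernel is generic over; each names the
/// (possibly wider) type used for the running sum.
trait ConvElem: Copy {
    type Acc: Copy + Add<Output = Self::Acc> + Mul<Output = Self::Acc>;
    fn zero() -> Self::Acc;
    fn widen(self) -> Self::Acc;
    fn narrow(acc: Self::Acc) -> Self;
}

// Stand-in for "f16 accumulates in f32": f32 accumulates in f64.
impl ConvElem for f32 {
    type Acc = f64;
    fn zero() -> f64 { 0.0 }
    fn widen(self) -> f64 { self as f64 }
    fn narrow(acc: f64) -> f32 { acc as f32 }
}

// Stand-in for "f32 accumulates in f32": f64 accumulates in itself.
impl ConvElem for f64 {
    type Acc = f64;
    fn zero() -> f64 { 0.0 }
    fn widen(self) -> f64 { self }
    fn narrow(acc: f64) -> f64 { acc }
}

/// Valid 1D convolution (no padding, stride 1, cross-correlation as in NN
/// frameworks). Products are summed in `T::Acc`, then narrowed once at the end.
fn conv1d<T: ConvElem>(input: &[T], kernel: &[T]) -> Vec<T> {
    if kernel.is_empty() || kernel.len() > input.len() {
        return Vec::new();
    }
    let out_len = input.len() - kernel.len() + 1;
    (0..out_len)
        .map(|o| {
            let mut acc = T::zero();
            for (k, &w) in kernel.iter().enumerate() {
                acc = acc + input[o + k].widen() * w.widen();
            }
            T::narrow(acc)
        })
        .collect()
}
```

With the wide accumulator, `conv1d(&[1.0e8f32, 1.0, -1.0e8], &[1.0, 1.0, 1.0])` returns `[1.0]`. Summing the same three products directly in f32 gives `0.0`, because `1e8 + 1` rounds back to `1e8` at f32 precision.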
Grouped convolutions work with the generic CUDA conv kernel for both f32 and f16.
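For reference, a grouped convolution splits the channels evenly into groups, and each output channel reads only the input channels of its own group (the convention used by the `group` attribute of ONNX `Conv`). A minimal sketch of that index mapping; `input_channels_for` is a hypothetical helper, not a function from this PR:

```rust
use std::ops::Range;

/// Input channels read by output channel `oc` in a grouped convolution
/// with `groups` groups (channels split evenly across groups).
fn input_channels_for(oc: usize, c_in: usize, c_out: usize, groups: usize) -> Range<usize> {
    assert!(
        groups > 0 && c_in % groups == 0 && c_out % groups == 0,
        "channels must split evenly into groups"
    );
    assert!(oc < c_out, "output channel out of range");
    let cin_per_group = c_in / groups;
    let cout_per_group = c_out / groups;
    let g = oc / cout_per_group; // group this output channel belongs to
    g * cin_per_group..(g + 1) * cin_per_group
}
```

With `groups == 1` every output channel sees all inputs; with `groups == c_in` (depthwise) each sees exactly one.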
cuDNN returns CUDNN_STATUS_NOT_SUPPORTED for f16 3D convolutions, so we gate cuDNN f16 to hw_rank <= 2 and let higher ranks use the generic CUDA kernel.
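The gating rule can be sketched as a small dispatch function. The enums and `pick_conv_backend` are hypothetical, and the sketch assumes f32 keeps routing to cuDNN at every rank, which the message does not state.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Dt {
    F16,
    F32,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Backend {
    Cudnn,
    GenericCuda,
}

/// Hypothetical rule: f16 convolutions use cuDNN only for hw_rank <= 2
/// (cuDNN reports CUDNN_STATUS_NOT_SUPPORTED for f16 3D); higher ranks fall
/// back to the generic CUDA kernel. f32 routing is assumed unchanged.
fn pick_conv_backend(dt: Dt, hw_rank: usize) -> Backend {
    match dt {
        Dt::F16 if hw_rank > 2 => Backend::GenericCuda,
        _ => Backend::Cudnn,
    }
}
```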
Cast scale and bias back to the input datum type before the final mul/add so the output matches the expected f16 type instead of being promoted to f32 by mixed-type arithmetic.
Cast alpha and beta scalar constants to match the input datum type so they don't promote the output from f16 to f32.
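The promotion problem addressed by the two commits above can be modelled with a toy type-promotion rule. This assumes, as the messages describe, that mixed f16/f32 arithmetic promotes to f32. `ElemType` is a stand-in enum, not tract's datum type, and `promote` and `affine_output_type` are hypothetical.

```rust
/// Toy element types standing in for a real datum-type enum.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ElemType {
    F16,
    F32,
}

/// Assumed rule for a binary op: any f32 operand promotes the result to f32.
fn promote(a: ElemType, b: ElemType) -> ElemType {
    if a == ElemType::F32 || b == ElemType::F32 {
        ElemType::F32
    } else {
        ElemType::F16
    }
}

/// Result type of `input * scale + bias` (or `alpha * x + beta`) when the
/// constants are stored as `constant`, optionally cast to the input type first.
fn affine_output_type(input: ElemType, constant: ElemType, cast_constants: bool) -> ElemType {
    let c = if cast_constants { input } else { constant };
    let after_mul = promote(input, c);
    promote(after_mul, c)
}
```

Without the cast, an f16 input combined with f32 constants yields f32. Casting the constants first keeps the output f16.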
Flow-matching transformer (12B params), CLIP-L + T5-XXL text encoders, no classifier-free guidance (distilled), 4 denoising steps. Packing/unpacking and RoPE position IDs are wrapped into the ONNX export for a clean Rust interface.
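Latent packing can be illustrated with a 2x2 patching scheme, where each token holds the C channels of one 2x2 spatial patch (C*4 features). The message does not specify the layout, so the patch size and feature ordering below are assumptions, and `src_index`, `pack`, and `unpack` are hypothetical names. The point is that packing is a pure index permutation, so unpacking inverts it exactly.

```rust
/// Maps (token, feature) in the packed layout to the flat index in a
/// row-major (C, H, W) latent. Assumed ordering: feature = ch*4 + dy*2 + dx.
fn src_index(token: usize, feat: usize, h: usize, w: usize) -> usize {
    let wp = w / 2;
    let (y, x) = (token / wp, token % wp);
    let (ch, dy, dx) = (feat / 4, (feat % 4) / 2, feat % 2);
    ch * h * w + (2 * y + dy) * w + (2 * x + dx)
}

/// (C, H, W) latent -> (H/2 * W/2) tokens of C*4 features each.
fn pack(latent: &[f32], c: usize, h: usize, w: usize) -> Vec<f32> {
    assert!(h % 2 == 0 && w % 2 == 0 && latent.len() == c * h * w);
    let tokens = (h / 2) * (w / 2);
    let mut out = vec![0.0; latent.len()];
    for t in 0..tokens {
        for f in 0..c * 4 {
            out[t * c * 4 + f] = latent[src_index(t, f, h, w)];
        }
    }
    out
}

/// Inverse of `pack`: scatter each token feature back to its pixel.
fn unpack(packed: &[f32], c: usize, h: usize, w: usize) -> Vec<f32> {
    assert!(h % 2 == 0 && w % 2 == 0 && packed.len() == c * h * w);
    let tokens = (h / 2) * (w / 2);
    let mut out = vec![0.0; packed.len()];
    for t in 0..tokens {
        for f in 0..c * 4 {
            out[src_index(t, f, h, w)] = packed[t * c * 4 + f];
        }
    }
    out
}
```

For a 1x2x4 latent holding 0..8 row by row, `pack` yields `[0, 1, 4, 5, 2, 3, 6, 7]`: two tokens, each one 2x2 patch.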
…ipeline
- export.py: load one component at a time in f16 (VAE in f32) to avoid OOM
- reference.py: generate reference I/O bundles for tract validation
- main.rs: run the full pipeline (tokenize, text encode, denoise in 4 steps, VAE decode)

Models are loaded and unloaded sequentially to fit in 32GB VRAM. The transformer and text encoders run in f16, the VAE in f32 (instance norm overflows f16).
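One plausible mechanism for the instance-norm overflow: the largest finite half-precision value is 65504, and the variance computation squares activations, so a single activation of magnitude 300 already produces 90000. A minimal sketch that models only this range check (no f16 rounding), with a hypothetical function name:

```rust
/// Largest finite IEEE 754 half-precision (binary16) value.
const F16_MAX: f32 = 65504.0;

/// Returns true if a running sum of squares over one channel, as a naive
/// variance computation would build it, exceeds the f16 range. Only the
/// range is modelled; f16 rounding of the partial sums is ignored.
fn sum_of_squares_overflows_f16(channel: &[f32]) -> bool {
    let mut acc = 0.0f32;
    for &v in channel {
        acc += v * v;
        if acc > F16_MAX {
            return true;
        }
    }
    false
}
```

Many moderate values overflow as well: 1000 activations of 10.0 sum to 100000 squared units.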