Cuda Graphs! #34

@haroonsyed

I have been implementing "packed" operations to hide the overhead of launching a bunch of small kernels.
I just learned there is already a solution to this problem: CUDA graphs!

I believe packed operations will still be faster when each operation is so small that several could run in parallel on the GPU (I have not looked into graphs much yet; they may allow nodes at the same level to launch in parallel, which would be awesome). But that advantage would only be a constant factor, bounded by the number of warps that can be launched at once (probably not more than 30x). However, I was seeing much worse slowdowns as the number of kernel launches increased, because more and more launch overhead accumulated.
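For reference, here is a minimal sketch of the graph approach using stream capture, based on my reading of the CUDA runtime API and not on anything in this repo. The kernel `add_one` and the function `run_chain_with_graph` are hypothetical stand-ins for a chain of small per-layer operations. It assumes CUDA 11.4 or newer for `cudaGraphInstantiateWithFlags`. Because every launch is captured from a single stream, the resulting graph is a linear chain; expressing independent branches would need capture across several streams joined by events, which is not shown here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for one small per-layer operation.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Records `layers` small launches into one graph, replays the graph
// `replays` times, and returns element 0 of the result.
float run_chain_with_graph(int n, int layers, int replays) {
    float* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    // Launches issued between Begin/EndCapture are recorded, not executed.
    cudaGraph_t graph;
    CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int l = 0; l < layers; ++l) {
        add_one<<<blocks, threads, 0, stream>>>(d_data, n);
    }
    CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once; the instantiated graph can be launched many times.
    cudaGraphExec_t exec;
    CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Each replay submits the whole chain with a single launch call.
    for (int r = 0; r < replays; ++r) {
        CUDA_CHECK(cudaGraphLaunch(exec, stream));
    }
    CUDA_CHECK(cudaStreamSynchronize(stream));

    float first = 0.0f;
    CUDA_CHECK(cudaMemcpy(&first, d_data, sizeof(float),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaGraphExecDestroy(exec));
    CUDA_CHECK(cudaGraphDestroy(graph));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_data));
    return first;
}
```

Calling `run_chain_with_graph(1024, 16, 10)` executes the kernel 160 times in total, so it returns `160.0f`. The point of the sketch is that the host makes 10 graph launches instead of 160 kernel launches.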

In other words, scaling up, for example, the number of layers in a CNN did not result in just a constant-factor slowdown.
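One way to check this kind of scaling is to time a run of tiny launches with CUDA events and compare the per-launch cost at different counts. The sketch below is an illustration only: `tiny_kernel` and `time_launches_ms` are hypothetical names, and a real measurement would use the library's actual kernels and average over several runs.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that does almost no work, so launch overhead dominates.
__global__ void tiny_kernel(float* data) {
    if (blockIdx.x == 0 && threadIdx.x == 0) data[0] += 1.0f;
}

// Returns the time in milliseconds between an event recorded before and an
// event recorded after `launches` back-to-back kernel launches.
float time_launches_ms(int launches) {
    float* d_data = nullptr;
    cudaMalloc(&d_data, sizeof(float));
    cudaMemset(d_data, 0, sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    tiny_kernel<<<1, 32>>>(d_data);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i) {
        tiny_kernel<<<1, 32>>>(d_data);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}
```

Dividing `time_launches_ms(n)` by `n` for, say, 10, 100, and 1000 launches shows whether the cost per launch stays roughly flat or grows with the number of launches.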

I am going to continue along the path of using packed operations, since I believe it will be a valuable experience. But afterwards I do want to take a look at graphs.

https://developer.nvidia.com/blog/cuda-graphs/
