nsight-compute

Here are 12 public repositories matching this topic...

GPU-accelerated Number-Theoretic Transform for ZK-Proof generation. Targets the NTT bottleneck (91% of Groth16 prover time) via two CUDA optimizations: async double-buffered pipeline eliminating CPU-GPU transfer overhead, and IADD3-path Montgomery multiplication reducing finite-field instruction latency. BLS12-381, Ampere sm_86, Nsight-profiled.

  • Updated Mar 16, 2026
  • Cuda
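
The Montgomery multiplication named in the description above can be illustrated with a minimal host-side sketch. It assumes the NTT runs over the BLS12-381 scalar field and uses a single 256-bit Montgomery radix with Python big integers; the helper names (`mont_reduce`, `mont_mul`, `to_mont`, `from_mont`) are hypothetical and not taken from the repository. It shows only the arithmetic identity, not the limb-level IADD3 instruction scheduling or the double-buffered transfer pipeline, which are GPU-specific.

```python
# Sketch of Montgomery multiplication over the BLS12-381 scalar field.
# Assumption: the repository's NTT works in this field; names are illustrative.

# BLS12-381 scalar-field modulus (odd prime, 255 bits).
P = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001

R_BITS = 256                       # Montgomery radix R = 2^256 > P
R = 1 << R_BITS
MASK = R - 1
P_INV_NEG = (-pow(P, -1, R)) % R   # -P^{-1} mod R (exists because P is odd)
R2 = (R * R) % P                   # R^2 mod P, used to enter Montgomery form


def mont_reduce(t):
    """Return t * R^{-1} mod P for 0 <= t < P * R, without dividing by P."""
    m = ((t & MASK) * P_INV_NEG) & MASK   # m = -t * P^{-1} mod R
    u = (t + m * P) >> R_BITS             # t + m*P is an exact multiple of R
    return u - P if u >= P else u         # u < 2P, so one subtraction suffices


def mont_mul(a, b):
    """Multiply two values already in Montgomery form (a*R, b*R mod P)."""
    return mont_reduce(a * b)


def to_mont(x):
    """Convert x to Montgomery form: x * R mod P."""
    return mont_mul(x % P, R2)


def from_mont(x):
    """Convert back from Montgomery form: x * R^{-1} mod P."""
    return mont_reduce(x)


if __name__ == "__main__":
    a, b = 1234567, 7654321
    product = from_mont(mont_mul(to_mont(a), to_mont(b)))
    print(product == (a * b) % P)
```

A GPU kernel would typically split each field element into 32-bit limbs and interleave the reduction with the multiplication; the sketch keeps whole integers so the reduction step stays readable.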
