CUDA programming practice project: hands-on CUDA kernels and performance optimization, covering GEMM, FlashAttention, Tensor Cores, CUTLASS, quantization, KV cache, NCCL, and profiling.
Updated Mar 17, 2026 - Cuda
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.
Profiling with NVIDIA Nsight Tools Bootcamp
References content from the OLCF CUDA Training Series (https://github.com/olcf/cuda-training-series).
CUDA Samples and Nsight Guided Profiling Samples
University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.
This project demonstrates the integration of a CUDA kernel within an NVIDIA Holoscan application. It consists of two custom operators: one for memory allocation and data initialization, and another for executing the CUDA kernel. The application was profiled with Nsight Systems and the kernel with Nsight Compute.
Julia tools for NVIDIA Nsight Compute
A comprehensive, hardware-agnostic GPU benchmarking suite that compares CUDA, OpenCL, and DirectCompute performance using identical workloads. Built from scratch with professional architecture, extensive documentation, and production-ready GUI.
GPU-accelerated Number-Theoretic Transform for ZK-Proof generation. Targets the NTT bottleneck (91% of Groth16 prover time) via two CUDA optimizations: async double-buffered pipeline eliminating CPU-GPU transfer overhead, and IADD3-path Montgomery multiplication reducing finite-field instruction latency. BLS12-381, Ampere sm_86, Nsight-profiled.
libHPC
Roofline profiling for Deep Learning models