CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.
Updated Mar 23, 2026 - Python
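The "99.5% mathematical validation" figure above presumably means the fraction of output elements that match a CPU reference within tolerance. A minimal sketch of that kind of check, using NumPy in place of an actual CUDA kernel's output (the `validation_rate` helper and the tolerance are assumptions, not the repo's actual method):

```python
import numpy as np

def validation_rate(gpu_result, reference, rtol=1e-3):
    """Fraction of elements matching the reference within relative tolerance."""
    return np.isclose(gpu_result, reference, rtol=rtol).mean()

# Simulate: float64 reference vs. a float32 product standing in for a CUDA kernel
rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)
reference = a.astype(np.float64) @ b.astype(np.float64)
gpu_like = (a @ b).astype(np.float64)  # hypothetical device result

rate = validation_rate(gpu_like, reference)
print(f"validation rate: {rate:.1%}")
```

Comparing against a higher-precision reference (float64 here) is the usual way to separate genuine kernel bugs from ordinary float32 rounding.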
Standalone LLM inference benchmarking pipelines on AMD GPUs using ROCm, vLLM, MAD, and data visualization scripts.
🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.
One-shot script to audit GPU, CUDA, PyTorch, CPU, and disk performance before debugging a slow or broken ML environment.
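The shape of such an audit script can be sketched with the standard library alone; `audit_environment` and its report keys are hypothetical, and real GPU/CUDA probes (e.g. via `torch.cuda` or `nvidia-smi`) would slot in alongside the presence checks:

```python
import importlib.util, os, platform, tempfile, time

def audit_environment(disk_mb=16):
    """Quick sanity report: Python/CPU info, key ML packages, disk write speed."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    # Presence checks only -- importing torch just to probe CUDA can be slow
    for pkg in ("torch", "numpy"):
        report[f"has_{pkg}"] = importlib.util.find_spec(pkg) is not None
    # Crude sequential-write benchmark
    payload = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        start = time.perf_counter()
        for _ in range(disk_mb):
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    os.unlink(f.name)
    report["disk_write_mb_s"] = round(disk_mb / elapsed, 1)
    return report

print(audit_environment())
```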
benchHUB is a Python-based project to parse, aggregate, and visualize system and performance benchmarks. It includes a Streamlit dashboard to display and compare results.
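The parse-and-aggregate step such a tool performs can be sketched as grouping per-run records by benchmark name and summarizing latency; the JSON schema and the `aggregate` helper below are assumptions for illustration, not benchHUB's actual format:

```python
import json, statistics
from collections import defaultdict

# Hypothetical raw records, one per benchmark run
raw = """
[{"bench": "matmul_1024", "ms": 3.1}, {"bench": "matmul_1024", "ms": 2.9},
 {"bench": "matmul_2048", "ms": 21.4}, {"bench": "matmul_2048", "ms": 22.0}]
"""

def aggregate(records):
    """Group runs by benchmark name and summarize latency per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["bench"]].append(rec["ms"])
    return {
        name: {"runs": len(times),
               "mean_ms": round(statistics.mean(times), 2),
               "min_ms": min(times)}
        for name, times in groups.items()
    }

summary = aggregate(json.loads(raw))
print(summary)
```

A dashboard layer (Streamlit, in benchHUB's case) would then render tables or charts from a summary dict like this one.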
GPT-2 (124M) fixed-work distributed training benchmark on NYU BigPurple (Slurm), scaling from 1 to 8× V100 GPUs across 2 nodes with DeepSpeed ZeRO-1 and FP16/AMP. Includes a reproducible harness that writes training_metrics.json, RUN_COMPLETE.txt, and launcher metadata per run, along with NCCL topology/log artifacts and Nsight Systems traces and summaries (NVTX + NCCL ranges).
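The per-run artifact pattern described above (metrics file plus a completion sentinel) can be sketched as follows; `write_run_artifacts`, the file contents, and the metric values are illustrative assumptions, with only the training_metrics.json and RUN_COMPLETE.txt filenames taken from the description:

```python
import json, os, tempfile, time

def write_run_artifacts(run_dir, metrics, launcher_meta):
    """Persist per-run metrics, launcher metadata, and a completion sentinel."""
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "training_metrics.json"), "w") as f:
        json.dump(metrics, f, indent=2)
    with open(os.path.join(run_dir, "launcher_meta.json"), "w") as f:
        json.dump(launcher_meta, f, indent=2)
    # Sentinel written last, so its presence implies the other files are complete
    with open(os.path.join(run_dir, "RUN_COMPLETE.txt"), "w") as f:
        f.write(f"completed {time.strftime('%Y-%m-%dT%H:%M:%S')}\n")

run_dir = os.path.join(tempfile.mkdtemp(), "run_0001")
write_run_artifacts(
    run_dir,
    metrics={"tokens_per_sec": 12345.0, "world_size": 8},      # illustrative values
    launcher_meta={"nodes": 2, "gpus_per_node": 4, "zero_stage": 1},
)
print(sorted(os.listdir(run_dir)))
```

Writing the sentinel last lets downstream aggregation scripts skip runs that were killed mid-write, which matters when Slurm preempts jobs.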