Tiny-NN — Fully Connected Neural Networks in C++20 + CUDA 12.8

Tiny-NN is a high-performance implementation of fully connected neural networks supporting both CPU and GPU execution. It's designed for easy experimentation and benchmarking, featuring:

  • CPU execution (parallelized)
  • CUDA execution with memory reuse (weights and biases uploaded only once per layer)
  • Training with backpropagation and SGD
  • Model serialization using json.hpp (MIT licensed) included in the repository
  • Simple MNIST dataset integration and ASCII preview
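Training pairs a forward pass through each fully connected layer with a backward pass that applies SGD updates in place. The following is a minimal CPU-only sketch of one layer, assuming a ReLU activation; the struct and member names are illustrative, not the repository's actual API:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One fully connected layer: y = relu(W x + b).
// Hypothetical names; the real tiny-nn classes may differ.
struct Dense {
    std::size_t in, out;
    std::vector<float> W;                 // row-major, out x in
    std::vector<float> b;
    std::vector<float> x_cache, z_cache;  // saved for backprop

    Dense(std::size_t in_, std::size_t out_)
        : in(in_), out(out_), W(in_ * out_, 0.01f), b(out_, 0.0f) {}

    std::vector<float> forward(const std::vector<float>& x) {
        x_cache = x;
        z_cache.assign(out, 0.0f);
        std::vector<float> y(out);
        for (std::size_t o = 0; o < out; ++o) {
            float z = b[o];
            for (std::size_t i = 0; i < in; ++i) z += W[o * in + i] * x[i];
            z_cache[o] = z;
            y[o] = z > 0.0f ? z : 0.0f;  // ReLU
        }
        return y;
    }

    // dy = dLoss/dy; returns dLoss/dx and applies one SGD step in place.
    std::vector<float> backward_sgd(const std::vector<float>& dy, float lr) {
        std::vector<float> dx(in, 0.0f);
        for (std::size_t o = 0; o < out; ++o) {
            float dz = z_cache[o] > 0.0f ? dy[o] : 0.0f;  // ReLU derivative
            for (std::size_t i = 0; i < in; ++i) {
                dx[i] += W[o * in + i] * dz;            // read W before updating it
                W[o * in + i] -= lr * dz * x_cache[i];  // SGD update
            }
            b[o] -= lr * dz;
        }
        return dx;
    }
};
```

One training step then chains forward() layer by layer, computes the loss gradient at the output, and chains backward_sgd() in reverse order.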

Requirements

  • C++20 compatible compiler
  • CUDA 12.8 (for GPU support)
  • CMake >= 3.24
  • Python 3.12 (optional, for dataset download and preview)

Although development was done on Windows 10/11 using Visual Studio 2022, the project can be built on any OS with a compatible C++20 compiler and CUDA installation.

Setup

  1. Clone or copy the repository to your machine:
git clone https://github.com/Vicen-te/tiny-nn.git
cd tiny-nn
  2. Download the MNIST dataset using the Python script (recommended):
python scripts/download_mnist.py
  • This downloads and saves the MNIST dataset in data/mnist/.
  • Alternatively, you can download the dataset manually from Kaggle.
  3. Optional: generate a small model using Python (arguments: input layer size, hidden layer size, output layer size):
python data/generate_model.py 128 64 10
  4. Optional: preview MNIST digits:
  • Python: python scripts/preview.py
  • C++: the ascii_preview() function in MNISTLoader
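The ASCII preview works by mapping each grayscale pixel (0–255) to a character on a density ramp. A rough sketch of the idea, assuming row-major 8-bit pixels; the actual ascii_preview() in MNISTLoader may use a different ramp:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Render a grayscale image (0 = background, 255 = brightest ink) as ASCII art.
// A sketch of the idea behind MNISTLoader::ascii_preview(); the real
// implementation in the repository may differ.
std::string ascii_preview(const std::vector<std::uint8_t>& pixels,
                          std::size_t width, std::size_t height) {
    static const char ramp[] = " .:-=+*#%@";      // sparse to dense
    constexpr std::size_t levels = sizeof(ramp) - 2;  // highest usable index
    std::string out;
    out.reserve((width + 1) * height);
    for (std::size_t r = 0; r < height; ++r) {
        for (std::size_t c = 0; c < width; ++c) {
            std::size_t idx = pixels[r * width + c] * levels / 255;
            out += ramp[idx];
        }
        out += '\n';
    }
    return out;
}
```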

Build

Visual Studio:

  • Open Visual Studio -> File -> Open -> Folder... and select the project folder.
  • Visual Studio will detect CMake. For GPU usage, choose x64 configuration.
  • Build -> Build All.

PowerShell / Developer Command Prompt (recommended):

Option 1: Specify all options manually

mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64 -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8"
cmake --build . --config Release
  • -G "Visual Studio 17 2022" selects Visual Studio 2022
  • -A x64 selects 64-bit architecture (recommended for CUDA)
  • -DCUDA_TOOLKIT_ROOT_DIR is optional; CMake can auto-detect CUDA

Note: The -A x64 option is recommended if you want to use CUDA on Windows. On Linux or macOS, this is not necessary.

Option 2: Let CMake detect everything automatically (recommended)

cmake -B build -S .
cmake --build build --config Release
  • CMake will detect Visual Studio and CUDA if installed in standard locations
  • -S is the source folder, -B is the build folder

Both methods produce the same result. Use Option 2 for simplicity and fewer manual settings.

Run

From the build/Release folder:

tiny-nn.exe <mode>

Modes:

  • train or t → Train model
  • inference or i → Run inference on a sample
  • benchmark or b → Compare CPU vs CUDA performance
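A CLI like this typically dispatches on its first argument, accepting both the long and short spellings listed above. A minimal sketch; parse_mode and the Mode enum are illustrative names, not the project's actual code:

```cpp
#include <string_view>

enum class Mode { Train, Inference, Benchmark, Unknown };

// Map the command-line mode argument (long or short form) to an enum.
Mode parse_mode(std::string_view arg) {
    if (arg == "train" || arg == "t") return Mode::Train;
    if (arg == "inference" || arg == "i") return Mode::Inference;
    if (arg == "benchmark" || arg == "b") return Mode::Benchmark;
    return Mode::Unknown;
}
```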

Expected output

Training (train / t)

  • Training progress printed to console
  • Training duration in seconds
  • Model saved as JSON to ./data/models/fc_digit_classification.json
  • ASCII MNIST preview of a single sample image

Inference (inference / i)

  • Output values of selected sample
  • Maximum value and its index
  • ASCII preview of the sample

Benchmark (benchmark / b)

  • CPU vs GPU inference correctness check
  • Average inference timings per method
  • CSV results saved to ./data/results/bench.csv

Currently, the benchmark mode only measures inference, not training; measuring training performance would require additional implementation work.
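An inference benchmark of this kind usually times many repeated calls with std::chrono and appends the averages to a CSV file. A generic sketch, assuming a callable inference function; the helper names and CSV columns here are illustrative, not taken from the repository:

```cpp
#include <chrono>
#include <fstream>
#include <functional>
#include <string>

// Time `runs` invocations of `infer` and return the average in milliseconds.
double average_ms(const std::function<void()>& infer, int runs) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) infer();
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> total = t1 - t0;
    return total.count() / runs;
}

// Append one result row; write a header first if the file is new or empty.
void append_csv(const std::string& path, const std::string& method,
                double avg_ms) {
    std::ifstream probe(path);
    bool empty = !probe.good() ||
                 probe.peek() == std::ifstream::traits_type::eof();
    std::ofstream out(path, std::ios::app);
    if (empty) out << "method,avg_ms\n";
    out << method << ',' << avg_ms << '\n';
}
```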

Notes & Improvements

  • Currently, the weights W and biases b are uploaded to the GPU once per layer; the input vector is uploaded for each inference.
  • cuBLAS GEMM is already used for matrix multiplications, replacing the simple custom fully connected kernel.
  • Intermediate GPU buffers (dX/dY) are allocated per layer and per batch and are not fully reused, although CUDA streams already enable asynchronous execution.
  • Possible future improvements for higher performance:
    • Reuse intermediate GPU buffers across layers and batches.
    • Implement more efficient batching and overlap data transfers with computation.
  • Profiling can be done with Nsight Systems / Nsight Compute.

About

A tiny neural network framework for fully-connected layers with CPU and CUDA support
