Neural Networks with NumPy

This repository contains a NumPy implementation of a feedforward neural network for training an MNIST handwritten digit classifier.

Directory Structure

NumPyNet/
├── data/
│   └── (save directory for MNIST)
├── dataset.py (dataset and dataloader classes)
├── model.py (neural network layers)
├── module.py (module/optimizer base classes)
└── train.ipynb (training script notebook)

Model

The model consists of a simple three-layer network with a ReLU activation function. The network is trained using the cross-entropy loss function and the RMSProp optimizer. Below is a diagram showing the model architecture and how the hidden layer values are computed.

We start with a flattened image vector $\mathbf{x} \in \mathbb{R}^{d}$, which is passed through the first layer (with a ReLU activation) to produce a hidden layer $\mathbf{z}^{(1)} \in \mathbb{R}^{h}$. The hidden layer is then passed through the second layer to produce a logit vector $\mathbf{z}^{(2)} \in \mathbb{R}^{K}$, which corresponds to a value for each of the $K$ classes. The logit vector is subsequently passed through a softmax function to produce a probability distribution over the classes. Assuming now that we have a batch of images, $\mathbf{X} \in \mathbb{R}^{N \times d}$, the forward pass can be written as:

$$ \begin{align*} \mathbf{Z}^{(1)} &= \sigma(\mathbf{X} \mathbf{W}^{(0)} + \mathbf{b}^{(0)}) \in \mathbb{R}^{N \times h} \\ \\ \mathbf{Z}^{(2)} &= \mathbf{Z}^{(1)} \mathbf{W}^{(1)} + \mathbf{b}^{(1)} \in \mathbb{R}^{N \times K} \\ \\ \mathbf{P} &= \text{softmax}(\mathbf{Z}^{(2)}) \in \mathbb{R}^{N \times K} \\ \\ \mathcal{L} &= -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} Y_{nk} \log(P_{nk}) \end{align*} $$

where $\sigma(\cdot) = \text{ReLU}(\cdot)$ is the activation function, and ${\mathbf{W}^{(0)}, \mathbf{W}^{(1)}, \mathbf{b}^{(0)}, \mathbf{b}^{(1)}}$ are the model parameters. The loss function $\mathcal{L}$ is the cross-entropy loss, and $Y_{nk}$ is the one-hot encoded label for the $n$-th sample in the batch. For the backward pass, we compute

$$ \begin{align*} \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(2)}} &= \mathbf{P} - \mathbf{Y} \in \mathbb{R}^{N \times K} \\ \\ \frac{\partial \mathcal{L}}{\partial \mathbf{W}^{(1)}} &= \mathbf{Z}^{(1)T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(2)}} \right) \in \mathbb{R}^{h \times K} \\ \\ \frac{\partial \mathcal{L}}{\partial \mathbf{b}^{(1)}} &= \mathbf{1}^{T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(2)}} \right) \in \mathbb{R}^{K} \\ \\ \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(1)}} &= \left( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(2)}} \right) \mathbf{W}^{(1)T} \in \mathbb{R}^{N \times h} \\ \\ \frac{\partial \mathcal{L}}{\partial \mathbf{W}^{(0)}} &= \mathbf{X}^{T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(1)}} \right) \in \mathbb{R}^{d \times h} \\ \\ \frac{\partial \mathcal{L}}{\partial \mathbf{b}^{(0)}} &= \mathbf{1}^{T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{Z}^{(1)}} \right) \in \mathbb{R}^{h} \end{align*} $$

where $\mathbf{1} \in \mathbb{R}^{N}$ is a vector of ones. These gradients are propagated backwards (hence backward pass) via the chain rule to compute the gradients with respect to the model parameters. The gradients are then used to update the model parameters using the RMSProp optimizer.

Model Performance

Below are training loss and accuracy plots, along with a sample of predictions (including logit histograms) on the MNIST test set. Code for plotting both figures is located in the notebook train.ipynb.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Neural Networks with NumPy

Directory Structure

Model

Model Performance

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
.gitignore		.gitignore
README.md		README.md
dataset.py		dataset.py
model.py		model.py
module.py		module.py
train.ipynb		train.ipynb

Folders and files

Latest commit

History

Repository files navigation

Neural Networks with NumPy

Directory Structure

Model

Model Performance

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages