A hands-on workshop series building up to GPT from scratch, following Andrej Karpathy's Neural Networks: Zero to Hero course. Organized by Headstarter and led by Saad Jamal.
This series walks through the core concepts of neural networks step by step, from the definition of a derivative to training character-level language models. Each lecture is a self-contained Jupyter notebook with code, explanations, and visualizations.
## lecture1.ipynb
- Derivatives from first principles: limit definition, slopes, and signs
- `Value` object: building an autograd scalar with operator overloading (`+`, `*`, `tanh`, `exp`, `pow`)
- Computation graphs: visualizing the DAG with Graphviz
- Chain rule & backpropagation: manually computing gradients, then automating it with topological sort
- Single neuron: forward pass through `w * x + b` with tanh activation
- Multi-layer perceptron: `Neuron`, `Layer`, and `MLP` classes from scratch
- Training loop: forward pass, MSE loss, backward pass, gradient descent
- PyTorch comparison: verifying gradients match PyTorch's autograd
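The core of lecture 1 can be condensed into a minimal micrograd-style sketch — a simplified illustration of the `Value` object, backpropagation via topological sort, and a single neuron, not the notebook's exact code:

```python
import math

class Value:
    """Scalar that records the ops applied to it so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d(tanh x)/dx = 1 - tanh^2 x
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort so each node's gradient is fully accumulated
        # before it is propagated to its children.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Single neuron: o = tanh(w*x + b), with made-up values for w, x, b
x, w, b = Value(2.0), Value(-3.0), Value(6.5)
o = (w * x + b).tanh()
o.backward()
print(o.data, x.grad, w.grad)
```

The gradients can be checked by hand with the chain rule: `x.grad` should equal `(1 - tanh(w*x + b)**2) * w`, which is exactly what the notebook verifies against PyTorch's autograd.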
## lecture2.ipynb
- Character-level language modeling: predicting the next character from the previous one
- Bigram statistics: counting character pair frequencies from a 32K name dataset
- Probability distributions: normalizing counts, sampling with `torch.multinomial`
- Broadcasting semantics: practical exercises with PyTorch tensor operations
- Maximum likelihood estimation: log likelihood, negative log likelihood as a loss function
- Smoothing: handling zero-count bigrams with additive smoothing
- Neural network approach: one-hot encoding, logits, softmax, and gradient descent to learn the same bigram model
- Regularization: penalizing large weights to produce smoother distributions
- Teaser for next lecture: building the dataset for a trigram / MLP model (Bengio et al., 2003)
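The counting side of lecture 2 can be sketched in plain Python — the notebook itself works with PyTorch tensors and samples with `torch.multinomial`, and the five names below are a stand-in for the real 32K-name dataset:

```python
import math
from collections import Counter

# Toy corpus standing in for the 32K-name dataset.
words = ["emma", "olivia", "ava", "isabella", "mia"]

# Count bigrams, with '.' marking both the start and end of a name.
counts = Counter()
for w in words:
    chs = ['.'] + list(w) + ['.']
    for a, b in zip(chs, chs[1:]):
        counts[(a, b)] += 1

alphabet = sorted({c for pair in counts for c in pair})

def prob(a, b, k=1):
    """P(b | a) with add-k smoothing so unseen bigrams never get zero mass."""
    total = sum(counts[(a, c)] for c in alphabet)
    return (counts[(a, b)] + k) / (total + k * len(alphabet))

# Average negative log likelihood of the training set: lower is better.
nll, n = 0.0, 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for a, b in zip(chs, chs[1:]):
        nll += -math.log(prob(a, b))
        n += 1
print(f"avg NLL: {nll / n:.4f}")
```

Because of the `+ k` in both numerator and denominator, each conditional distribution still sums to one, and the NLL stays finite even for bigrams that never occur in the data.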
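The neural-network version of the same model can likewise be sketched in plain Python; the bigram pairs and hyperparameters below are illustrative, not the notebook's actual setup. The key observation is that a one-hot input vector times a weight matrix just selects a row of `W`, so that row directly serves as the logits:

```python
import math, random

random.seed(0)
V = 27  # vocabulary size in the notebook: 26 letters plus the '.' token

# One weight row per input character: its logits for the next character.
W = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(V)]

def softmax(logits):
    m = max(logits)                     # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A few (input index, target index) bigram pairs; hypothetical toy data.
pairs = [(0, 5), (5, 13), (13, 0)]

lr, lam = 10.0, 0.01  # learning rate and L2 regularization strength
for step in range(100):
    loss = 0.0
    grad = [[0.0] * V for _ in range(V)]
    for ix, iy in pairs:
        p = softmax(W[ix])              # one-hot input selects row ix of W
        loss += -math.log(p[iy])        # negative log likelihood of target
        for j in range(V):              # dL/dlogit_j = p_j - 1[j == target]
            grad[ix][j] += p[j] - (1.0 if j == iy else 0.0)
    loss /= len(pairs)
    for i in range(V):
        for j in range(V):
            W[i][j] -= lr * (grad[i][j] / len(pairs) + lam * W[i][j])
print(f"final avg NLL: {loss:.4f}")
```

The `lam * W[i][j]` term is the regularizer from the lecture: it pulls all weights toward zero, which after the softmax produces smoother, less peaky distributions, at the cost of a slightly higher training NLL.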
## Resources

- Neural Networks: Zero to Hero (YouTube) by Andrej Karpathy
- micrograd: Karpathy's autograd engine
- A Neural Probabilistic Language Model (Bengio et al., 2003)
