From tokenization to transformers. From fine-tuning to deployment. Build real AI that works.
60 hands-on lessons · 4 progressive levels · Real-world projects
Quick Start · Learning Flow · Course Map · Contributing · LinkedIn
If you find this helpful, please star the repository!
Made by a student, for students
| Section | Description |
|---|---|
| Learning Flow | Visual journey from Basic → Expert |
| Course Map | Level-by-level breakdown with Mermaid |
| Getting Started | Prerequisites, install, first steps |
| Repository Structure | Folder layout and navigation |
| Learning Outcomes | What you'll achieve |
| References & Support | Resources and contact |
This course is designed as a single path: each level builds on the previous one. Use the diagrams below to see how concepts connect and where to go next.
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#dbeafe', 'primaryBorderColor':'#2563eb', 'secondaryTextColor':'#0f172a', 'secondaryColor':'#fef3c7', 'tertiaryTextColor':'#0f172a', 'tertiaryColor':'#dcfce7', 'background':'#ffffff', 'mainBkg':'#f8fafc', 'secondBkg':'#f1f5f9', 'textColor':'#1e293b', 'titleColor':'#0f172a', 'clusterBkg':'#eff6ff', 'clusterBorder':'#2563eb', 'nodeBorder':'#2563eb', 'edgeColor':'#475569', 'lineColor':'#475569'}}}%%
flowchart LR
subgraph L1["Level 1: Basic"]
direction TB
A1[Text & Tokens]
A2[Embeddings & Attention]
A3[Transformers & Inference]
A4[Prompts & Evaluation]
A1 --> A2 --> A3 --> A4
end
subgraph L2["Level 2: Intermediate"]
direction TB
B1[Transfer & Fine-tuning]
B2[Model Families]
B3[RAG & Efficiency]
B1 --> B2 --> B3
end
subgraph L3["Level 3: Advanced"]
direction TB
C1[Train from Scratch]
C2[Distributed & Optimized]
C3[Multimodal & Alignment]
C1 --> C2 --> C3
end
subgraph L4["Level 4: Expert"]
direction TB
D1[Alignment & MoE]
D2[Agents & Tools]
D3[Production & Research]
D1 --> D2 --> D3
end
L1 --> L2 --> L3 --> L4
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#dbeafe', 'primaryBorderColor':'#2563eb', 'secondaryTextColor':'#0f172a', 'secondaryColor':'#e0e7ff', 'tertiaryTextColor':'#0f172a', 'tertiaryColor':'#dcfce7', 'background':'#ffffff', 'mainBkg':'#f8fafc', 'textColor':'#1e293b', 'titleColor':'#0f172a', 'clusterBkg':'#f0f9ff', 'clusterBorder':'#0284c7', 'nodeBorder':'#0ea5e9', 'edgeColor':'#475569'}}}%%
flowchart TB
subgraph Foundations
T[Text & Tokenization]
E[Embeddings]
Att[Attention]
PE[Positional Encoding]
T --> E --> Att --> PE
end
subgraph Architecture
Enc[Encoder-Decoder]
PT[Pre-training]
TG[Text Generation]
PE --> Enc
Att --> Enc
Enc --> PT --> TG
end
subgraph Application
Prompt[Prompt Engineering]
Zero[Zero-shot]
Few[Few-shot]
Eval[Metrics]
TG --> Prompt --> Zero --> Few
TG --> Eval
end
subgraph Adaptation
TL[Transfer Learning]
FT[Fine-tuning]
LoRA[LoRA & Adapters]
Foundations --> TL
Architecture --> TL
TL --> FT --> LoRA
end
subgraph Systems
RAG[RAG & Vectors]
Deploy[Deployment]
Agents[Agents]
LoRA --> RAG
RAG --> Deploy
Deploy --> Agents
end
Application --> Adaptation
X-axis: Beginner → Expert. Y-axis: Theory (bottom) → Production (top).
%%{init: {'theme':'dark'}}%%
quadrantChart
title Skill level by course stage
x-axis Beginner --> Expert
y-axis Theory --> Production
quadrant-1 Research & custom
quadrant-2 Deploy & scale
quadrant-3 Foundations
quadrant-4 Fine-tune & adapt
"Basic L1-L15": [0.2, 0.3]
"Intermediate L16-L30": [0.45, 0.5]
"Advanced L31-L45": [0.7, 0.6]
"Expert L46-L60": [0.9, 0.85]
Goal: Understand text, tokens, embeddings, attention, and how to use pre-trained transformers.
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#dbeafe', 'secondaryColor':'#e0e7ff', 'tertiaryColor':'#dcfce7', 'background':'#ffffff', 'mainBkg':'#f8fafc', 'textColor':'#1e293b', 'titleColor':'#0f172a', 'nodeBorder':'#2563eb', 'clusterBorder':'#2563eb'}}}%%
mindmap
  root((Basic))
    Text
      Tokenization
      Vocabulary
      Pipelines
    Representations
      Word Embeddings
      Attention
      Positional Encoding
    Architecture
      Transformer
      Pre-training
      Tokenizers
    Usage
      Loading & Inference
      Text Generation
      Prompts
    Evaluation
      Zero-shot
      Few-shot
      Metrics
    Project: Chatbot
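As a taste of L1, tokenization and vocabulary building fit in a few lines of plain Python. This is a toy whitespace tokenizer, not a production BPE or WordPiece implementation, and the function names are illustrative:

```python
from collections import Counter

def build_vocab(corpus, specials=("<pad>", "<unk>")):
    """Map each whitespace token to an integer id, most frequent first."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    """Turn text into ids, falling back to <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in text.lower().split()]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
ids = encode("the bird sat", vocab)  # "bird" maps to <unk>
```

Real tokenizers (L8) split below the word level so rare words decompose into known subwords instead of collapsing to `<unk>`.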
| # | Topic | One-line description |
|---|---|---|
| L1 | Text Processing & Tokenization | Text data, tokens, vocabulary |
| L2 | Transformer Pipelines | Pre-trained models, classification, sentiment |
| L3 | Word Embeddings | Word2Vec, GloVe, embedding spaces |
| L4 | Attention Mechanisms | Self-attention, multi-head attention |
| L5 | Positional Encoding | Position in sequences |
| L6 | Transformer Architecture | Encoder-decoder, layer norm |
| L7 | Pre-training Concepts | MLM, CLM, training objectives |
| L8 | Tokenizers Deep Dive | BPE, WordPiece, SentencePiece |
| L9 | Model Loading & Inference | HuggingFace, model selection |
| L10 | Text Generation Basics | Greedy, beam search, sampling |
| L11 | Prompt Engineering 101 | Effective prompts |
| L12 | Zero-shot Learning | Classification without training |
| L13 | Few-shot Learning | Learning from examples |
| L14 | Model Evaluation Metrics | Perplexity, BLEU, ROUGE |
| L15 | Basic Project | Simple chatbot |
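As a preview of L4, scaled dot-product attention reduces to a few lines of arithmetic: `softmax(QK^T / sqrt(d)) V`. A dependency-free sketch with plain Python lists (real implementations use batched tensor operations):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Each output row is a convex combination of the value rows, so attention weights always sum to 1 per query.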
Goal: Fine-tune models, work with the major model families (BERT, GPT, T5), and build RAG systems and efficient inference pipelines.
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#fef3c7', 'primaryBorderColor':'#d97706', 'secondaryColor':'#dcfce7', 'tertiaryColor':'#dbeafe', 'background':'#ffffff', 'mainBkg':'#fffbeb', 'textColor':'#1e293b', 'nodeBorder':'#b45309', 'edgeColor':'#e2e8f0', 'lineColor':'#e2e8f0', 'arrowheadColor':'#e2e8f0'}}}%%
flowchart LR
L16[Transfer Learning] --> L17[Fine-tuning]
L17 --> L18[LoRA]
L18 --> L19[BERT] & L20[GPT] & L21[T5]
L19 & L20 & L21 --> L22[Domain] & L23[Multi-task] & L24[Instruction]
L24 --> L25[RLHF]
L25 --> L26[Compression] --> L27[Efficient Inference]
L27 --> L28[RAG] --> L29[Vectors] --> L30[Project: Doc QA]
| # | Topic | One-line description |
|---|---|---|
| L16 | Transfer Learning | Adapting pre-trained models |
| L17 | Fine-tuning Techniques | Full vs parameter-efficient |
| L18 | LoRA & Adapters | Low-rank adaptation |
| L19 | BERT Family | BERT, RoBERTa, ALBERT, DistilBERT |
| L20 | GPT Family | GPT-2/3/4 architecture |
| L21 | T5 & Seq2Seq | Text-to-text frameworks |
| L22 | Domain Adaptation | Domain-specific models |
| L23 | Multi-task Learning | Multiple objectives |
| L24 | Instruction Tuning | Following instructions |
| L25 | RLHF Basics | Human feedback |
| L26 | Model Compression | Distillation, pruning, quantization |
| L27 | Efficient Inference | Deployment optimizations |
| L28 | RAG Systems | Retrieval-augmented generation |
| L29 | Vector Databases | Storing/retrieving embeddings |
| L30 | Intermediate Project | Document Q&A system |
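The core idea of L18 (LoRA) is small enough to sketch: keep the pre-trained weight `W` frozen and learn a low-rank update `B @ A`, scaled by `alpha / r`. A minimal plain-Python sketch under stated assumptions (real code uses torch tensors and trains `A` and `B` by gradient descent; these names mirror the LoRA paper's notation):

```python
def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_apply(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * (B @ A).
    W: d_out x d_in (frozen); B: d_out x r; A: r x d_in (trained)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

With rank `r` much smaller than the matrix dimensions, the trainable parameters shrink from `d_out * d_in` to `r * (d_out + d_in)`, which is why LoRA fits on modest GPUs.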
Goal: Train models from scratch, design custom architectures, apply distributed and mixed-precision training, and explore multimodal and alignment topics.
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#dcfce7', 'primaryBorderColor':'#15803d', 'secondaryTextColor':'#0f172a', 'secondaryColor':'#dbeafe', 'tertiaryTextColor':'#0f172a', 'tertiaryColor':'#e0e7ff', 'background':'#ffffff', 'mainBkg':'#f0fdf4', 'textColor':'#1e293b', 'titleColor':'#0f172a', 'clusterBkg':'#ecfdf5', 'clusterBorder':'#059669', 'nodeBorder':'#0d9488', 'edgeColor':'#e2e8f0', 'lineColor':'#e2e8f0', 'arrowheadColor':'#e2e8f0'}}}%%
flowchart TB
subgraph Train
L31[Train from Scratch]
L32[Custom Architectures]
L31 --> L32
end
subgraph Scale
L33[Distributed]
L34[Mixed Precision]
L35[Gradient Accum]
L39[Model Parallelism]
L40[Flash Attention]
L36[Optimization]
L33 --> L34 --> L35 --> L36
L34 --> L39 --> L40
end
subgraph Data
L37[Data Prep]
L38[Tokenizer Training]
end
subgraph Beyond
L41[Long Context]
L42[Multimodal]
L43[Code Models]
L44[Alignment]
L45[Project]
end
Train --> Scale
Data --> Train
Scale --> Beyond
| # | Topic | One-line description |
|---|---|---|
| L31 | Training from Scratch | Building LLMs from scratch |
| L32 | Custom Architectures | Novel transformer variants |
| L33 | Distributed Training | Multi-GPU, multi-node |
| L34 | Mixed Precision Training | FP16, BF16 |
| L35 | Gradient Accumulation | Large models, limited memory |
| L36 | Advanced Optimization | AdamW, Lion, LR schedules |
| L37 | Data Preparation | Curating and cleaning data |
| L38 | Tokenizer Training | Custom tokenizers |
| L39 | Model Parallelism | Pipeline and tensor parallelism |
| L40 | Flash Attention | Efficient attention |
| L41 | Long Context Models | Extended sequences |
| L42 | Multimodal LLMs | Vision-language (CLIP, LLaVA) |
| L43 | Code Generation Models | Codex, CodeLlama, StarCoder |
| L44 | Model Alignment | Safety, bias, ethics |
| L45 | Advanced Project | Specialized domain model |
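The trick behind L35 (gradient accumulation) is easy to show on a toy scalar model: sum gradients scaled by `1 / accum_steps` across micro-batches and apply a single optimizer step, which reproduces one large-batch update without holding the large batch in memory. A hypothetical framework-free sketch; names are illustrative:

```python
def sgd_with_accumulation(w, micro_batches, lr=0.1, accum_steps=4):
    """Toy loss per example: (w - target)^2, so grad = 2 * (w - target).
    Accumulate scaled gradients over accum_steps micro-batches,
    then apply one SGD update for the whole 'virtual' batch."""
    grad = 0.0
    for i, batch in enumerate(micro_batches, start=1):
        # average gradient over this micro-batch, scaled by 1/accum_steps
        g = sum(2 * (w - t) for t in batch) / len(batch)
        grad += g / accum_steps
        if i % accum_steps == 0:
            w -= lr * grad  # single optimizer step
            grad = 0.0
    return w
```

Four micro-batches of one example each land on the same weights as one batch of four examples, which is the whole point of the technique.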
Goal: Master advanced alignment, mixture-of-experts and state-space models, agents and tool use, and production deployment, then ship a capstone.
%%{init: {'theme':'base', 'themeVariables': {'primaryTextColor':'#0f172a', 'primaryColor':'#ede9fe', 'primaryBorderColor':'#6d28d9', 'secondaryColor':'#fce7f3', 'tertiaryColor':'#dbeafe', 'background':'#ffffff', 'mainBkg':'#faf5ff', 'textColor':'#1e293b', 'nodeBorder':'#7c3aed', 'edgeColor':'#e2e8f0', 'lineColor':'#e2e8f0', 'arrowheadColor':'#e2e8f0'}}}%%
flowchart LR
L46[Constitutional AI] --> L47[MoE]
L47 --> L48[State Space]
L48 --> L49[Retrieval]
L49 --> L50[Agents] --> L51[Tools]
L51 --> L52[Deploy] --> L53[Cost]
L53 --> L54[Eval] & L55[Security] & L56[Continual] & L57[Merging]
L54 & L55 & L56 & L57 --> L58[Research] --> L59[Frameworks] --> L60[Capstone]
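The retrieval step in L49 boils down to nearest-neighbor search over embedding vectors. A minimal cosine-similarity sketch in plain Python (a vector database does the same scoring behind approximate indexes for scale; function names are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u)) *
           math.sqrt(sum(b * b for b in v)))
    return num / den

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In a full RAG pipeline the retrieved documents are pasted into the prompt so the model can ground its answer in them.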
| # | Topic | One-line description |
|---|---|---|
| L46 | Constitutional AI | Advanced alignment |
| L47 | Mixture of Experts | Sparse expert models (e.g., Mixtral) |
| L48 | State Space Models | Mamba, alternatives to attention |
| L49 | Retrieval Systems | Advanced RAG, HyDE |
| L50 | Agent Systems | LLM-powered agents |
| L51 | Tool Use & Function Calling | External tools |
| L52 | Production Deployment | Serving, scaling, monitoring |
| L53 | Cost Optimization | Reducing inference cost |
| L54 | Evaluation Frameworks | LLM-as-judge, benchmarks |
| L55 | Prompt Injection & Security | Attacks and defenses |
| L56 | Continual Learning | Updating with new data |
| L57 | Model Merging | Combining models |
| L58 | Research Paper Implementation | Reproducing SOTA |
| L59 | Custom Training Frameworks | Training pipelines |
| L60 | Capstone Project | End-to-end LLM application |
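The function-calling loop in L51 is essentially a dispatcher: the model emits a JSON call, and your code routes it to a registered function and returns the result. A minimal sketch; the registry, decorator, and `dispatch` names are made up for illustration:

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so the dispatcher can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a, b):
    """Example tool the model may call."""
    return a + b

def dispatch(call_json):
    """Execute a model-emitted call like
    {"name": "add", "arguments": {"a": 2, "b": 3}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])
```

Production systems add schema validation and sandboxing around this loop, since the arguments come from model output rather than trusted code.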
| Requirement | Notes |
|---|---|
| Python | 3.8+ |
| ML basics | Supervised learning, loss, gradients |
| Framework | PyTorch or TensorFlow |
| Environment | Jupyter Notebook or Google Colab |
# Clone the repository
git clone https://github.com/nexageapps/LLM.git
cd LLM
# Install dependencies
pip install -r requirements.txt
# Launch Jupyter
jupyter notebook
- Start at Basic L1 and go in order.
- Run every notebook and finish the exercises.
- Do each level project (L15, L30, L45, L60) before moving on.
- Experiment — change prompts, data, and hyperparameters.
- Use the Mermaid maps above to see how lessons connect.
LLM/
├── README.md # This file
├── requirements.txt # Python dependencies
├── 01_Basic/ # Level 1 (L1–L15)
│ ├── README.md
│ ├── L1_Text_Processing_Tokenization.ipynb
│ ├── L2_Transformer_Pipelines.ipynb
│ └── ...
├── 02_Intermediate/ # Level 2 (L16–L30)
│ ├── README.md
│ ├── L16_Transfer_Learning.ipynb
│ └── ...
├── 03_Advanced/ # Level 3 (L31–L45)
│ ├── README.md
│ ├── L31_Training_From_Scratch.ipynb
│ └── ...
├── 04_Expert/ # Level 4 (L46–L60)
│ ├── README.md
│ └── ...
├── datasets/ # Sample datasets
├── models/ # Saved models
└── utils/ # Helper functions
By the end of the course you will be able to:
| Area | Outcome |
|---|---|
| Theory | Explain transformers from first principles |
| Training | Train and fine-tune custom LLMs |
| Efficiency | Use LoRA, distillation, quantization |
| Systems | Implement RAG, RLHF, and evaluation |
| Deployment | Ship production-ready LLM apps |
| Practice | Apply prompt engineering and evaluation |
| Frontiers | Work with multimodal and agent systems |
- Build a Large Language Model (From Scratch) — Sebastian Raschka
- Attention Is All You Need — Vaswani et al. (2017)
- HuggingFace Transformers
- OpenAI & Anthropic research
- Google AI Blog
- Fork the repo
- Create a feature branch
- Add improvements
- Open a pull request
| Channel | Details |
|---|---|
| Author | Karthik Arjun |
| Connect | |
| Issues | Open an issue on GitHub |
| Discussions | Use repo discussions for questions |
This project is licensed under the MIT License — see LICENSE.
- Sebastian Raschka — Build a Large Language Model
- HuggingFace — transformers and ecosystem
- OpenAI, Anthropic, Google — LLM research
- The open-source AI community
This repository was created as part of my personal learning journey in Artificial Intelligence during my Master of Artificial Intelligence program at the University of Auckland.
Modern AI tools, including large language models, were used to assist with structuring parts of the documentation, improving explanations, and organizing the learning materials more effectively.
These tools served as assistants during the writing and documentation process. All learning paths, notebook implementations, and educational design decisions were created, reviewed, and curated by the author.
I am grateful to the open-source AI community and the developers of AI tools that help accelerate learning and knowledge sharing.
Happy learning!
Last updated: March 2026