A comprehensive collection of fundamental Machine Learning algorithms implemented from scratch and using Scikit-Learn. This repository serves as a practical guide for understanding the mathematical foundations, intuition, and implementation details of various supervised learning models.
| Regression (Predict a Number) | Classification (Predict a Category) |
|---|---|
| • Linear Regression | • Logistic Regression |
| • Multiple Linear Regression | • Support Vector Machine (SVM) |
| • Polynomial Regression | • Naive Bayes |
| • Support Vector Regression (SVR) | • K-Nearest Neighbors (KNN) |
| • KNN Regression | • Decision Tree Classification |
| • Decision Tree Regression | |
| Type | Algorithm | Key Idea |
|---|---|---|
| Regression | Linear Regression | Best-fit straight line |
| Regression | Multiple Linear Regression | Multiple features → one target |
| Regression | Polynomial Regression | Curved best-fit line |
| Regression | SVR | Fit within error margin |
| Regression | KNN Regression | Average of nearest neighbors |
| Regression | Decision Tree Reg. | Data splitting based on MSE |
| Classification | Logistic Regression | Probability of class membership |
| Classification | SVM | Maximum margin hyperplane |
| Classification | Naive Bayes | Bayes' Theorem + independence |
| Classification | KNN Classification | Majority vote of neighbors |
| Classification | Decision Tree Class. | Data splitting based on Gini/Entropy |
- Goal: Predict a continuous numerical value (e.g., price, salary, temperature).
- Finds the best-fit straight line through the data to predict a continuous target from a single feature.
ŷ = β₀ + β₁x
Where:
- ŷ = Predicted value
- β₀ = Intercept (y-axis crossing)
- β₁ = Slope (rate of change)
- x = Input feature
y
│
│ •
│ •
│ •
│ •
│ •─────────────── ← Regression line (ŷ = β₀ + β₁x)
│ │
│ │ residual (error)
│ •
│ •
└────────────────────────── x
- Linear relationship between feature and target
- Single input feature
- Quick baseline model
- Cannot capture curved or complex patterns
- Sensitive to outliers
- Assumes constant variance of errors (homoscedasticity)
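A minimal sketch of fitting this line with scikit-learn; the toy data points are invented for illustration (roughly y = 2x + 1 plus noise):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented toy data: y ≈ 2x + 1 with a little noise
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(X, y)

# model.intercept_ is β₀, model.coef_[0] is β₁
print(f"Intercept (β₀): {model.intercept_:.2f}")
print(f"Slope (β₁): {model.coef_[0]:.2f}")
print(f"Prediction for x=6: {model.predict([[6]])[0]:.2f}")
```

The fitted coefficients recover the line the data was generated from, which is exactly the ŷ = β₀ + β₁x form above.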
- Extends linear regression to multiple independent variables, predicting outcomes based on several features simultaneously.
ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ... + βₙxₙ
- Salary = β₀ + β₁(Experience) + β₂(Education) + β₃(Age)
- Multiple features influence the target
- Features are not highly correlated with each other
| Issue | What It Means | Solution |
|---|---|---|
| Multicollinearity | Features are correlated with each other | Remove or combine correlated features |
| Irrelevant features | Noise features hurt performance | Feature selection (backward elimination) |
| Different scales | Large-valued features dominate | Apply feature scaling |
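The salary example above can be sketched with scikit-learn; the numbers are invented purely to illustrate the multi-feature form:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: columns = [experience (years), education (years), age]
X = np.array([
    [1, 12, 22], [3, 16, 27], [5, 16, 30],
    [7, 18, 35], [10, 16, 40], [12, 20, 45],
])
y = np.array([35000, 50000, 62000, 75000, 85000, 100000])  # salary

model = LinearRegression().fit(X, y)
print("Intercept (β₀):", model.intercept_)
print("Coefficients (β₁..β₃):", model.coef_)  # one per feature
r2 = model.score(X, y)
print(f"R² on training data: {r2:.3f}")
```

Each coefficient reads as "salary change per unit of that feature, holding the others fixed", which is why correlated features (multicollinearity) make the individual coefficients unreliable.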
- Models non-linear (curved) relationships by transforming features into polynomial terms while still using a linear model framework.
ŷ = β₀ + β₁x + β₂x² + β₃x³ + ... + βₙxⁿ
- Degree 1: Straight line ──────
- Degree 2: Parabola ╱╲
- Degree 3: S-curve ╱╲╱
- Data shows a curved / non-linear trend
- Linear regression gives poor results (low R²)
- Overfitting with high-degree polynomials — the model memorizes noise
- Always validate with a test set or cross-validation
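One way to sketch this in scikit-learn is `PolynomialFeatures` feeding a plain `LinearRegression`; the quadratic toy data here is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Invented curved data: y = x² with slight noise
X = np.linspace(-3, 3, 20).reshape(-1, 1)
y = X.ravel() ** 2 + np.random.default_rng(0).normal(0, 0.1, 20)

# degree=2 expands x into [1, x, x²]; the regression itself stays linear
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
r2 = model.score(X, y)
print(f"R² with degree-2 features: {r2:.3f}")
```

A degree-1 fit on the same data would score far worse, which is the "linear regression gives poor results" signal described above.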
- Uses Support Vector Machines to predict continuous values by fitting data within an ε-insensitive tube (error margin).
- Points ON or OUTSIDE the tube = Support Vectors (they define the model)
- Points INSIDE the tube = No penalty
y
│ / ----------------- Upper Boundary
│ / • • /
│ /───────────/ <--------- Regression Line
│ / • • /
│ / ----------------- Lower Boundary (Tube width = 2ε)
└────────────────────────── x
- Data has outliers (SVR is robust to them)
- Non-linear relationships (with RBF / polynomial kernels)
- Medium-sized datasets
| Parameter | Role |
|---|---|
| `C` | Regularization: trade-off between error and margin width |
| `ε` (epsilon) | Width of the insensitive tube |
| `kernel` | Transformation function (`linear`, `rbf`, `poly`) |
- SVR requires feature scaling — always standardize your features before training.
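A minimal sketch that bakes the required scaling into a pipeline; the sine-wave data and the `C`/`ε` values are illustrative choices, not tuned settings:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Invented non-linear data: y = sin(x) with noise
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# StandardScaler runs first, so the SVR never sees unscaled features
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10, epsilon=0.1))
model.fit(X, y)
r2 = model.score(X, y)
print(f"R²: {r2:.3f}")
```

Wrapping the scaler in the pipeline also prevents a common bug: fitting the scaler on test data and leaking information into evaluation.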
- Predicts a value based on the average (or weighted average) of the `k` most similar neighboring data points.
- Prediction = Average of neighbor values = (y₁ + y₂ + y₃) / 3
y
│ • (Actual)
│ /
│ •───○─────• <-- Local average (Prediction)
│ / k=3
│ •
└────────────────────────── x
- Non-linear relationships
- Small to medium datasets
- When you need a simple, intuitive model
- Tip: Iterate through k = 1 to 20, plot accuracy, and select the optimal value.
- Requires feature scaling — distance-based algorithm
- Slow on large datasets (computes distances to every point)
- Sensitive to irrelevant features
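The "average of neighbor values" idea above can be sketched in a few lines; the 1-D data is invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented 1-D data
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 5.9])

# k=3: the prediction is the mean of the 3 nearest targets
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)
pred = model.predict([[3.5]])[0]
print(f"Prediction at x=3.5: {pred:.2f}")
```

Passing `weights="distance"` to `KNeighborsRegressor` switches the plain average to the weighted average mentioned above.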
- Predicts a continuous value by splitting the dataset into smaller subsets (leaves) based on feature thresholds, forming a tree-like structure of decisions.
- Recursive Partitioning: The algorithm splits data where it reduces the Mean Squared Error (MSE) the most.
- Leaf Nodes: The final prediction is the average value of all training points that fall into that specific leaf.
y
│ _______ (Avg of Region 3)
│ |
│ _______| (Avg of Region 2)
│ |
│______| (Avg of Region 1)
└──────┬───────┬─────────── x
Split 1 Split 2
- Non-linear and complex datasets
- When you need a model that handles both numerical and categorical data without much preprocessing
- No feature scaling required
- High Risk of Overfitting: A tree can grow deep enough to memorize every data point.
- Instability: Small changes in data can lead to a completely different tree structure.
- Solution: Limit max_depth or use "Pruning."
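A small sketch of the region-averaging behavior, with `max_depth` used as the growth limit suggested above; the step-shaped data is invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Invented step-like data; the tree predicts a constant per region
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1])

# max_depth=2 caps the tree at 4 leaves, limiting overfitting
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
pred = model.predict([[4.5]])[0]
print(f"Prediction at x=4.5: {pred:.2f}")  # the average of that leaf's region
```

Every x in the same region gets the identical prediction, which is why a regression tree's output looks like the staircase in the diagram above.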
- Goal: Predict a discrete category/class (e.g., spam/not spam, disease/healthy, yes/no).
- A foundational binary classification algorithm that estimates the probability of class membership using the sigmoid function.
1
P(y = 1) = ─────────────
1 + e^(-(β₀ + β₁x))
Output: Probability between 0 and 1
Decision Rule: If P ≥ 0.5 → Class 1, else → Class 0
P(y=1)
1.0 │ ─────────
│ /
0.5 │ · · · · · · / · · · · · · ← Decision boundary
│ /
0.0 │──────────────
└──────────────────────── x
- Binary classification (two classes)
- Linearly separable data
- When you need probability estimates
- Fast, interpretable baseline model
- Assumes a linear decision boundary
- Struggles with complex, non-linear patterns
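The sigmoid output and the 0.5 decision rule can be sketched as follows; the 1-D data (class 0 at small x, class 1 at large x) is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented 1-D data: class 0 at small x, class 1 at large x
X = np.array([[1], [2], [3], [4], [6], [7], [8], [9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(y=0), P(y=1)]; predict applies the P ≥ 0.5 rule
p = model.predict_proba([[5]])[0, 1]
print(f"P(y=1 | x=5) = {p:.2f}")
print("Predicted class at x=8:", model.predict([[8]])[0])
```

Near the decision boundary (x = 5 here) the probability sits around 0.5, exactly where the sigmoid in the diagram crosses the dotted line.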
- Finds the optimal hyperplane that maximizes the margin between different classes. Supports linear and non-linear (kernel-based) classification.
Class A: ○ Class B: ●
○ ○ ● ●
○ ○ ┃ ● ●
○ ○ ┃ ● ● ← Maximum margin hyperplane
○ ◁┃▷ ●
○ ○ ┃ ● ●
○ ○ ┃ ● ●
┃
◁────┃────▷
MARGIN (maximized)
◁▷ = Support Vectors
- For non-linearly separable data, kernels transform features into a higher-dimensional space where a linear boundary can be found.
| Kernel | Use Case | Boundary |
|---|---|---|
| `linear` | Linearly separable data | Straight line / plane |
| `rbf` (Gaussian) | Most non-linear problems | Flexible curved boundary |
| `poly` | Polynomial boundaries | Curved with degree control |
- High-dimensional data
- Clear margin of separation
- Binary or multi-class classification
- SVM requires feature scaling — always standardize before training.
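A minimal sketch with the scaling requirement handled by a pipeline; the Iris dataset and `C=1.0` are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# StandardScaler first, then the RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(f"Test accuracy: {acc:.3f}")
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` is all it takes to try the other rows of the kernel table above.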
- A probabilistic classifier based on Bayes’ Theorem with the assumption that features are conditionally independent given the class.
P(X | C) · P(C)
P(C | X) = ─────────────────────
P(X)
Where:
- P(C | X) → Posterior probability
- P(X | C) → Likelihood
- P(C) → Prior probability
- P(X) → Evidence
Is this email SPAM?
Features:
- Contains "free"
- Contains "winner"
- Length > 100
We compute:
P(Spam | features) vs P(Not Spam | features)
Choose the class with higher probability ✅
| Variant | Feature Type | Example |
|---|---|---|
| Gaussian NB | Continuous | Age, salary |
| Multinomial NB | Count data | Word frequency |
| Bernoulli NB | Binary (0/1) | Word presence |
- Text classification (spam, sentiment)
- Small datasets
- Real-time prediction (very fast)
- Independence assumption rarely true
- Complex models may outperform on large datasets
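The spam example above maps naturally to Multinomial NB over word counts; the four-message corpus here is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus; Multinomial NB suits word-count features
texts = [
    "free winner prize claim now",
    "free money winner click",
    "meeting agenda for tomorrow",
    "lunch with the project team",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# CountVectorizer builds the word-count features P(X | C) is computed from
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

pred = model.predict(["free prize winner"])[0]
print("spam" if pred == 1 else "not spam")
```

`model.predict_proba` exposes the competing posteriors directly, mirroring the P(Spam | features) vs P(Not Spam | features) comparison above.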
- Classifies data points based on the majority vote of their `k` nearest neighbors.
○ ○ ●
○ ◎ ● ●
◎ ★ ◎ ○ = Class A
○ ● ● ● = Class B
○ ● ◎ = Neighbors
k = 3 → 2 Class A, 1 Class B
Prediction = Class A ✅
| k Value | Effect |
|---|---|
| Too small (k=1) | Overfitting |
| Too large (k=50) | Underfitting |
| Optimal | Best validation accuracy |
```python
from sklearn.neighbors import KNeighborsClassifier

# Assumes X_train, X_test, y_train, y_test from an earlier train-test split
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    print(f"k={k}: Accuracy = {accuracy:.4f}")
```
- Non-linear decision boundaries
- Small to medium datasets
- Multi-class classification
- Requires feature scaling
- Slow on large datasets
- Curse of dimensionality
- Recursively splits the dataset into smaller and smaller subsets while incrementally building the corresponding decision tree.
- Splitting Criteria: Uses Gini Impurity or Entropy (Information Gain) to determine the best feature to split on at each node.
- Leaf Nodes: The final prediction is the majority class of the samples in that leaf.
Is Age > 30?
├── Yes: Is Income > $50k?
│ ├── Yes: Class A (Buyer)
│ └── No: Class B (Non-Buyer)
└── No: Class B (Non-Buyer)
- Clear, rule-based decision making
- Non-linear relationships between features
- When interpretability is critical (you can see exactly why a choice was made)
- Can create very complex trees that do not generalize well (overfitting)
- Biased toward features with many levels/categories
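The Age/Income tree above can be sketched directly; the six customer rows are invented to mirror that example, and `max_depth=2` keeps the tree small:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented data mirroring the example: columns = [age, income in $k]
X = np.array([[25, 30], [45, 80], [35, 40], [50, 60], [23, 20], [40, 90]])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = Buyer, 0 = Non-Buyer

# criterion="gini" is the default; "entropy" is the alternative named above
model = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
model.fit(X, y)

# export_text prints the learned if/else rules, the interpretability win
print(export_text(model, feature_names=["age", "income"]))
pred = model.predict([[48, 70]])[0]
```

`export_text` is what makes the "you can see exactly why a choice was made" point concrete: the model is readable as plain threshold rules.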
┌─────────────────────────────────────────────────────────────┐
│ STANDARD WORKFLOW │
├──────────┬──────────────────────────────────────────────────┤
│ Step 1 │ Data Preprocessing │
│ │ • Handling missing values │
│ │ • Encoding categorical variables │
│ │ • Feature scaling │
├──────────┼──────────────────────────────────────────────────┤
│ Step 2 │ Train-Test Split │
│ │ • 80/20 split │
├──────────┼──────────────────────────────────────────────────┤
│ Step 3 │ Model Training │
│ │ • Fit model on training data │
├──────────┼──────────────────────────────────────────────────┤
│ Step 4 │ Hyperparameter Tuning │
│ │ • K selection (KNN) │
│ │ • Kernel selection (SVM) │
│ │ • Degree selection │
├──────────┼──────────────────────────────────────────────────┤
│ Step 5 │ Performance Evaluation │
│ │ • Regression → R², MSE, MAE │
│ │ • Classification → Accuracy, Confusion Matrix │
│ │ • Visualization │
└──────────┴──────────────────────────────────────────────────┘
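The workflow above can be sketched end to end in scikit-learn; the dataset and model here (breast-cancer data, logistic regression) are stand-in choices for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Steps 1-2: load data (no missing values here) and make the 80/20 split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 1 (scaling): fit the scaler on training data only, then transform both
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 3: train the model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 5: evaluate with accuracy and a confusion matrix
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}")
print(confusion_matrix(y_test, y_pred))
```

Step 4 (hyperparameter tuning) would slot in before the final fit, e.g. with `GridSearchCV` over `C` for logistic regression.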