A comprehensive collection of fundamental Machine Learning algorithms implemented from scratch. This repository covers a wide spectrum of techniques, from classical linear regression to generative probabilistic models and unsupervised clustering.
The projects are organized by algorithm type and complexity. Each directory contains a detailed implementation, mathematical background, and performance visualizations.
- Linear Regression (Closed-Form): Solving linear models analytically using the Normal Equation.
- Linear Regression (SGD): Efficient iterative optimization using Stochastic Gradient Descent for large-scale data.
- Binary Logistic Regression: Probabilistic binary classification using the Sigmoid activation and Log-Loss optimization.
- Multiclass Softmax Regression: Extending logistic regression to multiple classes using Softmax, One-vs-All (OvA), and One-vs-One (OvO) strategies.
- Bayesian GLDA: Gaussian Linear Discriminant Analysis for generative classification with shared covariance.
- Quadratic Discriminant Analysis (QDA): Non-linear multiclass classification with class-specific covariance matrices.
- Naïve Bayes: Sentiment analysis on text data (Yelp, IMDB, Amazon) using probabilistic word frequencies and Laplace smoothing.
- K-Means Clustering: Application of K-Means for image compression and color quantization (Vector Quantization).
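As an illustration of the color-quantization idea in the K-Means item, here is a minimal sketch assuming only NumPy. The function name `kmeans_quantize`, its parameters, and the random toy image are hypothetical and are not taken from the repository's code; a real run would load an actual image instead.

```python
import numpy as np

def kmeans_quantize(pixels, k=4, n_iters=10, seed=0):
    """Cluster the rows of `pixels` (N x 3) into k colors.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k randomly chosen pixels.
    idx = rng.choice(len(pixels), size=k, replace=False)
    centroids = pixels[idx].astype(float)

    def assign(points, centers):
        # Squared Euclidean distance from every pixel to every centroid.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        # Assignment step: each pixel goes to its nearest centroid.
        labels = assign(pixels, centroids)
        # Update step: move each centroid to the mean of its pixels.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)

    # Final assignment so labels match the returned centroids.
    labels = assign(pixels, centroids)
    return centroids, labels

# Toy "image" of random RGB values (an assumption; stands in for a real photo).
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(32, 32, 3))
pixels = image.reshape(-1, 3).astype(float)

centroids, labels = kmeans_quantize(pixels, k=4)

# Replace every pixel with its centroid color to get the quantized image.
compressed = centroids[labels].reshape(image.shape).astype(np.uint8)
```

The compressed image needs only the `k` centroid colors plus one small integer label per pixel, which is where the storage saving in vector quantization comes from.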
This repository serves as a practical guide to the mathematical foundations of Pattern Recognition:
- Optimization: Gradient Descent vs. Analytical Closed-Form solutions.
- Generative vs. Discriminative: Modeling class-conditional distributions ($P(x|y)$) vs. learning the decision boundary directly ($P(y|x)$).
- Linear vs. Non-Linear: Understanding when to use linear separators (LDA/Logistic) versus quadratic surfaces (QDA).
- Natural Language Processing: Tokenization and Bag-of-Words modeling for sentiment prediction.
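The optimization contrast listed above can be sketched on a small synthetic problem. This is a minimal example assuming NumPy, made-up data, and plain batch gradient descent (not the stochastic variant); the variable names, learning rate, and iteration count are illustrative choices rather than the repository's settings.

```python
import numpy as np

# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # bias column + 2 features
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Closed form: solve the normal equation (X^T X) w = X^T y.
# np.linalg.solve avoids forming an explicit matrix inverse.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error.
w_gd = np.zeros(3)
lr = 0.1  # learning rate, chosen by hand for this toy problem
for _ in range(500):
    grad = (2.0 / n) * X.T @ (X @ w_gd - y)  # gradient of the MSE w.r.t. w
    w_gd -= lr * grad

print("closed form:     ", w_closed)
print("gradient descent:", w_gd)
```

On a well-conditioned problem like this one, both approaches should arrive at essentially the same weights. The closed form solves a linear system in the number of features, while gradient descent only needs matrix-vector products, which is why iterative methods tend to be preferred as the data grows.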
Ensure you have Python 3.x installed. The following libraries are used across various projects:
- numpy & scipy: Matrix operations and numerical computing.
- pandas: Data manipulation and analysis.
- matplotlib & seaborn: Data visualization and 3D plotting.
- scikit-learn: Used primarily for data splitting and evaluation metrics.
- nltk: Natural language processing tools for Naïve Bayes.
git clone https://github.com/aminizahra/Pattern-recognition.git
cd Pattern-recognition
pip install -r requirements.txt  # Or install the libraries listed above

Note: This repository was created for educational purposes to demonstrate the "from-scratch" implementation of machine learning algorithms.