ankitpokhrel08/machine_learning

Data Science & Machine Learning

This repository documents my learning journey in Machine Learning and Data Science, featuring projects, experiments, and detailed notes compiled throughout my studies.


Data Analysis

Tools & Libraries:

  • Python, Numpy, Pandas
  • Matplotlib, Seaborn, Plotly
  • Streamlit
  • Exploratory Data Analysis (EDA)
  • Web Scraping
  • Pandas Profiling
  • Scikit-learn (Encoding Techniques, Pipeline)
  • Scipy

Projects:

  • Smartphone Data Analysis - Web scraping and comprehensive analysis of smartphone specifications
  • Titanic Dataset EDA - Classic exploratory data analysis
  • National Anthem Analysis - Discovering hidden patterns between countries through their anthems | Read More
  • Walmart Sales EDA - My first exploratory data analysis project | Read More

Probability & Statistics

  • Hypothesis Testing
  • Bayes' Theorem
  • Probability Distributions
  • Descriptive Statistics
  • Inferential Statistics
  • Central Limit Theorem
  • Correlation and Regression
  • ANOVA
  • Chi-Square Test
  • Sampling Techniques

Machine Learning

Regression

  • Simple, Multiple, and Polynomial Linear Regression
  • Gradient Descent (Batch, Stochastic, Mini-Batch) - implemented from scratch
  • Regression Analysis (F-Statistic, R², Adjusted R², p-Value)
  • Regression Assumptions (Linearity, Normality of Residuals, Homoscedasticity, No Autocorrelation, No Multicollinearity)
  • Regularization Techniques (Lasso, Ridge, Elastic Net)
  • Bias-Variance Tradeoff
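
As an illustration of the gradient descent variants listed above, here is a minimal NumPy sketch of mini-batch gradient descent for linear regression with a mean squared error loss. It is not the code from this repository; the function name, hyperparameters, and synthetic data are all assumptions made for the example.

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, epochs=200, batch_size=16, seed=0):
    """Fit y ~ X @ w + b by minimizing mean squared error on mini-batches."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Reshuffle every epoch so each batch is a fresh random subset.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]
            error = X_batch @ w + b - y_batch
            # Gradients of mean((X w + b - y)^2) over the current batch.
            grad_w = 2.0 * X_batch.T @ error / len(idx)
            grad_b = 2.0 * error.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from w = [3, -2], b = 1.
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 1.0

w_fit, b_fit = minibatch_gradient_descent(X_demo, y_demo)
```

Setting `batch_size` to the number of samples gives batch gradient descent, and setting it to 1 gives stochastic gradient descent, so the three variants differ only in how many rows feed each update.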

Feature Engineering

  • Filter-Based Techniques
  • Wrapper Methods
  • Embedded Techniques

Classification

  • K-Nearest Neighbors (KNN)
  • Naive Bayes Classifier (Gaussian, Multinomial, Bernoulli, Complement, Categorical)
    • Text Classification implementation
    • Spam Classifier | Live Demo
  • Logistic Regression (One-vs-Rest, Multinomial)
  • Support Vector Machines (Hard Margin, Soft Margin, Kernel Trick)
  • Decision Trees (CART, Gini Impurity, Entropy, Pruning)

Clustering

  • K-Means
  • DBSCAN
  • Hierarchical Clustering

Evaluation Metrics

  • Confusion Matrix
  • Accuracy, Precision, Recall, F1 Score
  • ROC Curve and AUC
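
For reference, the metrics above can be derived directly from the four cells of a binary confusion matrix. The sketch below uses made-up counts chosen only for illustration; the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many are correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts accuracy is 0.7, precision is 0.8, and recall is about 0.667, which shows how a model can look reasonable on accuracy while still missing a third of the positive cases.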

Dimensionality Reduction

  • Principal Component Analysis (PCA)
  • Explained Variance
  • Covariance Matrix
  • Eigenvalues and Eigenvectors
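
The four items above fit together in a few lines: PCA can be computed by eigendecomposing the covariance matrix of centered data, and the eigenvalues give the explained variance. This is a minimal NumPy sketch under that assumption, using random data invented for the example rather than anything from this repository.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)        # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]        # principal directions
    projected = X_centered @ components
    return projected, explained_ratio

# Hypothetical data: three features, the third nearly a copy of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X_demo = np.column_stack([base[:, 0], base[:, 1],
                          base[:, 0] + 0.01 * rng.normal(size=100)])

projected, explained_ratio = pca_eig(X_demo, n_components=2)
```

Because the third feature is almost redundant, the last entry of `explained_ratio` should be close to zero, which is the usual signal that a dimension can be dropped.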

Model Selection & Tuning

  • Cross Validation (Leave One Out, K-Fold, Stratified K-Fold)
  • Data Leakage Prevention
  • Hyperparameter Tuning (Grid Search, Randomized Search, Bayesian Optimization)
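
One common way to combine the three topics above is to put preprocessing inside a scikit-learn `Pipeline` and tune it with `GridSearchCV`, so the scaler is refit on the training folds only and never sees the validation fold. The sketch below assumes scikit-learn is installed; the dataset is synthetic and the parameter grid is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (hypothetical, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling lives inside the pipeline, so each CV split fits the scaler
# on its own training folds. Fitting it on all of X first would leak
# validation statistics into training.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},  # step name + "__" + parameter
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

best_C = search.best_params_["clf__C"]
mean_accuracy = search.best_score_
```

Swapping `GridSearchCV` for `RandomizedSearchCV` keeps the same leakage-safe structure while sampling the grid instead of enumerating it.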

Ensemble Methods

  • Voting Ensemble (Hard Voting, Soft Voting)
  • Bagging (Bagging Regressor/Classifier, Random Forests)
  • Boosting (AdaBoost, Gradient Boosting, XGBoost)

Advanced Topics

  • Maximum Likelihood Estimation and Loss Functions
  • Constrained Optimization Problems

Deep Learning

Note: Detailed deep learning work is maintained in a separate repository

Natural Language Processing

  • Tokenization
  • Stopword Removal
  • Stemming and Lemmatization
  • Named Entity Recognition
  • Bag of Words
  • TF-IDF
  • Spam Detection Project

Neural Networks

  • Perceptron and Multi-Layer Perceptron (MLP)

Live Projects


Contact: LinkedIn
