This repository documents my learning journey in Machine Learning and Data Science, featuring projects, experiments, and detailed notes compiled throughout my studies.
Tools, Libraries & Techniques:
- Python, NumPy, Pandas
- Matplotlib, Seaborn, Plotly
- Streamlit
- Exploratory Data Analysis (EDA)
- Web Scraping
- Pandas Profiling (now ydata-profiling)
- Scikit-learn (Encoding Techniques, Pipeline)
- SciPy
Projects:
- Smartphone Data Analysis - Web scraping and comprehensive analysis of smartphone specifications
- Titanic Dataset EDA - Classic exploratory data analysis
- National Anthem Analysis - Discovering hidden patterns between countries through their anthems | Read More
- Walmart Sales EDA - My first exploratory data analysis project | Read More
Statistics & Probability:
- Hypothesis Testing
- Bayes Theorem
- Probability Distributions
- Descriptive Statistics
- Inferential Statistics
- Central Limit Theorem
- Correlation and Regression
- ANOVA
- Chi-Square Test
- Sampling Techniques
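As a small illustration of the hypothesis-testing topics above, a minimal two-sample t-test with SciPy (the data and the control/treatment framing are purely synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two hypothetical samples: a control group and a treatment group
control = rng.normal(loc=50, scale=5, size=100)
treatment = rng.normal(loc=52, scale=5, size=100)

# Independent two-sample t-test: H0 says the two population means are equal
t_stat, p_value = stats.ttest_ind(control, treatment)
alpha = 0.05
reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

H0 is rejected when p < α; for paired samples, `stats.ttest_rel` is the analogous call.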
Machine Learning:
- Simple, Multiple, and Polynomial Linear Regression
- Gradient Descent (Batch, Stochastic, Mini-Batch) - implemented from scratch
- Regression Analysis (F-Statistic, R², Adjusted R², p-Value)
- Regression Assumptions (Linearity, Normality, Homoscedasticity, No Autocorrelation, Multicollinearity)
- Regularization Techniques (Lasso, Ridge, Elastic Net)
- Bias-Variance Tradeoff
- Feature Selection (Filter, Wrapper, and Embedded Methods)
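The from-scratch gradient descent entry above can be sketched for simple linear regression (a toy reconstruction on synthetic data, not the repository's actual implementation):

```python
import numpy as np

# Synthetic data: y = 3x + 4 + noise (assumed toy problem)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, size=200)

# Batch gradient descent on the MSE loss
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    y_pred = w * X[:, 0] + b
    error = y_pred - y
    grad_w = 2 * np.mean(error * X[:, 0])  # dMSE/dw
    grad_b = 2 * np.mean(error)            # dMSE/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w ~ {w:.2f}, b ~ {b:.2f}")  # should approach 3 and 4
```

This loop is batch gradient descent; computing the gradient on a single random row per step makes it stochastic, and on a small random subset makes it mini-batch.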
- K-Nearest Neighbors (KNN)
- Naive Bayes Classifier (Gaussian, Multinomial, Bernoulli, Complement, Categorical)
- Text Classification implementation
- Spam Classifier | Live Demo
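A toy bag-of-words spam classifier in the spirit of the project above (the messages are invented for illustration, not the project's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages
messages = [
    "win a free prize now", "free cash offer click now",
    "limited offer win money", "are we meeting tomorrow",
    "see you at lunch", "project update attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Bag-of-words features feeding a Multinomial Naive Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)

print(clf.predict(["free prize offer"])[0])          # spam-like words
print(clf.predict(["meeting at lunch tomorrow"])[0]) # ham-like words
```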
- Logistic Regression (One-vs-Rest, Multinomial)
- Digit Recognition Web App | Live Demo
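A minimal multinomial logistic regression digit classifier, sketched on scikit-learn's built-in 8x8 digits dataset (the web app's actual pipeline may differ):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Multinomial logistic regression on the 8x8 digit images;
# pixel values run 0-16, so divide by 16 to scale them to [0, 1]
clf = LogisticRegression(max_iter=2000)
clf.fit(X_tr / 16.0, y_tr)
acc = clf.score(X_te / 16.0, y_te)
print(f"test accuracy: {acc:.3f}")
```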
- Support Vector Machines (Hard Margin, Soft Margin, Kernel Trick)
- Decision Trees (CART, Gini Impurity, Entropy, Pruning)
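A short CART example showing Gini impurity and cost-complexity pruning (the dataset is an illustrative choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# CART with Gini impurity; ccp_alpha > 0 applies cost-complexity pruning
tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
tree.fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
print(f"depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, acc={acc:.3f}")
```

Raising `ccp_alpha` prunes more aggressively, trading a shallower tree for (usually) lower variance.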
- K-Means
- DBSCAN
- Hierarchical Clustering
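The clustering algorithms above can be tried on synthetic blobs; a minimal K-Means sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Three well-separated synthetic blobs of 50 points each
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([rng.normal(c, 0.4, size=(50, 2)) for c in centers])

# n_init restarts guard against a bad random initialization
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("inertia:", round(km.inertia_, 2))
print("cluster sizes:", np.bincount(km.labels_))
```

Swapping in `DBSCAN` or `AgglomerativeClustering` on the same `X` is a one-line change.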
- Confusion Matrix
- Accuracy, Precision, Recall, F1 Score
- ROC Curve and AUC
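The evaluation metrics above, computed on a hypothetical set of labels and model scores:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Hypothetical ground truth and predicted probabilities
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.65, 0.2, 0.85, 0.3])
y_pred = (y_score >= 0.5).astype(int)  # threshold the scores at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("confusion :", f"tp={tp} fp={fp} fn={fn} tn={tn}")
print("accuracy  :", accuracy_score(y_true, y_pred))
print("precision :", precision_score(y_true, y_pred))
print("recall    :", recall_score(y_true, y_pred))
print("f1        :", f1_score(y_true, y_pred))
print("roc auc   :", roc_auc_score(y_true, y_score))  # uses raw scores
```

Note that ROC AUC is threshold-free (it consumes the scores), while the other metrics depend on the 0.5 cutoff.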
- Principal Component Analysis (PCA)
- Explained Variance
- Covariance Matrix
- Eigenvalues and Eigenvectors
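PCA can be built by hand from exactly the pieces listed above (covariance matrix, eigendecomposition, explained variance); a NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data (synthetic)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

# PCA by hand: center, covariance matrix, eigendecomposition
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_variance_ratio = eigvals / eigvals.sum()
X_proj = Xc @ eigvecs[:, :1]             # project onto the first component
print("explained variance ratio:", explained_variance_ratio)
```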
- Cross-Validation (Leave-One-Out, K-Fold, Stratified K-Fold)
- Data Leakage Prevention
- Hyperparameter Tuning (Grid Search, Randomized Search, Bayesian Optimization)
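A sketch tying cross-validation, leakage prevention, and grid search together (the scaler-plus-KNN pipeline is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# only on its own training split -- this is what prevents data leakage
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [3, 5, 7, 9]}

search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print("best k:", search.best_params_["knn__n_neighbors"])
print("best CV accuracy:", round(search.best_score_, 3))
```

`RandomizedSearchCV` drops in for larger grids; Bayesian optimization needs a third-party library such as Optuna or scikit-optimize.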
- Voting Ensemble (Hard Voting, Soft Voting)
- Bagging (Bagging Regressor/Classifier, Random Forests)
- Boosting (AdaBoost, Gradient Boosting, XGBoost)
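A minimal soft-voting ensemble on synthetic data (the base models are chosen for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages predicted probabilities across the base models;
# voting="hard" would take a majority vote of class labels instead
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
acc = accuracy_score(y_te, ensemble.predict(X_te))
print(f"ensemble accuracy: {acc:.3f}")
```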
- Maximum Likelihood Estimation and Loss Functions
- Constrained Optimization Problems
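Maximum likelihood estimation can be sketched numerically: minimizing a Gaussian negative log-likelihood recovers the sample mean and standard deviation (synthetic data):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic sample

# Gaussian negative log-likelihood (additive constants dropped);
# optimizing log(sigma) keeps sigma positive without constraints
def nll(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return np.sum(np.log(sigma) + 0.5 * ((data - mu) / sigma) ** 2)

res = minimize(nll, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"mu_hat ~ {mu_hat:.2f}, sigma_hat ~ {sigma_hat:.2f}")
```

For a Gaussian, the MLE matches the closed-form sample mean and (population) standard deviation, which makes this an easy sanity check.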
Note: Detailed deep learning work is maintained in a separate repository.
Natural Language Processing:
- Tokenization
- Stopword Removal
- Stemming and Lemmatization
- Named Entity Recognition
- Bag of Words
- TF-IDF
- Spam Detection Project
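The bag-of-words/TF-IDF steps above in one sketch with scikit-learn (the documents are toy examples invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the anthem praises the homeland",
    "the anthem celebrates freedom",
    "freedom and unity for the homeland",
]

# Tokenization, English stopword removal, and TF-IDF weighting in one step
vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)

print("vocabulary  :", sorted(vec.vocabulary_))
print("matrix shape:", tfidf.shape)  # documents x surviving terms
```

Stemming/lemmatization and named-entity recognition are not built into `TfidfVectorizer`; NLTK or spaCy would handle those as a preprocessing step.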
- Perceptron and Multi-Layer Perceptron (MLP)
Contact: LinkedIn