ggauravky/Data-Science-Learning

📊 Data Science Learning Journey

Master Data Science from Fundamentals to Machine Learning

Python NumPy Pandas Matplotlib Seaborn

Jupyter MySQL BeautifulSoup Scikit-Learn

🚀 Get Started • 📚 Curriculum • 🎯 Projects • 💡 Skills • 🤝 Connect


🌟 About This Course

This repository is a comprehensive, hands-on data science curriculum designed to take you from absolute beginner to proficient data scientist. With 100+ Jupyter notebooks, real-world projects, and structured learning paths, you'll build a strong foundation in:

  • ✅ Python Programming - Master the language of data science
  • ✅ Data Analysis & Manipulation - Work with NumPy and Pandas
  • ✅ Data Visualization - Create stunning charts and insights
  • ✅ Web Scraping - Collect data from any website
  • ✅ SQL Databases - Query and manage data efficiently
  • ✅ Statistics & Probability - Build ML foundations
  • ✅ Machine Learning - Train your first ML models

📈 Status: 🟢 Active & Growing - New content added regularly!


📚 Curriculum

📖 Complete Course Modules

| No. | Module | Topics Covered | Content | Skills |
|---|---|---|---|---|
| 01 | 🎓 Data Science Intro | Tools, Environment Setup, Career Paths, DS Lifecycle | 1 PDF Guide | Foundation setup |
| 02 | 🐍 Python Fundamentals | Variables, Data Types, Operators, Control Flow, Loops, Data Structures, OOP, Lambda | 18 Notebooks | Complete Python |
| 03 | 🚀 Project: Social Network | Recommendation Algorithms, Graph Theory, JSON Processing | 3 Notebooks | Real-world application |
| 04 | 🔢 NumPy Mastery | Arrays, Indexing, Slicing, Broadcasting, Vectorization | 5 Notebooks | Numerical computing |
| 05 | 🐼 Pandas Deep Dive | DataFrames, Series, Grouping, Merging, Time Series | 2 Notebooks | Data manipulation |
| 06 | 📊 Data Visualization | Line, Bar, Pie, Scatter, Histogram, Heatmaps, Seaborn | 8 Notebooks | Visual storytelling |
| 07 | 🕷️ Web Scraping | HTTP Requests, HTML Parsing, BeautifulSoup, Data Extraction | 2 Notebooks + 49 HTML samples | Web data collection |
| 08 | 🗄️ SQL & Databases | CRUD Operations, Joins, Subqueries, Views, Stored Procedures | 20 Tutorials | Database management |
| 09 | 📈 Probability & Stats | Conditional Probability, Bayes Theorem, Distributions | 3 Tutorials + Practice | Statistical thinking |
| 10 | 🤖 ML Introduction | How Machines Learn, ML History, Traditional vs ML | PPT + Notes | ML fundamentals |
| 11 | 🔧 Sklearn Basics | First ML Models, Training, Prediction, Model Selection | 3 Notebooks | Scikit-learn |
| 12 | 📋 ML Algorithm Types | Supervised vs Unsupervised Learning, Use Cases | 3 Guides | Algorithm selection |
| 13 | 🎯 ML Practice | Iris Classification, Model Evaluation, RMSE, MAE, Test Sets | 5+ Notebooks | End-to-end ML |

🚀 Quick Start

Prerequisites

  • 💻 Basic computer skills
  • 🧠 Curiosity and willingness to learn
  • ⏰ 8-10 hours per week commitment
  • ❌ No prior programming experience needed!

Installation

Step 1: Clone the repository

git clone https://github.com/ggauravky/Data-Science-Learning.git
cd Data-Science-Learning

Step 2: Set up Python environment

# Option A: Using Conda (Recommended)
conda create -n datasci python=3.11 -y
conda activate datasci
conda install numpy pandas matplotlib seaborn jupyter scikit-learn -y
pip install beautifulsoup4 requests

# Option B: Using pip
pip install numpy pandas matplotlib seaborn jupyter beautifulsoup4 requests scikit-learn

Step 3: Launch Jupyter

jupyter notebook

Step 4: Start learning! 🎉

Navigate to 002 Python refresher/01_python_basic.ipynb and begin your journey!
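Before opening the first notebook, it can help to confirm that the packages from Step 2 are visible to the active interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of this repository, and some conda-installed packages may not expose metadata under the same distribution name, so a missing entry is a hint to investigate rather than proof of a failed install.

```python
from importlib import metadata

# Distribution names as they appear in the install commands above.
PACKAGES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "jupyter", "scikit-learn", "beautifulsoup4", "requests",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<16} {version or 'NOT FOUND'}")
```

Run it from the same environment you activated in Step 2 (for example, inside the `datasci` conda environment).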


📖 Learning Path

🎯 Recommended 12-Week Roadmap

graph LR
    A[Week 1-2: Python] --> B[Week 3-4: NumPy & Pandas]
    B --> C[Week 5-6: Visualization]
    C --> D[Week 7: Web Scraping]
    D --> E[Week 8: SQL]
    E --> F[Week 9-10: Probability]
    F --> G[Week 11-12: Machine Learning]
📅 Week-by-Week Breakdown

🌱 Phase 1: Foundation (Weeks 1-4)

Week 1-2: Python Programming

  • Complete all 18 Python notebooks
  • Focus: Variables, loops, functions, OOP
  • Practice: Daily coding exercises
  • Milestone: Build a simple calculator app

Week 3: NumPy

  • Master array operations
  • Learn vectorization techniques
  • Practice: Matrix manipulations
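To give a feel for what Week 3 covers, here is a small sketch comparing an explicit loop with the vectorized equivalent, followed by broadcasting and a matrix product. The arrays are made-up illustration data, not taken from the course notebooks.

```python
import numpy as np

# Made-up data: unit prices and quantities for four items.
prices = np.array([120.0, 80.0, 45.5, 300.0])
quantities = np.array([2, 5, 10, 1])

# Loop version: one multiplication per Python iteration.
totals_loop = np.empty(len(prices))
for i in range(len(prices)):
    totals_loop[i] = prices[i] * quantities[i]

# Vectorized version: the same element-wise product in a single expression.
totals_vec = prices * quantities

# Broadcasting: the scalar 0.9 is applied to every element.
discounted = totals_vec * 0.9

# Matrix manipulation: a (2, 3) matrix times its (3, 2) transpose gives (2, 2).
m = np.arange(6).reshape(2, 3)
product = m @ m.T

print(totals_vec)
print(product)
```

Both versions produce the same totals; the vectorized form is shorter and lets NumPy do the iteration in compiled code.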

Week 4: Pandas & First Project

  • DataFrame operations
  • Data cleaning techniques
  • Project: Coders of Delhi recommendation system

🌿 Phase 2: Intermediate (Weeks 5-8)

Week 5-6: Data Visualization

  • All chart types in Matplotlib
  • Statistical plots with Seaborn
  • Practice: Visualize real datasets

Week 7: Web Scraping

  • HTTP requests and responses
  • HTML parsing with BeautifulSoup
  • Project: Book scraper

Week 8: SQL Databases

  • CRUD operations
  • Complex joins and queries
  • Practice: Build a movie database
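The CRUD-plus-join pattern from Week 8 can be tried without installing a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (the course itself uses MySQL, and some syntax differs between the two); the table layout and rows are hypothetical placeholders.

```python
import sqlite3

# In-memory SQLite database, used here only as a lightweight stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movies (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER,
    director_id INTEGER REFERENCES directors(id)
);
""")

# Create: insert placeholder rows using parameterized queries.
conn.execute("INSERT INTO directors (id, name) VALUES (?, ?)", (1, "Director A"))
conn.executemany(
    "INSERT INTO movies (title, year, director_id) VALUES (?, ?, ?)",
    [("Movie One", 2001, 1), ("Movie Two", 2010, 1)],
)

# Update: correct the year of one movie.
conn.execute("UPDATE movies SET year = ? WHERE title = ?", (2002, "Movie One"))

# Delete: remove the other movie.
conn.execute("DELETE FROM movies WHERE title = ?", ("Movie Two",))

# Read: join movies to their directors.
rows = conn.execute(
    """
    SELECT m.title, m.year, d.name
    FROM movies AS m
    JOIN directors AS d ON d.id = m.director_id
    ORDER BY m.title
    """
).fetchall()
print(rows)
conn.close()
```

Parameterized queries (the `?` placeholders) keep values separate from the SQL text, which is a good habit to carry over to MySQL drivers as well.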

🌳 Phase 3: Advanced (Weeks 9-12)

Week 9-10: Statistics & Advanced SQL

  • Probability distributions
  • Bayes theorem applications
  • Stored procedures and optimization
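As a taste of the Bayes theorem applications in Weeks 9-10, here is a short worked example of the classic diagnostic-test question: given a positive result, how likely is the condition? The function name and the numbers are illustrative assumptions, not course data.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive test (the denominator in Bayes' theorem).
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Illustrative numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior at roughly 16%, which is the kind of counterintuitive result conditional probability helps explain.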

Week 11-12: Machine Learning

  • ML fundamentals
  • First models with Scikit-learn
  • Project: Iris classification
  • Model evaluation and metrics
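Module 13 lists RMSE and MAE among its evaluation topics. The sketch below computes both by hand on made-up numbers so the formulas are visible; in practice you would likely use the equivalents in Scikit-learn's `sklearn.metrics` module.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up targets and predictions; the errors are 1, 0, -2, 0.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print(f"MAE:  {mae(y_true, y_pred):.3f}")   # prints 0.750
print(f"RMSE: {rmse(y_true, y_pred):.3f}")  # prints 1.118
```

Note that these are regression metrics; for a classification task such as Iris, accuracy is the more natural starting point.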

🎯 Projects

Featured Real-World Projects

🌐 Coders of Delhi

Social Network Recommendation System

Build algorithms similar to Facebook's "People You May Know" feature.

Tech Stack: Python, JSON, Graph Algorithms
Complexity: Intermediate
Skills: Data structures, algorithms, recommendation engines

Files:

  • data_read.ipynb
  • people_you_may_know.ipynb
  • pages_you_might_like.ipynb
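One common way to approach a "People You May Know" feature is to rank non-friends by how many mutual friends they share with a user. The sketch below illustrates that idea on a tiny hypothetical graph; the names, the data layout, and the function are assumptions for illustration and are not taken from the project notebooks.

```python
from collections import Counter

# Hypothetical friendship graph: each user maps to a set of direct friends.
friends = {
    "asha": {"bilal", "chen"},
    "bilal": {"asha", "chen", "dev"},
    "chen": {"asha", "bilal", "dev", "esha"},
    "dev": {"bilal", "chen"},
    "esha": {"chen"},
}


def people_you_may_know(user, graph):
    """Rank users who are not yet friends by their number of mutual friends."""
    direct = graph[user]
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            # Skip the user and anyone they are already friends with.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Highest mutual-friend count first; ties broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(people_you_may_know("asha", friends))  # [('dev', 2), ('esha', 1)]
```

The same counting approach can be adapted to page recommendations by counting pages liked by a user's friends instead of friends-of-friends.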

📚 Book Data Scraper

Web Scraping Pipeline

Scrape 49 pages of book data from an online bookstore.

Tech Stack: Requests, BeautifulSoup, Pandas
Complexity: Beginner-Intermediate
Skills: HTTP, HTML parsing, data extraction

Output: Structured CSV with titles, prices, ratings
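The parse-then-export step of such a pipeline might look roughly like the sketch below. It runs on an inline HTML sample so it works offline; the tag structure, class names, and helper functions are hypothetical and will differ from the real pages, so treat the selectors as placeholders to adapt.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup; inspect the real pages to find the actual structure.
SAMPLE_HTML = """
<article class="book">
  <h3><a title="Example Book One">Example Book One</a></h3>
  <p class="price">$10.50</p>
  <p class="rating Three"></p>
</article>
<article class="book">
  <h3><a title="Example Book Two">Example Book Two</a></h3>
  <p class="price">$7.25</p>
  <p class="rating Five"></p>
</article>
"""


def parse_books(html):
    """Extract title, price, and rating from every book card in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.book"):
        # BeautifulSoup returns the class attribute as a list, e.g. ["rating", "Three"].
        rating_classes = card.select_one("p.rating")["class"]
        price_text = card.select_one("p.price").get_text(strip=True)
        books.append({
            "title": card.select_one("h3 a")["title"],
            "price": float(price_text.lstrip("$")),
            "rating": rating_classes[1],
        })
    return books


def to_csv(books):
    """Serialize the parsed records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price", "rating"])
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


books = parse_books(SAMPLE_HTML)
print(to_csv(books))
```

For live pages, the HTML string would come from the Requests library (for example `requests.get(url, timeout=10).text`), ideally with a polite delay between page fetches.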

🌸 Iris Classification

Machine Learning Project

Train and evaluate ML models on the classic Iris dataset.

Tech Stack: Scikit-learn, NumPy, Pandas
Complexity: Intermediate
Skills: Model training, evaluation, accuracy metrics

Notebooks:

  • Quick training
  • Accuracy measurement
  • Data analysis
  • Test set creation
  • Stratified sampling
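The notebook topics above (test set creation, stratified sampling, quick training, accuracy measurement) fit together in a few lines of Scikit-learn. This is a minimal sketch rather than the notebooks' exact code: the choice of `LogisticRegression`, the 80/20 split, and the random seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the 150-sample Iris dataset as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out 20% as a test set; stratify=y keeps the three classes balanced
# in both the training and the test portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train a simple classifier (one of many reasonable first models).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Measure accuracy on data the model has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Because Iris has 50 samples per class, the stratified 20% test set contains 10 samples of each species.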

📊 Data Analysis Suite

Pandas Practice Projects

Analyze real-world datasets with advanced techniques.

Tech Stack: Pandas, Matplotlib, Seaborn
Complexity: Beginner-Intermediate
Skills: Grouping, merging, aggregation, visualization

Features:

  • Data cleaning pipelines
  • Statistical analysis
  • Trend visualization
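A typical clean, merge, and aggregate pipeline in Pandas could be sketched as follows. The two small tables are made-up illustration data, not one of the datasets used in the practice projects.

```python
import pandas as pd

# Made-up order and customer tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [250.0, None, 100.0, 75.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "city": ["Delhi", "Mumbai", "Delhi"],
})

# Cleaning: drop orders with a missing amount.
clean = orders.dropna(subset=["amount"])

# Merging: attach each customer's city to their orders.
merged = clean.merge(customers, on="customer_id", how="left")

# Grouping and aggregation: total amount and order count per city.
summary = (
    merged.groupby("city", as_index=False)
    .agg(total_amount=("amount", "sum"), order_count=("order_id", "count"))
    .sort_values("total_amount", ascending=False)
)
print(summary)
```

From here, `summary` can be passed to Matplotlib or Seaborn for a bar chart of totals per city.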

💡 Skills You'll Gain

🐍 Programming

  • ✅ Python syntax & semantics
  • ✅ Object-oriented programming
  • ✅ Functional programming
  • ✅ List comprehensions
  • ✅ Lambda expressions
  • ✅ File I/O operations
  • ✅ JSON data handling
  • ✅ Error handling

📊 Data Science

  • ✅ NumPy array operations
  • ✅ Pandas DataFrames
  • ✅ Data cleaning & preprocessing
  • ✅ Statistical analysis
  • ✅ Data visualization
  • ✅ Exploratory data analysis
  • ✅ Feature engineering
  • ✅ Data transformation

🤖 Machine Learning

  • ✅ ML fundamentals
  • ✅ Supervised learning
  • ✅ Unsupervised learning
  • ✅ Model training
  • ✅ Model evaluation
  • ✅ Scikit-learn library
  • ✅ Algorithm selection
  • ✅ Performance metrics

🗄️ Databases

  • ✅ SQL queries (SELECT, JOIN)
  • ✅ Database design
  • ✅ CRUD operations
  • ✅ Aggregations & grouping
  • ✅ Subqueries
  • ✅ Views & indexes
  • ✅ Stored procedures
  • ✅ Query optimization

🕷️ Web Scraping

  • ✅ HTTP protocol
  • ✅ HTML structure
  • ✅ CSS selectors
  • ✅ BeautifulSoup parsing
  • ✅ Requests library
  • ✅ Data extraction
  • ✅ Ethical scraping
  • ✅ Pipeline building

📈 Statistics

  • ✅ Probability theory
  • ✅ Distributions
  • ✅ Conditional probability
  • ✅ Bayes theorem
  • ✅ Hypothesis testing
  • ✅ Statistical inference
  • ✅ Sampling techniques
  • ✅ Error metrics

🛠️ Technology Stack

Core Technologies

| Category | Tools |
|---|---|
| 💻 Language | Python 3.11+ |
| 📊 Data Analysis | NumPy, Pandas |
| 📈 Visualization | Matplotlib, Seaborn |
| 🕸️ Web Scraping | Requests, BeautifulSoup4 |
| 🗄️ Database | MySQL |
| 🤖 Machine Learning | Scikit-learn |
| 📓 IDE | Jupyter Notebook, VS Code |

📈 Progress Tracker

Use this checklist to track your learning journey:

Core Modules

  • 🎓 Introduction to Data Science
  • 🐍 Python Fundamentals (18 notebooks)
  • 🔢 NumPy Mastery (5 notebooks)
  • 🐼 Pandas Deep Dive (2 notebooks)
  • 📊 Data Visualization (8 notebooks)
  • 🕷️ Web Scraping (2 notebooks)
  • 🗄️ SQL & Databases (20 tutorials)
  • 📈 Probability & Statistics
  • 🤖 Machine Learning Introduction
  • 🔧 Scikit-learn Basics
  • 📋 ML Algorithm Types
  • 🎯 ML Practice (5+ notebooks)

Projects

  • 🌐 Coders of Delhi - Social Network
  • 📚 Book Data Scraper
  • 🌸 Iris Classification
  • 📊 Data Analysis Projects

Milestones

  • 🎖️ Completed first 50 notebooks
  • 🏆 Built 3 portfolio projects
  • 🚀 Trained first ML model
  • ⭐ Contributed to the repo

🀝 Connect

Let's Learn Together!

LinkedIn GitHub Instagram

Questions? Suggestions? Want to collaborate?
Feel free to open an issue or reach out directly!


🀝 Contributing

We welcome contributions from the community! Here's how you can help:

Ways to Contribute

  • πŸ› Report Bugs: Found an error? Let us know!
  • πŸ’‘ Suggest Features: Have ideas for new content?
  • πŸ“ Improve Documentation: Help make explanations clearer
  • 🎨 Add Examples: Share your own projects and solutions
  • 🌐 Translate: Help make content accessible in other languages

How to Contribute

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: You can use, modify, and distribute this content freely. Attribution appreciated! 🙏


⭐ Show Your Support

If this repository helped you in your data science journey:

  • ⭐ Star this repository
  • 🍴 Fork it for your own learning
  • πŸ“’ Share with fellow learners
  • πŸ’¬ Spread the word on social media


🙏 Acknowledgments

  • 🎓 Inspired by various data science courses and bootcamps
  • 📚 Built with passion for the data science community
  • 🌟 Thanks to all contributors and learners

Made with ❤️ for Data Science Learners Worldwide

Happy Learning! 🚀

About

A structured learning repository for Data Science using Python. Covers data cleaning, EDA, and visualization with Pandas, NumPy, Matplotlib, Seaborn, and more.
