arafatro/IntroToDS

Introduction to Data Science

Personal Information

Arafat Md Easin
Email: arafatmdeasin@inf.elte.hu
Room: 7.25 (North Block)

Please contact me before visiting.

Packages

  • NumPy
  • matplotlib
  • pandas
  • scikit-learn (imported as sklearn)
  • seaborn
  • mlxtend

Updated Course Calendar

Introduction to Data Science Practice Course Outline

Table of Contents

  1. Week 01: Python Environment Setup and IDEs
  2. Week 02: Exploratory Data Analysis (EDA)
  3. Week 03: Data Preprocessing and Dimensionality Reduction
  4. Week 04: Feature Engineering and Extraction
  5. Week 05: Mid Exam
  6. Week 06: Model Development with Supervised Learning
  7. Week 07: Model Development with Unsupervised Learning
  8. Week 08: Model Evaluation and Validation
  9. Week 09: Review Class and Counseling or Consultations
  10. Week 10: Kaggle Competition Challenge

Python Environment Setup and IDEs

  • Overview of Python installation and package management
  • Popular IDEs and platforms:
    • VSCode
    • Jupyter Notebook
    • Anaconda
    • Google Colab
    • Kaggle Kernels
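
Whichever IDE or platform is used, a quick check that the course packages are importable can save time in the first practice. The following is a minimal sketch, not part of the official course material; the helper name `check_packages` is hypothetical, and the list uses the import names of the packages listed above (scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names of the packages listed under "Packages".
PACKAGES = ["numpy", "matplotlib", "pandas", "sklearn", "seaborn", "mlxtend"]


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Any package reported as not installed can then be added with the environment's package manager (for example pip or conda).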

Exploratory Data Analysis (EDA)

  • Loading and Inspecting Datasets:
    • Utilizing pandas for data loading and inspection
  • Basic Data Visualization and Descriptive Statistics:
    • Libraries:
      • matplotlib: Core plotting library
      • seaborn: High-level interface for statistical graphics
      • pandas: Built-in plotting capabilities
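
As an illustration of loading, inspecting, and plotting with pandas, here is a small sketch. The toy table and its column names are made up for this example; in practice the DataFrame would usually come from something like `pd.read_csv(...)` on a real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, 180.0, None, 175.0],
    "weight_kg": [65.0, 58.0, 82.0, 70.0, 74.0],
    "group": ["a", "b", "a", "b", "a"],
})

# Inspection: first rows, column types, and missing values per column.
print(df.head())
print(df.dtypes)
missing = df.isna().sum()

# Descriptive statistics for the numeric columns, and a per-group mean.
summary = df.describe()
group_means = df.groupby("group")["weight_kg"].mean()

# pandas' built-in plotting draws on a matplotlib Axes object.
ax = df["weight_kg"].plot.hist(bins=5, title="Weight distribution")
ax.set_xlabel("weight_kg")
```

In a notebook the histogram would normally be shown inline, so the `matplotlib.use("Agg")` line can be dropped there.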

Data Preprocessing and Dimensionality Reduction

  • Data cleaning: handling missing values, outliers, and inconsistent formats
  • Data transformation: normalization, scaling, and encoding
  • Techniques for dimensionality reduction (e.g., PCA, t-SNE)
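
The cleaning, transformation, and reduction steps above can be chained with scikit-learn. This is one possible sketch under stated assumptions: the toy table, the column names, and the choice of median imputation are illustrative, not prescribed by the course.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with missing values and one categorical column.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 37.0],
    "income": [30000.0, 52000.0, 47000.0, np.nan, 39000.0, 61000.0],
    "city": ["Budapest", "Vienna", "Budapest", "Prague", "Vienna", "Prague"],
})

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical column: one-hot encoding. sparse_threshold=0.0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0.0,
)

X = preprocess.fit_transform(df)  # 2 scaled numeric columns + 3 one-hot columns

# Dimensionality reduction: project the 5 columns onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
```

t-SNE (`sklearn.manifold.TSNE`) could be swapped in for PCA when the goal is visualization rather than a reusable linear projection.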

Feature Engineering and Extraction

  • Creating new features from raw data
  • Feature selection and importance ranking
  • Dealing with imbalanced datasets using resampling or algorithmic approaches
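
A short sketch of feature selection followed by a simple resampling approach to class imbalance. The synthetic dataset and the choice of `SelectKBest` with random oversampling are assumptions made for illustration; other selectors and resampling schemes would fit the same outline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.utils import resample

# Synthetic, imbalanced data: roughly 90% class 0 and 10% class 1.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=0,
)

# Feature selection: keep the 3 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3)
X_sel = selector.fit_transform(X, y)

# Resampling: randomly oversample the minority class up to the majority size.
# In a real workflow this should be applied to the training split only.
majority = X_sel[y == 0]
minority = X_sel[y == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)

X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_up), dtype=int),
])
print(np.bincount(y), "->", np.bincount(y_bal))
```

An algorithmic alternative to resampling is to pass `class_weight="balanced"` to estimators that support it, such as `LogisticRegression` or `RandomForestClassifier`.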

Mid Exam

  • Assessment covering foundational topics and hands-on exercises

Model Development with Supervised Learning

  • Classification:
    • Binary Classification: Logistic Regression, Support Vector Machines (SVM), etc.
    • Multiclass Classification: Decision Trees, Random Forests, etc.
    • Ensemble Methods: Techniques like voting classifiers that combine multiple models for improved performance.
  • Regression:
    • Linear Regression: Basic approach for predicting continuous outcomes.
    • Tree-Based Regression: Decision Tree Regression, Random Forest Regression.
    • Ensemble Methods for Regression: Techniques such as gradient boosting and bagging, along with cross-validation for hyperparameter tuning.
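
The classification and regression workflows listed above can be sketched as follows. The datasets (bundled with scikit-learn), the specific models, and the small parameter grid are assumptions chosen to keep the example short; they are not the course's prescribed setup.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Classification: two base models and a soft-voting ensemble ---
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
forest = RandomForestClassifier(n_estimators=100, random_state=0)
voter = VotingClassifier([("lr", log_reg), ("rf", forest)], voting="soft")

scores = {}
for name, model in [
    ("logistic_regression", log_reg),
    ("random_forest", forest),
    ("voting", voter),
]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
print(scores)

# --- Regression: gradient boosting tuned with 5-fold cross-validation ---
Xr, yr = load_diabetes(return_X_y=True)
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=5,
    scoring="r2",
)
search.fit(Xr, yr)
print(search.best_params_, search.best_score_)
```

Soft voting averages the predicted class probabilities of the base models, so it only works with estimators that implement `predict_proba`.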

Model Development with Unsupervised Learning

  • Clustering techniques: K-Means, Hierarchical Clustering, DBSCAN
  • Dimensionality reduction techniques: PCA, t-SNE
  • Association rule mining and other unsupervised techniques
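
To compare the three clustering techniques named above, the sketch below runs them on synthetic blobs and scores each against the known labels. The data, the `eps` and `min_samples` values, and the use of the adjusted Rand index are illustrative assumptions; on real data the true labels are usually unavailable and the parameters need tuning.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: 3 compact blobs with known labels.
X, y_true = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=42
)
X = StandardScaler().fit_transform(X)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=42),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    # DBSCAN infers the number of clusters; label -1 marks noise points.
    "dbscan": DBSCAN(eps=0.3, min_samples=5),
}

# Adjusted Rand index: 1.0 means perfect agreement with the true labels.
ari = {
    name: adjusted_rand_score(y_true, model.fit_predict(X))
    for name, model in models.items()
}
print(ari)
```

K-Means and hierarchical clustering need the number of clusters up front, whereas DBSCAN depends on the density parameters instead.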

Model Evaluation and Validation

  • Evaluation metrics for classification (accuracy, precision, recall, F1, ROC-AUC) and regression (RMSE, MAE, R²)
  • Cross-validation techniques: KFold, StratifiedKFold, RepeatedKFold, Jackknife, etc.
  • Visualizing evaluation results (confusion matrices, ROC curves, residual plots)
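
The metrics and cross-validation schemes above map directly onto scikit-learn functions. The sketch below is one way to combine them, assuming a bundled dataset and a logistic regression model for the classification part and three hand-picked values for the regression part.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, mean_absolute_error,
    mean_squared_error, precision_score, r2_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation on the training split: 5 stratified folds, scored by ROC-AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# Final fit and held-out classification metrics.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

# Regression metrics on three hand-picked values (errors: 1, 0, -2).
y_true_r = np.array([3.0, 5.0, 7.0])
y_pred_r = np.array([2.0, 5.0, 9.0])
mae = mean_absolute_error(y_true_r, y_pred_r)            # (1 + 0 + 2) / 3
rmse = np.sqrt(mean_squared_error(y_true_r, y_pred_r))   # sqrt(5 / 3)
r2 = r2_score(y_true_r, y_pred_r)                        # 1 - 5 / 8
print(cv_auc.mean(), metrics, cm, mae, rmse, r2)
```

Swapping `StratifiedKFold` for `KFold` or `RepeatedKFold` changes only the `cv` argument; the rest of the evaluation stays the same.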

Review Class and Counseling or Consultations

  • Discuss ML Challenges: Address any challenges encountered with ML models, including issues in deployment, interpretability, and performance.
  • Review Course Topics: Clarify doubts and reinforce understanding of key concepts.
  • Provide Guidance: Offer practical solutions and personalized advice for additional problems or projects.

Kaggle Competition Challenge

  • Hands-on project: Build and deploy a model to compete on a Kaggle dataset
  • Integration of techniques learned throughout the course

Note

Semester project: group work or an individual presentation at the end of the semester.

Tip

Study the data-science-oriented Python materials to prepare for the final project. Select a type of data from the dataset websites listed below. Form a group yourselves and start thinking about the semester project.

Important

Attendance is 100% MANDATORY; students must earn a certain number of points to PASS the practice. There are ten quizzes, given at the beginning of each practical class. Students can start the project work after the Spring break (leaving approximately 1 month to complete it).

Warning

Students may miss at most 4 classes.

Caution

If students miss more classes, they automatically FAIL the course!

Dataset Websites

About

The "IntroToDS" repository, maintained by Arafat Md Easin, is a comprehensive resource for an Introduction to Data Science Practice.
