Introduction to Data Science

Personal Information

Arafat Md Easin
Email: arafatmdeasin@inf.elte.hu
Room: 7.25 (North Block)

Please contact me by email before visiting.

Packages

NumPy
matplotlib
pandas
scikit-learn (imported as sklearn)
seaborn
mlxtend

Updated Course Calendar

Introduction to Data Science Practice Course Outline

Week 01: Python Environment Setup and IDEs
Week 02: Exploratory Data Analysis (EDA)
Week 03: Data Preprocessing and Dimensionality Reduction
Week 04: Feature Engineering and Extraction
Week 05: Mid Exam
Week 06: Model Development with Supervised Learning
Week 07: Model Development with Unsupervised Learning
Week 08: Model Evaluation and Validation
Week 09: Review Class and Counseling or Consultations
Week 10: Kaggle Competition Challenge

Python Environment Setup and IDEs

Overview of Python installation and package management
Popular IDEs and platforms:
- VSCode
- Jupyter Notebook
- Anaconda
- Google Colab
- Kaggle Kernels
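
Whichever IDE or platform is used, it helps to confirm that the course packages are installed. The sketch below uses only the standard library; the distribution names are an assumption based on the Packages list (sklearn is published on PyPI as "scikit-learn"), and `installed_versions` is a hypothetical helper, not part of the course materials.

```python
from importlib import metadata

# Assumed PyPI distribution names for the packages listed in this README.
COURSE_PACKAGES = ["numpy", "matplotlib", "pandas", "scikit-learn", "seaborn", "mlxtend"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(COURSE_PACKAGES).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can typically be added with pip or conda, depending on the environment.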

Exploratory Data Analysis (EDA)

Loading and Inspecting Datasets:
- Utilizing pandas for data loading and inspection
Basic Data Visualization and Descriptive Statistics:
- Libraries:
  - matplotlib: Core plotting library
  - seaborn: High-level interface for statistical graphics
  - pandas: Built-in plotting capabilities
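
A minimal sketch of loading and inspecting a dataset with pandas. The toy DataFrame and its column names are hypothetical; in practice the data would more likely come from a file, for example via `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length_cm": [4.0, 5.0, 6.0, None, 8.0],
    "width_cm": [1.0, 1.5, 2.0, 2.5, 3.0],
})

print(df.head())    # first rows of the table
print(df.dtypes)    # column types

missing = df.isna().sum()   # number of missing values per column
summary = df.describe()     # count, mean, std, min, quartiles, max (numeric columns)
group_means = df.groupby("species")["width_cm"].mean()  # per-group descriptive statistic

print(missing)
print(summary)
print(group_means)
```

With matplotlib installed, pandas' built-in plotting (for instance `df["width_cm"].plot.hist()`) gives a quick visual check of the same columns.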

Data Preprocessing and Dimensionality Reduction

Data cleaning: handling missing values, outliers, and inconsistent formats
Data transformation: normalization, scaling, and encoding
Techniques for dimensionality reduction (e.g., PCA, t-SNE)
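
The steps above can be chained in a scikit-learn pipeline. This is only a sketch on synthetic data: the median imputation strategy, the choice of two principal components, and the injected missing values are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 samples, 5 features, with some values set to NaN.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[::10, 2] = np.nan  # every 10th row is missing feature 2

pipeline = make_pipeline(
    SimpleImputer(strategy="median"),  # cleaning: fill missing values
    StandardScaler(),                  # transformation: zero mean, unit variance
    PCA(n_components=2),               # dimensionality reduction
)
X_reduced = pipeline.fit_transform(X)

explained = pipeline.named_steps["pca"].explained_variance_ratio_
print(X_reduced.shape)
print(explained)
```

Scaling before PCA matters because PCA is sensitive to feature variance; unscaled features with large ranges would otherwise dominate the components.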

Feature Engineering and Extraction

Creating new features from raw data
Feature selection and importance ranking
Dealing with imbalanced datasets using resampling or algorithmic approaches
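
As an illustration of two of these ideas, the sketch below derives a new feature and balances classes by random oversampling. The data, the column names, and the debt-to-income ratio are hypothetical, and random oversampling is just one of several resampling options.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy data with an imbalanced target (6 negatives, 2 positives).
df = pd.DataFrame({
    "income": [30, 45, 60, 80, 25, 50, 70, 90],
    "debt": [10, 15, 5, 40, 20, 10, 35, 9],
    "default": [0, 0, 0, 0, 0, 0, 1, 1],
})

# Feature creation: a ratio derived from two raw columns.
df["debt_to_income"] = df["debt"] / df["income"]

# Random oversampling: draw minority rows with replacement
# until both classes have the same number of rows.
majority = df[df["default"] == 0]
minority = df[df["default"] == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
balanced = pd.concat([majority, minority_upsampled], ignore_index=True)

print(balanced["default"].value_counts())
```

In a real workflow, resampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.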

Mid Exam

Assessment covering foundational topics and hands-on exercises

Model Development with Supervised Learning

Classification:
- Binary Classification: Logistic Regression, Support Vector Machines (SVM), etc.
- Multiclass Classification: Decision Trees, Random Forests, etc.
- Ensemble Methods: Techniques like voting classifiers that combine multiple models for improved performance.
Regression:
- Linear Regression: Basic approach for predicting continuous outcomes.
- Tree-Based Regression: Decision Tree Regression, Random Forest Regression.
- Ensemble Methods for Regression: Techniques such as gradient boosting and bagging, along with cross-validation for hyperparameter tuning.
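
A voting classifier of the kind mentioned above might be sketched as follows. The synthetic dataset, the three base models, and the hard-voting setting are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification problem.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hard voting: each base model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

accuracy = voter.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Ensembles tend to help most when the base models make different kinds of errors, which is why the sketch mixes a linear model, a kernel method, and a tree ensemble.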

Model Development with Unsupervised Learning

Clustering techniques: K-Means, Hierarchical Clustering, DBSCAN
Dimensionality reduction techniques: PCA, t-SNE
Association rule mining and other unsupervised techniques
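
As a small clustering example, the sketch below runs K-Means on synthetic blobs and scores the result with the silhouette coefficient. The number of clusters is assumed to be known here; on real data it usually has to be chosen, for example by comparing silhouette scores.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data drawn around three centers.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
score = silhouette_score(X, labels)

print(kmeans.cluster_centers_)
print(f"Silhouette score: {score:.3f}")
```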

Model Evaluation and Validation

Evaluation metrics for classification (accuracy, precision, recall, F1, ROC-AUC) and regression (RMSE, MAE, R²)
Cross-validation techniques: KFold, StratifiedKFold, RepeatedKFold, Jackknife, etc.
Visualizing evaluation results (confusion matrices, ROC curves, residual plots)
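
A minimal sketch of stratified cross-validation with a classification metric and a confusion matrix. The synthetic data, the logistic regression model, and the five folds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000)

# StratifiedKFold keeps the class proportions roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One F1 score per fold.
f1_scores = cross_val_score(model, X, y, cv=cv, scoring="f1")

# Out-of-fold predictions: each sample is predicted by a model that never saw it.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class

print(f"F1 per fold: {f1_scores}")
print(f"Mean F1: {f1_scores.mean():.3f}")
print(cm)
```

Reporting the spread across folds alongside the mean gives a rough sense of how stable the estimate is.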

Review Class and Counseling or Consultations

Discuss ML Challenges: Address any challenges encountered with ML models, including issues in deployment, interpretability, and performance.
Review Course Topics: Clarify doubts and reinforce understanding of key concepts.
Provide Guidance: Offer practical solutions and personalized advice for additional problems or projects.

Kaggle Competition Challenge

Hands-on project: Build and deploy a model to compete on a Kaggle dataset
Integration of techniques learned throughout the course

Note

The semester project is presented at the end of the semester, either as group work or as an individual presentation.

Tip

Study the Python data science materials to prepare for the final project. Select a type of data from the dataset websites listed below. Form a group on your own and start planning the semester project.

Important

Attendance is 100% MANDATORY; students must earn a certain number of points to PASS the practice. There are ten quizzes, one at the beginning of each practical class. Students can start the project work after the Spring break and will have approximately one month to complete it.

Warning

Students may miss at most 4 classes.

Caution

Students who miss more than 4 classes automatically FAIL the course!
