Arafat Md Easin
Email: arafatmdeasin@inf.elte.hu
Room: 7.25 (North Block)
Please contact me before paying a visit.
- NumPy
- matplotlib
- pandas
- sklearn
- seaborn
- mlxtend
- Week 01: Python Environment Setup and IDEs
- Week 02: Exploratory Data Analysis (EDA)
- Week 03: Data Preprocessing and Dimensionality Reduction
- Week 04: Feature Engineering and Extraction
- Week 05: Midterm Exam
- Week 06: Model Development with Supervised Learning
- Week 07: Model Development with Unsupervised Learning
- Week 08: Model Evaluation and Validation
- Week 09: Review Class and Counseling or Consultations
- Week 10: Kaggle Competition Challenge
- Overview of Python installation and package management
- Popular IDEs and platforms:
- VSCode
- Jupyter Notebook
- Anaconda
- Google Colab
- Kaggle Kernels
- Loading and Inspecting Datasets:
  - Utilizing pandas for data loading and inspection
- Basic Data Visualization and Descriptive Statistics:
  - Libraries:
    - matplotlib: Core plotting library
    - seaborn: High-level interface for statistical graphics
    - pandas: Built-in plotting capabilities
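A minimal EDA sketch of these steps, using scikit-learn's bundled iris data as a stand-in for a real CSV (for your own data you would call `pd.read_csv` instead):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Build a DataFrame from the bundled iris data; with a real file you
# would use df = pd.read_csv("your_file.csv") instead.
iris = load_iris(as_frame=True)
df = iris.frame

# Inspection: first rows, descriptive statistics, missing-value counts.
print(df.head())
print(df.describe())
print(df.isna().sum())

# Basic visualization via pandas' built-in plotting on top of matplotlib.
df["sepal length (cm)"].plot(kind="hist", title="Sepal length")
plt.savefig("sepal_length_hist.png")
```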
- Data cleaning: handling missing values, outliers, and inconsistent formats
- Data transformation: normalization, scaling, and encoding
- Techniques for dimensionality reduction (e.g., PCA, t-SNE)
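The scaling and dimensionality-reduction steps above can be sketched as follows (a minimal example on the iris features; the choice of dataset and of 2 components is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Transformation: scale each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: project the 4 features down to 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

t-SNE works the same way via `sklearn.manifold.TSNE`, but unlike PCA it has no `transform` for new data.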
- Creating new features from raw data
- Feature selection and importance ranking
- Dealing with imbalanced datasets using resampling or algorithmic approaches
- Assessment covering foundational topics and hands-on exercises
- Classification:
- Binary Classification: Logistic Regression, Support Vector Machines (SVM), etc.
- Multiclass Classification: Decision Trees, Random Forests, etc.
- Ensemble Methods: Techniques like voting classifiers that combine multiple models for improved performance.
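The classifiers above can be combined into a voting ensemble like this (a sketch using the bundled breast-cancer dataset; the three base models are examples, not a prescribed recipe):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hard-voting ensemble: each model casts one vote per prediction.
voter = VotingClassifier([
    ("lr", LogisticRegression(max_iter=5000)),
    ("svm", SVC()),
    ("rf", RandomForestClassifier(random_state=0)),
])
voter.fit(X_train, y_train)
print("test accuracy:", voter.score(X_test, y_test))
```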
- Regression:
- Linear Regression: Basic approach for predicting continuous outcomes.
- Tree-Based Regression: Decision Tree Regression, Random Forest Regression.
- Ensemble Methods for Regression: Techniques such as gradient boosting and bagging, along with cross-validation for hyperparameter tuning.
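A minimal sketch comparing a linear baseline with gradient boosting under 5-fold cross-validation (the diabetes dataset and R² scoring are illustrative choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validated R² for each regressor.
for name, model in [("linear", LinearRegression()),
                    ("gboost", GradientBoostingRegressor(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R2 = {scores.mean():.3f}")
```

For hyperparameter tuning, the same `cv` splitting plugs into `GridSearchCV`.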
- Clustering techniques: K-Means, Hierarchical Clustering, DBSCAN
- Dimensionality reduction techniques: PCA, t-SNE
- Association rule mining and other unsupervised techniques
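A small clustering sketch for the first bullet above (K-Means with k=3 on iris, since iris has three species; the silhouette score is one common way to judge cluster quality):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)

# K-Means with 3 clusters; n_init=10 reruns with different centroid seeds.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
print("silhouette score:", silhouette_score(X, kmeans.labels_))
```

Association rule mining is covered by `mlxtend.frequent_patterns` (`apriori` and `association_rules`).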
- Evaluation metrics for classification (accuracy, precision, recall, F1, ROC-AUC) and regression (RMSE, MAE, R²)
- Cross-validation techniques: KFold, StratifiedKFold, RepeatedKFold, Jackknife, etc.
- Visualizing evaluation results (confusion matrices, ROC curves, residual plots)
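The classification metrics and stratified cross-validation above can be sketched as follows (the dataset and logistic-regression model are stand-ins):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Stratified 5-fold CV keeps the class proportions in every fold.
cv_scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5))
print("CV accuracies:", cv_scores)
```

For plots, `sklearn.metrics.ConfusionMatrixDisplay` and `RocCurveDisplay` render the confusion matrix and ROC curve directly.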
- Discuss ML Challenges: Address any challenges encountered with ML models, including issues in deployment, interpretability, and performance.
- Review Course Topics: Clarify doubts and reinforce understanding of key concepts.
- Provide Guidance: Offer practical solutions and personalized advice for additional problems or projects.
- Hands-on project: Build and deploy a model to compete on a Kaggle dataset
- Integration of techniques learned throughout the course
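The end-to-end Kaggle workflow roughly looks like the sketch below. The iris data stands in for Kaggle's `train.csv`/`test.csv`, and the `Id`/`Prediction` column names are placeholders; each competition specifies its own submission format.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for a competition's train/test split.
iris = load_iris(as_frame=True)
train, test = train_test_split(iris.frame, random_state=0)

# Train on the labeled rows, predict on the held-out rows.
model = RandomForestClassifier(random_state=0)
model.fit(train.drop(columns="target"), train["target"])

# Kaggle expects a CSV with an id column and the predicted target.
submission = pd.DataFrame({
    "Id": test.index,
    "Prediction": model.predict(test.drop(columns="target")),
})
submission.to_csv("submission.csv", index=False)
print(submission.head())
```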
Note
Semester project: group work or an individual presentation at the end of the semester.
Tip
Study the data-science-focused Python materials to prepare for the final project. Select a data type from the dataset websites listed below. Form a group on your own and start thinking about the semester project.
Important
Presence is 100% MANDATORY; students must earn a certain number of points to PASS the practice. There will be ten quizzes, one at the beginning of each practical class. Students can start the project work after the Spring break (approximately one month is available for it).
Warning
Students may miss at most 4 classes.
Caution
If students miss more classes, they automatically FAIL the course!