The analysis uses a global COVID-19 dataset to demonstrate EDA, country-level feature engineering and clustering, and short-horizon forecasting.
Highlights
- EDA: global trends, missingness checks, and distributions
- Feature engineering: positivity rate, fatality rate, tests per population
- Clustering: KMeans on scaled country-level metrics with interactive 3D visualization
- Forecasting: SVR baseline for 3-day-ahead positivity-rate prediction, with an optional RNN/LSTM model
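The feature-engineering and clustering steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the code in `src/utils.py`. It assumes daily rows with hypothetical columns `country`, `new_cases`, `new_tests`, `new_deaths` and `population` (the real column names in `src/data.csv` may differ). It aggregates them per country, derives the three rates, standardizes them and runs KMeans. The helper names `country_features` and `cluster_countries` are also hypothetical, and the toy numbers are made up.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def country_features(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily rows to one row per country and derive rate features."""
    # Column names are hypothetical; map them to the columns in src/data.csv.
    agg = daily.groupby("country").agg(
        cases=("new_cases", "sum"),
        tests=("new_tests", "sum"),
        deaths=("new_deaths", "sum"),
        population=("population", "first"),
    )
    feats = pd.DataFrame(index=agg.index)
    feats["positivity_rate"] = agg["cases"] / agg["tests"]           # positives per test
    feats["fatality_rate"] = agg["deaths"] / agg["cases"]            # deaths per confirmed case
    feats["tests_per_1k"] = 1000 * agg["tests"] / agg["population"]  # tests per 1,000 people
    return feats


def cluster_countries(feats: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.Series:
    """Standardize the features, then assign each country a KMeans cluster label."""
    # Scaling keeps large-magnitude metrics from dominating Euclidean distances.
    X = StandardScaler().fit_transform(feats.to_numpy())
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return pd.Series(km.fit_predict(X), index=feats.index, name="cluster")


# Toy input (made-up numbers, for illustration only).
daily = pd.DataFrame({
    "country":    ["A", "A", "B", "B", "C", "C", "D", "D"],
    "new_cases":  [10, 20, 50, 50, 5, 5, 60, 40],
    "new_tests":  [100, 100, 200, 200, 500, 500, 250, 250],
    "new_deaths": [1, 2, 8, 7, 0, 0, 3, 2],
    "population": [1e5, 1e5, 1e6, 1e6, 2e5, 2e5, 1e6, 1e6],
})
feats = country_features(daily)
labels = cluster_countries(feats, k=2)
print(feats.round(3))
print(labels)
```

Standardizing first matters because tests per 1,000 people can be orders of magnitude larger than the two rates; without scaling, KMeans distances would be driven almost entirely by that column. Real data will also need guards for zero or missing test and case counts before dividing.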
Quickstart
- Create a Python environment and install the dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Launch the notebook server and open the presentation notebook:

```bash
jupyter lab  # or: jupyter notebook
# then open src/project.ipynb
```

Project structure
```
Data-Mining/
  README.md
  requirements.txt
  src/
    data.csv
    project.ipynb         # cleaned, presentation-ready notebook
    project copy.ipynb    # original analysis notebook (kept intact)
    utils.py              # preprocessing, SVR/RNN helpers
    utils_visual.py       # plotting and cluster-visualization helpers
```
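`src/utils.py` is described as holding the SVR/RNN helpers, but its code is not reproduced here. One common way to frame a 3-day-ahead SVR baseline is to regress the positivity rate 3 days after the last observed day on a sliding window of past values. The sketch below assumes a 7-day window, an RBF kernel with `C=10.0` and `epsilon=0.001`, and a hypothetical helper `make_supervised`; it runs on a synthetic series rather than `src/data.csv`, so it shows the framing only, not the project's actual settings or results.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_supervised(series: np.ndarray, window: int = 7, horizon: int = 3):
    """Build (X, y) pairs: the last `window` values -> the value `horizon` days later."""
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])     # days t-window .. t-1 (observed)
        y.append(series[t - 1 + horizon])  # `horizon` days after the last observed day
    return np.asarray(X), np.asarray(y)


# Synthetic daily positivity-rate series (illustrative only, not from data.csv).
rng = np.random.default_rng(0)
days = np.arange(120)
positivity = 0.10 + 0.05 * np.sin(days / 10) + rng.normal(0, 0.005, days.size)

X, y = make_supervised(positivity, window=7, horizon=3)
split = int(0.8 * len(X))  # chronological split; never shuffle a time series
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.001))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])

# Naive reference: assume the rate 3 days out equals the last observed value.
naive = X[split:, -1]
print(f"SVR MAE:   {np.mean(np.abs(pred - y[split:])):.4f}")
print(f"naive MAE: {np.mean(np.abs(naive - y[split:])):.4f}")
```

The split is chronological so the model is always evaluated on days after the ones it was trained on, and comparing against the naive last-value forecast gives a quick check on whether the SVR adds anything beyond persistence.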