vtsoulos/CEID-Data-Mining

COVID-19 Data Mining — CEID Project

The analysis uses a global COVID-19 dataset to demonstrate EDA, country-level feature engineering and clustering, and short-horizon forecasting.

Highlights

  • EDA: global trends, missingness checks, and distributions
  • Feature engineering: positivity rate, fatality rate, tests per population
  • Clustering: KMeans on scaled country-level metrics with interactive 3D visualization
  • Forecasting: SVR baseline for 3-day-ahead positivity-rate prediction (optional RNN/LSTM)
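
The feature engineering and clustering steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names (`cases`, `deaths`, `tests`, `population`), the toy values, the helper names, and the choice of three clusters are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical country-level totals. The column names and values are
# assumptions for illustration, not the real schema of src/data.csv.
totals = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E", "F"],
        "cases": [1_000, 1_200, 50_000, 48_000, 300, 250],
        "deaths": [10, 15, 2_500, 2_300, 1, 1],
        "tests": [50_000, 55_000, 200_000, 210_000, 90_000, 80_000],
        "population": [1_000_000, 1_100_000, 5_000_000,
                       5_200_000, 900_000, 800_000],
    }
).set_index("country")


def engineer_features(df):
    """Derive the three ratios named in the Highlights list."""
    out = pd.DataFrame(index=df.index)
    out["positivity_rate"] = df["cases"] / df["tests"]
    out["fatality_rate"] = df["deaths"] / df["cases"]
    out["tests_per_population"] = df["tests"] / df["population"]
    return out


def cluster_countries(features, n_clusters=3, seed=0):
    """Standardize each feature, then assign a KMeans cluster label per country."""
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")


features = engineer_features(totals)
labels = cluster_countries(features)
print(features.join(labels))
```

Scaling before KMeans matters here because the ratios live on different ranges; without it, the feature with the largest spread would dominate the Euclidean distances. Real data would also need handling for countries with zero reported tests or cases, which this sketch does not cover.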

Quickstart

  1. Create a Python environment and install the dependencies:
     python -m venv .venv
     source .venv/bin/activate   # on Windows: .venv\Scripts\activate
     pip install -r requirements.txt
  2. Launch the notebook server and open the presentation notebook:
     jupyter lab  # or jupyter notebook
     # open: src/project.ipynb

Project structure

Data-Mining/
  README.md
  requirements.txt
  src/
    data.csv
    project.ipynb        # cleaned, presentation-ready notebook
    project copy.ipynb   # original analysis notebook (kept intact)
    utils.py             # preprocessing, SVR/RNN helpers
    utils_visual.py      # plotting & cluster visualization helpers
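
The SVR helpers in utils.py are not reproduced here. As a rough sketch of what a 3-day-ahead SVR baseline could look like, the example below turns a daily positivity-rate series into lagged windows and fits scikit-learn's `SVR`. The function name `make_supervised`, the 7-day window, the hyperparameters, and the synthetic series are assumptions for illustration, not the project's actual implementation or data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_supervised(series, window=7, horizon=3):
    """Build (X, y) from a 1-D series.

    Each row of X holds `window` consecutive values; the matching y is
    the value `horizon` steps after the last value in that row.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window + horizon - 1:]
    return X, y


# Synthetic daily positivity rate (assumption: stands in for real data).
rng = np.random.default_rng(0)
days = np.arange(120)
rate = 0.05 + 0.02 * np.sin(2 * np.pi * days / 30) + rng.normal(0, 0.002, days.size)

X, y = make_supervised(rate, window=7, horizon=3)

# Chronological split: never shuffle time-series samples.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# A small epsilon is needed because the target values are tiny; with the
# default epsilon=0.1 every training point would fall inside the tube.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.001))
model.fit(X_train, y_train)

preds = model.predict(X_test)
mae = float(np.mean(np.abs(preds - y_test)))
print(f"3-day-ahead MAE on held-out days: {mae:.4f}")
```

The split is chronological so that the model is only evaluated on days that come after everything it was trained on, which is the situation a short-horizon forecast faces in practice.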
