A collection of data science notebooks focused on data cleaning, preprocessing, and exploratory data analysis (EDA) using real-world datasets.
This project is developed as part of Daily Code 2026, with an emphasis on hands-on learning, reproducibility, and clear analytical thinking.
This repository contains Jupyter notebooks that demonstrate practical approaches to preparing, cleaning, and understanding datasets before modeling or deployment.
Each notebook focuses on a specific learning objective rather than a full production-ready pipeline.
Objectives:
- Practice real-world data cleaning techniques
- Perform exploratory data analysis (EDA)
- Visualize datasets to uncover trends and patterns
- Build readable and reproducible Jupyter notebooks
- Strengthen Python-based data analysis fundamentals
File: notebooks/Data_Cleaning.ipynb
- Inspect raw datasets
- Handle missing, duplicate, and inconsistent values
- Normalize and validate data for further analysis
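The cleaning steps listed for this notebook can be sketched in pandas as below. This is a minimal illustration, not the notebook's actual code: the column names (`order_id`, `created_at`, `status`, `price_usd`) and the toy rows are assumptions, and the "non-negative price" rule is a hypothetical validation check.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning pass; column names are assumed for illustration."""
    out = df.copy()

    # Inspect: count missing values per column before changing anything.
    print(out.isna().sum())

    # Normalize inconsistent text so " Shipped" and "shipped" compare equal.
    out["status"] = out["status"].str.strip().str.lower()

    # Parse timestamps and prices; invalid entries become NaT/NaN instead of raising.
    out["created_at"] = pd.to_datetime(out["created_at"], errors="coerce")
    out["price_usd"] = pd.to_numeric(out["price_usd"], errors="coerce")

    # Remove exact duplicates (after normalizing), then rows missing the key or a valid date.
    out = out.drop_duplicates()
    out = out.dropna(subset=["order_id", "created_at"])

    # Validate: keep non-negative prices, then fill any remaining gaps with the median.
    out = out.loc[out["price_usd"].isna() | (out["price_usd"] >= 0)].copy()
    out["price_usd"] = out["price_usd"].fillna(out["price_usd"].median())

    return out.reset_index(drop=True)


# Toy data with a duplicate, an unparseable date, a missing key, and a negative price.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, None, 5],
    "created_at": ["2012-03-19 10:42:46", "2012-03-19 19:27:37", "2012-03-19 19:27:37",
                   "not a date", "2012-03-20 06:44:45", "2012-03-20 09:41:45"],
    "status": [" Shipped", "shipped", "shipped", "shipped", "refunded", "SHIPPED "],
    "price_usd": ["49.99", "49.99", "49.99", "49.99", "49.99", "-5"],
})
cleaned = clean_orders(raw)
```

Normalizing text before calling `drop_duplicates` matters: rows that differ only in casing or stray whitespace would otherwise survive as distinct records.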
File: notebooks/VisualRepresentation.ipynb
- Explore datasets using visualization techniques
- Generate plots and charts to identify patterns
- Interpret the charts to turn raw data into meaningful insights
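As a rough sketch of the kind of chart this notebook produces, the snippet below resamples orders to monthly counts and draws a line chart with matplotlib. The `created_at` and `order_id` column names and the sample dates are assumptions; the Agg backend is used only so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen rendering; in a notebook the inline backend is typical
import matplotlib.pyplot as plt
import pandas as pd


def plot_monthly_orders(orders: pd.DataFrame):
    """Line chart of order counts per month; expects a parsed `created_at` column."""
    # "MS" bins by month start; empty months get a count of 0, so gaps stay visible.
    monthly = orders.set_index("created_at").resample("MS")["order_id"].count()

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(monthly.index, monthly.values, marker="o")
    ax.set_title("Orders per month")
    ax.set_xlabel("Month")
    ax.set_ylabel("Orders")
    fig.autofmt_xdate()  # rotate date labels so they do not overlap
    return fig, monthly


orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "created_at": pd.to_datetime(["2012-03-19", "2012-03-25", "2012-04-02", "2012-06-10"]),
})
fig, monthly = plot_monthly_orders(orders)

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the figure to memory instead of a file
plt.close(fig)
```

Returning the aggregated series alongside the figure keeps the numbers behind the chart available for inspection, which helps when a visual pattern needs to be confirmed.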
Typical steps followed across notebooks:
- Data loading and inspection
- Handling missing or invalid values
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Visualization and interpretation
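One way to keep these steps readable and reproducible is to write each as a small function and chain them with `DataFrame.pipe`. The sketch below is illustrative only: the inline CSV stands in for a file such as `raw-data/orders.csv`, its columns are assumed, and filling missing item counts with 1 is a hypothetical rule. Visualization would follow as a final step on `summary`.

```python
import io

import pandas as pd

# Stand-in for a raw CSV file; columns and values are assumptions for illustration.
RAW_CSV = """order_id,created_at,items_purchased,price_usd
1,2012-03-19 10:42:46,1,49.99
2,2012-03-19 19:27:37,1,49.99
2,2012-03-19 19:27:37,1,49.99
3,2012-03-20 06:44:45,,49.99
4,2012-04-02 14:10:00,2,99.98
"""


def load(source) -> pd.DataFrame:
    # Step 1a: load, parsing the timestamp column on read.
    return pd.read_csv(source, parse_dates=["created_at"])


def inspect(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1b: report shape and dtypes, then pass the frame through unchanged.
    print(df.shape)
    print(df.dtypes)
    return df


def handle_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Step 2: fill missing item counts (hypothetical rule: assume one item).
    return df.assign(items_purchased=df["items_purchased"].fillna(1).astype(int))


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Step 3: drop exact duplicate rows and reset the index.
    return df.drop_duplicates().reset_index(drop=True)


def explore(df: pd.DataFrame) -> pd.DataFrame:
    # Step 4: simple EDA, distinct orders and revenue per calendar month.
    return (
        df.assign(month=df["created_at"].dt.to_period("M"))
        .groupby("month")
        .agg(orders=("order_id", "nunique"), revenue=("price_usd", "sum"))
    )


summary = (
    load(io.StringIO(RAW_CSV))
    .pipe(inspect)
    .pipe(handle_missing)
    .pipe(clean)
    .pipe(explore)
)
print(summary)
```

Because every step takes and returns a DataFrame, steps can be reordered, skipped, or tested in isolation without rewriting the rest of the chain.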
Key concepts covered:
- Exploratory data analysis (EDA)
- Data cleaning and preprocessing
- Pandas DataFrame operations
- NumPy-based numerical computations
- Data visualization fundamentals
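A short example combining the pandas and NumPy skills above: vectorized arithmetic on NumPy arrays, followed by a pandas group-and-aggregate. The `product_id`, `price_usd`, and `cogs_usd` columns and their values are assumed for illustration rather than taken from the notebooks.

```python
import numpy as np
import pandas as pd

# Toy order items; column names and values are assumptions for illustration.
items = pd.DataFrame({
    "product_id": [1, 1, 2, 3, 1],
    "price_usd": [49.99, 49.99, 59.99, 49.99, 49.99],
    "cogs_usd": [19.49, 19.49, 22.49, 19.49, 19.49],
})

# NumPy: element-wise margin for every row at once, no Python loop.
items["margin_usd"] = np.round(items["price_usd"].to_numpy() - items["cogs_usd"].to_numpy(), 2)
items["margin_pct"] = np.round(items["margin_usd"] / items["price_usd"] * 100, 1)

# pandas: group by product, aggregate with named outputs, and rank by revenue.
by_product = (
    items.groupby("product_id")
    .agg(
        units=("price_usd", "size"),
        revenue=("price_usd", "sum"),
        margin=("margin_usd", "sum"),
    )
    .sort_values("revenue", ascending=False)
)
print(by_product)
```

Rounding the per-row margin before summing avoids carrying floating-point noise such as `30.500000000000004` into the aggregated totals.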
Project structure:
cleaned-data/
├── cleaned_order_items.csv
├── cleaned_orders.csv
├── cleaned_pageviews.csv
├── cleaned_products.csv
├── cleaned_refunds.csv
└── cleaned_sessions.csv
notebooks/
├── Data_Cleaning.ipynb
└── VisualRepresentation.ipynb
raw-data/
├── maven_fuzzy_factory_data_dictionary.csv
├── order_item_refunds.csv
├── order_items.csv
├── orders.csv
├── products.csv
├── website_pageviews.csv
└── website_sessions.csv
README.md
requirements.txt
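The layout pairs each raw table with a cleaned counterpart (the data dictionary has none). The sketch below shows one way that mapping could be scripted; the file names come from the tree, while `export_cleaned` and `basic_clean` are hypothetical helpers, and `basic_clean` only stands in for the notebook's real per-table rules.

```python
from pathlib import Path

import pandas as pd

# raw-data file -> cleaned-data file, following the directory tree.
OUTPUT_NAMES = {
    "order_item_refunds.csv": "cleaned_refunds.csv",
    "order_items.csv": "cleaned_order_items.csv",
    "orders.csv": "cleaned_orders.csv",
    "products.csv": "cleaned_products.csv",
    "website_pageviews.csv": "cleaned_pageviews.csv",
    "website_sessions.csv": "cleaned_sessions.csv",
}


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical placeholder: drop duplicate rows and rows that are entirely empty.
    return df.drop_duplicates().dropna(how="all").reset_index(drop=True)


def export_cleaned(root: Path) -> list:
    """Clean every raw table found under root/raw-data and write it to root/cleaned-data."""
    raw_dir, out_dir = root / "raw-data", root / "cleaned-data"
    out_dir.mkdir(exist_ok=True)
    written = []
    for raw_name, out_name in OUTPUT_NAMES.items():
        src = raw_dir / raw_name
        if not src.exists():
            continue  # skip tables that are not present locally
        target = out_dir / out_name
        basic_clean(pd.read_csv(src)).to_csv(target, index=False)
        written.append(target)
    return written
```

Writing with `index=False` keeps the cleaned CSVs free of an extra unnamed index column when they are read back.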