AMRYB/Hotel-Booking

Hotel Booking Cancellation Analysis

A Data Analysis course project that explores the main drivers behind hotel booking cancellations and produces actionable insights for the hospitality business.

Target variable: Is_Canceled (0 = not canceled, 1 = canceled)


📌 Project Goal

Hotel booking cancellations cause revenue loss and make planning harder (room allocation, staffing, forecasting). This project answers:

Which factors are most associated with booking cancellations, and how can hotels use data-driven insights to reduce cancellation risk?


📦 Dataset

  • Dataset name: Hotel Booking Demand
  • Source: Kaggle (jessemostipak / hotel-booking-demand)
  • Hotels: City Hotel & Resort Hotel (Portugal)
  • Time period: 2015–2017
  • Raw shape: ~119k rows × 32 columns (each row = one booking)

The dataset file used in this repo is included as: hotel_bookings.csv


🧠 What’s Inside (Methods)

1) Data Cleaning & Preprocessing

  • Handling missing values (median/mode/mean depending on the feature type)
  • Removing low-quality / inconsistent records (e.g., invalid ADR values)
  • Dropping highly-missing columns (e.g., Company)
  • Standardizing column names and data types
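
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `clean_bookings`, the 50% missing-value threshold, the title-cased column names (so `adr` becomes `Adr`), and the rule that a negative ADR is invalid are all assumptions.

```python
import numpy as np
import pandas as pd


def clean_bookings(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Illustrative cleaning pass; thresholds and rules are assumptions."""
    out = df.copy()

    # Standardize column names, e.g. "is_canceled" -> "Is_Canceled".
    out.columns = [c.strip().title() for c in out.columns]

    # Drop columns that are mostly missing (Company is the README's example).
    missing_share = out.isna().mean()
    out = out.drop(columns=missing_share[missing_share > max_missing].index)

    # Remove records with an invalid (negative) average daily rate,
    # keeping rows where ADR is missing so they can be imputed below.
    if "Adr" in out.columns:
        valid = out["Adr"].isna() | (out["Adr"] >= 0)
        out = out[valid].reset_index(drop=True)

    # Impute: median for numeric columns, mode for everything else.
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])

    return out


# Tiny synthetic frame standing in for hotel_bookings.csv.
raw = pd.DataFrame({
    "is_canceled": [0, 1, 0, 1],
    "adr": [80.0, -5.0, 120.0, np.nan],
    "children": [0.0, np.nan, 2.0, 1.0],
    "country": ["PRT", None, "PRT", "GBR"],
    "company": [np.nan, np.nan, np.nan, 40.0],
})
cleaned = clean_bookings(raw)
print(cleaned)
```

Filtering invalid rows before imputing keeps bad values from distorting the medians; with the real data the same function could be applied to `pd.read_csv("hotel_bookings.csv")`.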

2) Exploratory Data Analysis (EDA)

  • Distributions and relationships for key variables such as:
    • Lead time, ADR (price), stay duration, guests
    • Market segments, customer types, seasonality
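
One common EDA view for a binary target is the cancellation rate per category. The sketch below assumes a categorical column named `Market_Segment` and uses made-up rows; the helper `cancellation_rate_by` is hypothetical and not taken from the notebook.

```python
import pandas as pd


def cancellation_rate_by(df: pd.DataFrame, column: str,
                         target: str = "Is_Canceled") -> pd.DataFrame:
    """Booking count and share of canceled bookings for each category."""
    return (
        df.groupby(column)[target]
        .agg(bookings="count", cancel_rate="mean")  # mean of 0/1 = rate
        .sort_values("cancel_rate", ascending=False)
    )


# Synthetic rows for illustration only; not real results.
sample = pd.DataFrame({
    "Market_Segment": ["Online TA", "Online TA", "Direct", "Direct",
                       "Groups", "Groups", "Groups", "Direct"],
    "Is_Canceled": [1, 0, 0, 0, 1, 1, 0, 0],
})
rates = cancellation_rate_by(sample, "Market_Segment")
print(rates)
```

The same helper works for any categorical column, such as customer type or an engineered season label.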

3) Feature Engineering

Examples of engineered features:

  • Total_Nights = Stays_In_Weekend_Nights + Stays_In_Week_Nights
  • Total_Guests = Adults + Children + Babies
  • Season/long-stay indicators
  • Revenue-related features
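
A minimal sketch of these features, assuming the title-cased column names used above. `Total_Nights` and `Total_Guests` follow the formulas in the list; the 7-night long-stay threshold, the month-to-season mapping, the `Arrival_Date_Month` and `Adr` column names, and the `Estimated_Revenue` definition are assumptions made for illustration.

```python
import pandas as pd

# Assumed month-to-season mapping (Northern Hemisphere; the hotels are in Portugal).
SEASON_BY_MONTH = {
    "December": "Winter", "January": "Winter", "February": "Winter",
    "March": "Spring", "April": "Spring", "May": "Spring",
    "June": "Summer", "July": "Summer", "August": "Summer",
    "September": "Autumn", "October": "Autumn", "November": "Autumn",
}


def add_features(df: pd.DataFrame, long_stay_nights: int = 7) -> pd.DataFrame:
    out = df.copy()

    # Formulas stated in the README.
    out["Total_Nights"] = out["Stays_In_Weekend_Nights"] + out["Stays_In_Week_Nights"]
    out["Total_Guests"] = out["Adults"] + out["Children"] + out["Babies"]

    # Assumed definitions for the indicator and revenue features.
    out["Is_Long_Stay"] = (out["Total_Nights"] >= long_stay_nights).astype(int)
    out["Season"] = out["Arrival_Date_Month"].map(SEASON_BY_MONTH)
    out["Estimated_Revenue"] = out["Adr"] * out["Total_Nights"]
    return out


# Two synthetic bookings for illustration.
bookings = pd.DataFrame({
    "Stays_In_Weekend_Nights": [2, 0],
    "Stays_In_Week_Nights": [5, 1],
    "Adults": [2, 1],
    "Children": [1, 0],
    "Babies": [0, 0],
    "Adr": [100.0, 60.0],
    "Arrival_Date_Month": ["July", "January"],
})
featured = add_features(bookings)
print(featured[["Total_Nights", "Total_Guests", "Is_Long_Stay",
                "Season", "Estimated_Revenue"]])
```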

4) Feature Selection

Multiple approaches were used to identify the most relevant predictors:

  • Correlation analysis
  • Lasso regression
  • Recursive Feature Elimination (RFE) with Logistic Regression
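
The three approaches can be compared side by side. The sketch below runs on synthetic data where only `signal_a` and `signal_b` drive the target, so the feature names, the Lasso `alpha`, and the number of features kept are all illustrative assumptions rather than the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler


def select_features(X: pd.DataFrame, y: pd.Series,
                    alpha: float = 0.05, n_keep: int = 2) -> dict:
    """Rank or select features with correlation, Lasso, and RFE."""
    X_scaled = StandardScaler().fit_transform(X)

    # 1) Absolute Pearson correlation with the target, strongest first.
    correlation = X.corrwith(y).abs().sort_values(ascending=False)

    # 2) Lasso: the L1 penalty shrinks weak coefficients to exactly zero.
    lasso = Lasso(alpha=alpha).fit(X_scaled, y)
    lasso_keep = [c for c, w in zip(X.columns, lasso.coef_) if w != 0]

    # 3) RFE: repeatedly drop the feature with the smallest coefficient.
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=n_keep).fit(X_scaled, y)
    rfe_keep = list(X.columns[rfe.support_])

    return {"correlation": correlation, "lasso": lasso_keep, "rfe": rfe_keep}


# Synthetic data: two informative columns, three pure-noise columns.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(400, 5)),
    columns=["signal_a", "signal_b", "noise_1", "noise_2", "noise_3"],
)
score = 2 * X_demo["signal_a"] - 2 * X_demo["signal_b"] + 0.3 * rng.normal(size=400)
y_demo = (score > 0).astype(int)

selection = select_features(X_demo, y_demo)
print(selection["correlation"])
print("Lasso keeps:", selection["lasso"])
print("RFE keeps:", selection["rfe"])
```

Features that survive all three methods are usually the safest candidates to carry forward; scaling first matters because both Lasso and the logistic-regression coefficients used by RFE are sensitive to feature scale.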

5) Statistical Testing

To check whether differences between canceled and non-canceled bookings are statistically significant:

  • Normality checks
  • Mann–Whitney U test for numeric features
  • Chi-square test for categorical features
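
A sketch of how these tests might be run with SciPy. The README does not say which normality test was used, so Shapiro-Wilk is an assumption here, as are the column names `Lead_Time` and `Deposit_Type` and the synthetic data.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_numeric(df: pd.DataFrame, feature: str,
                    target: str = "Is_Canceled") -> dict:
    """Normality check, then a Mann-Whitney U test between the two groups."""
    kept = df.loc[df[target] == 0, feature]
    canceled = df.loc[df[target] == 1, feature]

    # A small Shapiro-Wilk p-value suggests the feature is not normal,
    # which motivates a rank-based test instead of a t-test.
    _, shapiro_p = stats.shapiro(df[feature])
    u_stat, p_value = stats.mannwhitneyu(kept, canceled, alternative="two-sided")
    return {"shapiro_p": shapiro_p, "u_stat": u_stat, "p_value": p_value}


def compare_categorical(df: pd.DataFrame, feature: str,
                        target: str = "Is_Canceled") -> dict:
    """Chi-square test of independence on the feature-by-target table."""
    table = pd.crosstab(df[feature], df[target])
    chi2, p_value, dof, _ = stats.chi2_contingency(table)
    return {"chi2": chi2, "p_value": p_value, "dof": dof}


# Synthetic data with a built-in group difference, for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Is_Canceled": np.repeat([0, 1], 200),
    "Lead_Time": np.concatenate([rng.exponential(50, 200),
                                 rng.exponential(150, 200)]),
    "Deposit_Type": np.concatenate([
        rng.choice(["No Deposit", "Non Refund"], 200, p=[0.9, 0.1]),
        rng.choice(["No Deposit", "Non Refund"], 200, p=[0.3, 0.7]),
    ]),
})
numeric_result = compare_numeric(demo, "Lead_Time")
categorical_result = compare_categorical(demo, "Deposit_Type")
print(numeric_result)
print(categorical_result)
```

On the full dataset of about 119k rows, Shapiro-Wilk p-values become unreliable (SciPy warns above 5,000 samples), so a random subsample or a visual check such as a Q-Q plot may be more practical.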

6) Dimensionality Reduction & Clustering

  • PCA to reduce dimensionality for customer-behavior features
  • K-Means clustering to group customers with similar booking behavior
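
The two steps can be chained as below. This is a sketch under assumptions: the number of components, the number of clusters, and the scaling step are illustrative choices, and the input is three synthetic groups rather than the project's customer-behavior features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def segment_customers(X, n_components: int = 2, n_clusters: int = 3, seed: int = 0):
    """Scale, project onto principal components, then cluster with K-Means."""
    reducer = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    components = reducer.fit_transform(X)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(components)

    # Share of total variance captured by each retained component.
    explained = reducer.named_steps["pca"].explained_variance_ratio_
    return components, labels, explained


# Three well-separated synthetic groups in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [0, 10, 10, 0]], dtype=float)
X_demo = np.vstack([rng.normal(loc=c, scale=1.0, size=(30, 4)) for c in centers])

components, labels, explained = segment_customers(X_demo)
print("Explained variance ratio:", explained)
print("Cluster sizes:", np.bincount(labels))
```

In practice the number of clusters would be chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.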

📁 Repository Structure

.
├── Hotel_Analysis.ipynb        # Main notebook (analysis + plots + modeling steps)
├── hotel_bookings.csv          # Dataset used in the notebook
├── Documention_DA.pdf          # Full project documentation/report
└── Presentation DA.pdf         # Project presentation slides

🚀 Getting Started

Option A — Run locally

  1. Create a virtual environment (recommended).
  2. Install dependencies:
     pip install numpy pandas matplotlib seaborn scipy scikit-learn jupyter
  3. Launch Jupyter and open the notebook:
     jupyter notebook
  4. Run Hotel_Analysis.ipynb from top to bottom.

Option B — Run in Google Colab

Upload Hotel_Analysis.ipynb and hotel_bookings.csv to Colab and run the cells.


✅ Outputs

  • Business-focused insights about cancellation drivers
  • Statistical evidence of significant relationships
  • Customer segments (clusters) based on booking behavior
  • A full written report and a presentation deck (PDFs)

📝 Notes

  • This repository focuses on analysis + insights + segmentation.
    If you want an end-to-end prediction model (train/test metrics), you can extend the notebook with a classification pipeline and evaluation metrics.
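
One possible starting point for such an extension is sketched below. It is not part of the repository: the choice of logistic regression, the 75/25 split, the column names other than `Is_Canceled`, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols) -> Pipeline:
    """Scale numeric columns, one-hot encode categoricals, then classify."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("prep", preprocess),
                     ("clf", LogisticRegression(max_iter=1000))])


def evaluate(df, numeric_cols, categorical_cols,
             target: str = "Is_Canceled", seed: int = 0) -> dict:
    X = df[numeric_cols + categorical_cols]
    y = df[target]
    # Stratify so both splits keep the same cancellation rate.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    model = build_pipeline(numeric_cols, categorical_cols).fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    return {
        "roc_auc": roc_auc_score(y_test, proba),
        "report": classification_report(y_test, model.predict(X_test)),
    }


# Synthetic bookings where longer lead times cancel more often (illustration only).
rng = np.random.default_rng(0)
n = 400
lead_time = rng.uniform(0, 300, n)
demo = pd.DataFrame({
    "Lead_Time": lead_time,
    "Adr": rng.normal(100, 20, n),
    "Deposit_Type": rng.choice(["No Deposit", "Non Refund"], n),
    "Is_Canceled": (lead_time + rng.normal(0, 40, n) > 150).astype(int),
})
metrics = evaluate(demo, ["Lead_Time", "Adr"], ["Deposit_Type"])
print("ROC AUC:", round(metrics["roc_auc"], 3))
print(metrics["report"])
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking test-set information into the reported metrics.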

🙏 Acknowledgments

  • Dataset provider: Hotel Booking Demand on Kaggle (as credited above)
