Integrated Learning Experience (ILE) Project

Liam Guest | Tulane University School of Public Health and Tropical Medicine

December 2025
This repository contains the complete analysis and paper for an explainable machine learning study on hurricane vulnerability assessment across Gulf Coast communities. The research predicts disaster assistance eligibility using social vulnerability indicators and evaluates systematic bias across demographic groups.
Key Findings:
- Binary classification model: 72.8% accuracy, AUC-ROC=0.644
- Systematic bias identified: 55-90% higher prediction errors in high-vulnerability communities
- Three distinct vulnerability typologies requiring tailored interventions
- Data leakage was detected and corrected during model development (documented in paper/MODEL_COMPARISON_SUMMARY.md)
AURA-ILE/
├── paper/ # Paper drafts and documentation
│ ├── ILE_DRAFT_1_UPDATED.md # Main paper (Markdown)
│ ├── ILE_DRAFT_1_UPDATED.docx # Main paper (Word)
│ ├── MODEL_COMPARISON_SUMMARY.md # Model evolution documentation
│ └── ILE_SUMMARY.md # Technical project overview
├── data/
│ └── tract_storm_features.csv # Dataset used for training (5,668 tract-disaster pairs)
├── scripts/
│ ├── train_binary_classification.py # Final model training
│ ├── fairness_diagnostics.py # Equity analysis
│ └── clustering_vulnerability_typologies.py # K-means clustering
├── results/
│ ├── binary_model_performance.csv # Model metrics
│ ├── fairness_metrics.csv # Demographic disparities
│ ├── fairness_subgroup_performance.csv # Intersectional analysis
│ ├── shap_feature_importance.csv # SHAP results (leaky model)
│ └── shap_values.csv
└── figures/
  ├── fairness_error_by_group.png # KEY: Demographic bias visualization
  ├── fairness_income_race_interaction.png # KEY: Intersectional analysis
  ├── clustering_typology_profiles.png # KEY: Vulnerability typologies
  ├── clustering_pca_visualization.png
  ├── clustering_geographic_distribution.png
  └── clustering_elbow_silhouette.png
Current emergency management systems lack transparent, equitable, and predictive tools for hurricane vulnerability assessment, limiting proactive resource allocation to at-risk communities.
- Data: 5,668 census tract-disaster observations (Harvey, Irma, Michael, Laura, Ida)
- Models: Binary classification (KNN, Decision Tree, Random Forest, Bagging)
- Features: 18 social vulnerability indicators (demographics, economics, housing)
- Target: Whether a tract receives ANY disaster assistance (yes/no)
- Model Performance: 72.8% accuracy, AUC-ROC=0.644 (Bagging Classifier)
- Fairness Concerns:
  - High-Black tracts: 90% higher prediction error
  - Low-income tracts: 55% higher prediction error
  - Intersectional vulnerability: 4-fold error disparity
- Vulnerability Typologies: 3 clusters (62.5% moderate, 20.4% high, 17.1% low vulnerability)
- Mandate algorithmic fairness audits for disaster assistance systems
- Proactive multilingual outreach to under-engaged communities
- Tailored interventions based on vulnerability typologies
- Community-based validation of algorithmic predictions
- Main Paper: paper/ILE_DRAFT_1_UPDATED.docx
- Model Evolution: paper/MODEL_COMPARISON_SUMMARY.md (explains data leakage journey)
- Project Overview: paper/ILE_SUMMARY.md
- Model Performance: results/binary_model_performance.csv
- Fairness Analysis: results/fairness_metrics.csv
- figures/fairness_error_by_group.png - Demographic bias
- figures/fairness_income_race_interaction.png - Intersectionality
- figures/clustering_typology_profiles.png - Vulnerability typologies
File: data/tract_storm_features.csv
- Observations: 5,668 census tract-disaster pairs
- Hurricanes: Harvey (2017), Irma (2017), Michael (2018), Laura (2020), Ida (2021)
- States: TX, LA, MS, AL, FL
- Features: 18 social vulnerability indicators
- Target: has_claims (binary: 1 if fema_claims_total > 0, else 0)
- Class Distribution: 72.5% no claims, 27.5% has claims
- Economic: median_household_income, pct_poverty
- Demographics: pct_elderly, pct_children, pct_black, pct_hispanic, pct_limited_english
- Housing: pct_mobile_homes, pct_multi_unit, pct_crowded_housing, pct_no_vehicle
- Context: population, housing_units, area_sq_mi, disaster number, state
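The target definition above can be sketched in a few lines of pandas. This is a minimal illustration, not the repository's actual code: the column names `fema_claims_total` and `has_claims` come from this README, while the helper names and the toy data are hypothetical. It also assumes a missing claims value should count as "no claims", which may not match how the real dataset was prepared.

```python
import pandas as pd


def add_has_claims(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the binary target derived from fema_claims_total."""
    out = df.copy()
    # Assumption: NaN compares as False, so missing totals become 0 (no claims).
    out["has_claims"] = (out["fema_claims_total"] > 0).astype(int)
    return out


def class_distribution(df: pd.DataFrame) -> pd.Series:
    """Share of observations in each class, indexed by class label (0, 1)."""
    return df["has_claims"].value_counts(normalize=True).sort_index()


# Toy stand-in for data/tract_storm_features.csv (values are made up).
toy = pd.DataFrame({"fema_claims_total": [0, 0, 3, 0, 12, 0, 0, 1]})
toy = add_has_claims(toy)
dist = class_distribution(toy)

# Accuracy of a model that always predicts the most common class.
majority_baseline = dist.max()
print(dist)
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

With the class distribution reported above (72.5% no claims), a model that always predicts "no claims" would score about 72.5% accuracy, so threshold-free metrics such as AUC-ROC are a useful complement to accuracy when reading the results.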
- Algorithms: KNN, Decision Tree, Random Forest, Bagging
- Hyperparameter Tuning: GridSearchCV with 5-fold cross-validation
- Train/Test Split: Stratified 80/20 (maintains class balance)
- Best Model: Bagging Classifier (n_estimators=30, max_samples=1.0)
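The modeling setup listed above (stratified 80/20 split, GridSearchCV with 5-fold cross-validation, Bagging Classifier) might look roughly like the sketch below. It runs on synthetic data rather than the project dataset, and the parameter grid, the `roc_auc` scoring choice, and the random seeds are assumptions made for illustration; scripts/train_binary_classification.py may differ on all of them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in: 18 features, roughly 72.5% / 27.5% class balance.
X, y = make_classification(
    n_samples=600,
    n_features=18,
    n_informative=6,
    weights=[0.725, 0.275],
    random_state=0,
)

# Stratified 80/20 split keeps the class ratio similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; it includes the reported best values (30, 1.0).
param_grid = {"n_estimators": [10, 30], "max_samples": [0.5, 1.0]}

search = GridSearchCV(
    BaggingClassifier(random_state=0),  # default base learner is a decision tree
    param_grid,
    cv=5,               # 5-fold cross-validation on the training split only
    scoring="roc_auc",  # assumption: the README does not name the CV metric
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])

print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}, test AUC-ROC: {test_auc:.3f}")
```

Tuning on the training split and scoring once on the held-out 20% keeps the reported test metrics independent of the hyperparameter search.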
- Protected Attributes: Race (% Black), income, poverty, housing vulnerability
- Metrics: MAE by group, mean prediction error, intersectional analysis
- Threshold: MAE variation >20% indicates significant disparity
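One way to read the fairness diagnostic above is as a comparison of mean absolute error between demographic groups, flagged when the relative gap exceeds 20%. The sketch below illustrates that reading on made-up numbers. The median split on percent Black, the function names, and the exact disparity formula are assumptions; scripts/fairness_diagnostics.py may define groups and thresholds differently.

```python
import numpy as np
import pandas as pd


def mae_by_group(y_true, y_score, group) -> pd.Series:
    """Mean absolute error between labels and predicted scores, per group."""
    frame = pd.DataFrame(
        {
            "abs_err": np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_score, dtype=float)),
            "group": np.asarray(group),
        }
    )
    return frame.groupby("group")["abs_err"].mean()


def relative_disparity(mae: pd.Series, reference: str, comparison: str) -> float:
    """How much higher the comparison group's MAE is, relative to the reference."""
    return float((mae[comparison] - mae[reference]) / mae[reference])


# Made-up labels, predicted probabilities, and percent-Black values for 8 tracts.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.1, 0.8, 0.2, 0.4, 0.5, 0.3, 0.6]
pct_black = np.array([5, 10, 8, 12, 60, 70, 55, 65])

# Assumption: "high" means above the sample median.
group = np.where(pct_black > np.median(pct_black), "high", "low")

mae = mae_by_group(y_true, y_score, group)
disparity = relative_disparity(mae, reference="low", comparison="high")
flagged = disparity > 0.20  # the >20% threshold listed above

print(mae)
print(f"Relative disparity: {disparity:.0%}, flagged: {flagged}")
```

The same two functions can be applied to income, poverty, or housing groupings, or to a combined label such as "low-income and high-Black" for the intersectional comparison.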
- Method: K-Means with silhouette score optimization
- Features: 17 vulnerability indicators
- Optimal k: 3 clusters
- Output: Vulnerability typologies with distinct socioeconomic profiles
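Selecting k by silhouette score, as described above, can be sketched as follows. The data here are synthetic blobs with 17 features, not the project's indicators, and the standardization step, the candidate range of k, and the seeds are assumptions rather than details confirmed by scripts/clustering_vulnerability_typologies.py.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 17 features drawn around 3 well-separated centers.
features, _ = make_blobs(
    n_samples=300, n_features=17, centers=3, cluster_std=1.0, random_state=0
)

# Assumption: indicators are standardized so no single scale dominates distances.
features_scaled = StandardScaler().fit_transform(features)

# Fit K-Means for each candidate k and record the mean silhouette score.
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features_scaled)
    silhouette_by_k[k] = silhouette_score(features_scaled, labels)

# The k with the highest silhouette score is taken as the cluster count.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)

for k, score in silhouette_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Selected k:", best_k)
```

On real data the silhouette curve is often flatter than on synthetic blobs, so it is usually read alongside an elbow plot such as figures/clustering_elbow_silhouette.png.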
python 3.9+
pandas
numpy
scikit-learn
matplotlib
seaborn

python scripts/train_binary_classification.py
python scripts/fairness_diagnostics.py
python scripts/clustering_vulnerability_typologies.py

- Data Analysis: Binary classification, hyperparameter tuning, data leakage detection
- Policy Implications: Equity audits, multilingual outreach, community-based validation
- Descriptive/Inferential Methods: Stratified sampling, cross-validation, fairness diagnostics
- Probability/Statistics: Zero-inflated distributions, ensemble methods, proper evaluation metrics
Guest, L. (2025). Explainable Machine Learning for Hurricane Vulnerability Assessment: Equity Implications in Disaster Assistance Prediction. Tulane University School of Public Health and Tropical Medicine, Integrated Learning Experience.
Liam Guest, Tulane University School of Public Health and Tropical Medicine. Email: lguest@tulane.edu
Data sources:
- FEMA OpenFEMA: Individual Assistance Housing Registrants
- CDC/ATSDR: Social Vulnerability Index 2022
- U.S. Census Bureau: American Community Survey
This ILE project is part of the larger AURA (AI for Urban Resilience & Alerts) research initiative.