# Cultural Proximity Analytics

A reproducible workflow to compute distances (Haversine) and analyze correlations between host variables and proximity to cultural hotspots.

Short-term rentals (hosts/listings) -> distances to cultural hotspots -> correlation matrices (Pearson)
## Table of Contents

- Overview
- Scope and Features
- Installation & Requirements
- Quick Start
- Usage Guide
- Data Formats
- Project Architecture
- Methodology
- Dataset Structure
- Performance Benchmarks
- Results & Visualizations
- Research Team
- Industry Partners Supporting Innovation
- Citation & License
- Acknowledgments
- Contact
- FAQ
## Overview

This repository implements an analysis workflow to:
- Compute geographical distances from short-term rental listings to a set of cultural hotspots in a city using the Haversine formula (km).
- Generate an output dataset with all distances appended and ready for analysis.
- Compute Pearson correlations between a target host variable (e.g., superhost, listings count, or a host-level derived count) and distances to cultural hotspots.
- Export results as correlation matrices (PNG/PDF) and print correlation values to the console.
Included in this repository:
- `Correlations.py`: runnable script (full pipeline).
- `Correlations.ipynb`: explanatory notebook using the same workflow.
## Scope and Features

- 📐 Distance computation: Haversine distances (km) from each listing to multiple hotspots.
- 📊 Correlation analysis: Pearson correlation for existing variables and derived count-based variables.
- 🖼️ Publication-ready outputs: heatmap correlation matrices exported to PNG and PDF.
- 💾 Reproducible artifacts: CSV outputs + `.tar.gz` compression for portability.
| Area | Question | Output |
|---|---|---|
| Urban analytics 🏙️ | Are high-volume hosts closer to cultural hotspots? | Correlations across distances |
| Tourism studies 🏛️ | Which hotspots show the strongest proximity effects? | Ranked correlation values |
| Spatial data science 🧭 | How do host attributes relate to cultural accessibility? | Heatmaps + summary statistics |
- Reads a listings/hosts dataset with latitude/longitude.
- Reads a cultural hotspots dataset with name/latitude/longitude.
- Adds one distance column per hotspot using the `d_t_...` prefix (km).
- Filters relevant columns, maps `host_is_superhost` to 0/1, and removes missing rows.
- Exports the final CSV and a compressed `.tar.gz` copy.
- Computes Pearson correlations between:
  - An existing variable (e.g., `host_is_superhost`, `host_total_listings_count`) and distances, or
  - A derived variable based on counts (e.g., `listing_count` by `host_id`) and distances.
- Generates and saves a correlation matrix (PNG and PDF) using Seaborn/Matplotlib.
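The distance step above can be sketched without the `haversine` package. This pure-Python helper (function name and sample coordinates are hypothetical) implements the same great-circle formula the library computes, and builds one `d_t_<place_name>` entry per hotspot as the pipeline's naming convention describes:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# Illustrative records, not the shipped datasets.
hotspots = [{"place_name": "Cathedral", "latitude": 19.7036, "longitude": -101.1949}]
listing = {"latitude": 19.7000, "longitude": -101.2000}

# One d_t_<place_name> value per hotspot, following the pipeline's prefix convention.
row = {
    f"d_t_{h['place_name']}": haversine_km(
        listing["latitude"], listing["longitude"], h["latitude"], h["longitude"]
    )
    for h in hotspots
}
```

In the real pipeline the `haversine` library performs this computation for every listing and hotspot pair.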
## Installation & Requirements

- Python 3.x
- (Optional) Jupyter to run the notebook
```bash
pip install numpy pandas seaborn matplotlib haversine
```

Note: `tarfile` is part of Python’s standard library (you do not install it via pip).
Verify the installation:

```bash
python -c "import numpy, pandas, seaborn, matplotlib, haversine; print('Installation successful!')"
```

## Quick Start

| Step | What to do |
|---|---|
| 1) Prepare data | Extract the CSV files included in `Information/`. |
| 2) Run | `python Correlations.py` |
| 3) Check outputs | Files are generated under `Results/`: `hosts_with_distances_cultural.csv` (and `.tar.gz`) plus the matrices `Correlation_Matrix_*.png/.pdf`. |
## Usage Guide

The default flow in `Correlations.py` performs:

- Distance computation:
  - Inputs: `Information/hosts.csv`, `Information/cultural_places.csv`
  - Output: `Results/hosts_with_distances_cultural.csv` (+ compression)
- Three correlation experiments:
  - Test 1: `host_is_superhost`
  - Test 2: `host_total_listings_count`
  - Test 3: `host_id` (transformed into `listing_count` per host)
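The Test 3 transform (turning `host_id` into a per-row `listing_count`) can be sketched with the standard library; the sample `host_ids` values are illustrative, not taken from the shipped dataset:

```python
from collections import Counter

# Hypothetical host_id column; the real values come from hosts.csv.
host_ids = [101, 101, 102, 103, 103, 103]

# Number of listings per host.
listing_count = Counter(host_ids)

# Per-row derived variable, as correlated against distances in Test 3.
derived = [listing_count[h] for h in host_ids]
print(derived)  # [2, 2, 1, 3, 3, 3]
```

The pipeline presumably does the equivalent with pandas group-by counting; the `Counter` version only illustrates the transform itself.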
```python
from Correlations import Distances, correlations_existing_variable, correlations_new_variable

Distances(
    hosts="Information/hosts.csv",
    places="Information/cultural_places.csv",
    result="Results/hosts_with_distances_cultural.csv",
)

correlations_existing_variable(
    filename="Results/hosts_with_distances_cultural.csv",
    variable="host_is_superhost",
    matrix_path="Results/Correlation_Matrix_1",
)
```

## Data Formats

### hosts.csv

Must include, at minimum:
- `latitude`, `longitude`
- Host variables to analyze, for example: `host_is_superhost`, `host_total_listings_count`, `host_id`
The script also uses (and/or preserves) identification columns such as `id`, `host_url`, `host_name`, etc.
### cultural_places.csv

Must include:

- `place_name`: name used to build the `d_t_<place_name>` columns
- `latitude`, `longitude`
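A quick pre-flight check that input files carry the required columns can look like this (the helper name and the sample rows are hypothetical):

```python
import csv
import io

REQUIRED_HOSTS = {"latitude", "longitude", "host_is_superhost"}
REQUIRED_PLACES = {"place_name", "latitude", "longitude"}

def missing_columns(csv_text, required):
    """Return the required columns absent from the CSV header, sorted."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return sorted(required - set(header))

# Toy inputs: the first satisfies the hosts schema, the second lacks longitude.
hosts_csv = "id,host_is_superhost,latitude,longitude\n1,t,19.70,-101.19\n"
places_csv = "place_name,latitude\nCathedral,19.7036\n"

print(missing_columns(hosts_csv, REQUIRED_HOSTS))    # []
print(missing_columns(places_csv, REQUIRED_PLACES))  # ['longitude']
```

Running such a check before `Distances(...)` turns a mid-pipeline `KeyError` into an immediate, readable message.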
## Project Architecture

```text
Cultural-Proximity-Analytics/
├── Correlations.py          # Pipeline: distances + correlations + exports
├── Correlations.ipynb       # Explanatory notebook
├── Information/             # Input datasets (tar.gz containing CSV)
│   ├── hosts.tar.gz
│   └── cultural_places.tar.gz
├── Results/                 # Outputs (CSV, PNG, PDF)
│   ├── hosts_with_distances_cultural.csv.tar.gz
│   ├── Correlation_Matrix_1.png/.pdf
│   ├── Correlation_Matrix_2.png/.pdf
│   └── Correlation_Matrix_3.png/.pdf
├── docs/                    # Project assets (logo, figures, team)
└── LICENSE
```
## Methodology

For each listing with coordinates $(\varphi_1, \lambda_1)$ and each hotspot $k$ with coordinates $(\varphi_2, \lambda_2)$, the distance $D_k$ (in km) is computed with the Haversine formula (via the `haversine` library):

$$D_k = 2R \arcsin\left(\sqrt{\sin^2\frac{\varphi_2-\varphi_1}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\frac{\lambda_2-\lambda_1}{2}}\right)$$

where $R \approx 6371$ km is the Earth's mean radius.

Let $X$ be the host variable under analysis. For each hotspot $k$, the pipeline computes $\rho(X, D_k)$ using Pearson correlation.

The full matrix is visualized as a heatmap and exported to PNG/PDF.
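As a worked example of $\rho(X, D_k)$, here is a minimal pure-Python Pearson implementation (not the pipeline's pandas code) on hypothetical data, where $X$ is a 0/1 superhost flag and $D_k$ a distance column:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical sample: X = host_is_superhost (0/1), d_k = distance to one hotspot (km).
x = [1, 1, 0, 0, 1]
d_k = [0.5, 1.0, 4.0, 5.0, 1.5]

print(round(pearson(x, d_k), 3))  # -0.968
```

In this toy sample the strong negative value means superhosts sit closer to the hotspot; the pipeline reports one such coefficient per hotspot column.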
## Dataset Structure

This repository ships example inputs as compressed archives under `Information/`:

```text
Information/
├── hosts.tar.gz            # Contains hosts.csv
└── cultural_places.tar.gz  # Contains cultural_places.csv
```
Outputs are written to `Results/` when running the pipeline:

```text
Results/
├── hosts_with_distances_cultural.csv.tar.gz
├── Correlation_Matrix_1.png/.pdf
├── Correlation_Matrix_2.png/.pdf
└── Correlation_Matrix_3.png/.pdf
```
README assets are stored under `docs/` (logo, figures, team photos).
## Performance Benchmarks

The pipeline is designed for straightforward batch analysis. Runtime depends on the number of listings and hotspots.
| Stage | Core operation | Typical scaling |
|---|---|---|
| Distance computation | Haversine per listing × hotspot | ~ proportional to (N listings × M hotspots) |
| Correlations | Pearson correlations on numeric columns | ~ proportional to number of rows and variables |
| Visualization/export | Heatmap rendering + file writes | ~ proportional to matrix size |
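When the N listings × M hotspots product grows large, the per-pair loop can be replaced with a NumPy broadcast. This sketch (not the repository's code; function name and sample coordinates are illustrative) computes the full distance matrix at once:

```python
import numpy as np

def haversine_matrix_km(listings, hotspots):
    """All pairwise Haversine distances in km: shape (N listings, M hotspots).

    Both inputs are arrays of [latitude, longitude] rows in degrees.
    """
    R = 6371.0  # mean Earth radius, km
    lat1 = np.radians(listings[:, 0])[:, None]   # shape (N, 1)
    lon1 = np.radians(listings[:, 1])[:, None]
    lat2 = np.radians(hotspots[:, 0])[None, :]   # shape (1, M)
    lon2 = np.radians(hotspots[:, 1])[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))         # broadcasts to (N, M)

# Illustrative coordinates (Morelia-area values, not the shipped data).
listings = np.array([[19.70, -101.20], [19.71, -101.18]])
hotspots = np.array([[19.7036, -101.1949], [19.7060, -101.1890]])
D = haversine_matrix_km(listings, hotspots)
print(D.shape)  # (2, 2)
```

Broadcasting turns the N × M double loop into a handful of array operations, which typically starts to matter once both counts reach the thousands.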
## Results & Visualizations

Correlation matrices generated by the included example run (stored under `docs/figures/` for README rendering):

- Test 1 (variable: `host_is_superhost`): `Correlation_Matrix_1.png/.pdf`
- Test 2 (variable: `host_total_listings_count`): `Correlation_Matrix_2.png/.pdf`
- Test 3 (derived: `listing_count` per `host_id`): `Correlation_Matrix_3.png/.pdf`
## Research Team

Collaborators:

- Dr. Francisco Javier Domínguez-Mota 🇲🇽 (Applied Mathematics & Scientific Computing)
- Dr. Heriberto Árias-Rojas 🇲🇽 (Engineering Applications)

Students:

- Gabriela Pedraza-Jiménez
- Eli Chagolla-Inzunza
- Jorge L. González-Figueroa
- Christopher N. Magaña-Barocio
- Maria Goretti Fraga-Lopez
## Industry Partners Supporting Innovation

Collaboration between academia and industry to accelerate real-world impact.
## Citation & License

This project is distributed under the MIT License.

```bibtex
@software{tinoco2024citc_airbnb,
  title = {Cultural Proximity Analytics: Distance and correlation analysis (short-term rentals and cultural hotspots)},
  author = {Tinoco-Guerrero, Gerardo and Guzmán-Torres, José Alberto and Tinoco-Guerrero, Narciso Salvador},
  year = {2024},
  institution = {Universidad Michoacana de San Nicolás de Hidalgo},
  note = {Geographical distance computation (Haversine) and correlation analysis (Pearson) between host variables and proximity to cultural hotspots.}
}
```
## Contact

- **Primary Contact**: Dr. Gerardo Tinoco-Guerrero (research coordination), Morelia, Michoacán, Mexico
- **Technical Support**: bug reports, questions, and collaboration requests
- **Collaboration Opportunities**: research and engineering partnerships
- **Student Opportunities**: projects and training in data science and geospatial analytics
- **Institutional Affiliations**
## FAQ

**Do I need Jupyter?**

No. You can run the full pipeline with `python Correlations.py`. Jupyter is only needed for the notebook.

**Why are the example inputs packaged as `.tar.gz` files?**

To keep example datasets compact and easy to distribute. Extract them into `Information/` before running the script.

**How do I add a new cultural hotspot?**

Add a row to `Information/cultural_places.csv` with `place_name`, `latitude`, and `longitude`. The pipeline will generate the matching distance column (`d_t_...`) automatically.










