gstinoco/Cultural_Proximity_Analytics

Cultural Proximity Analytics: Short-Term Rentals and Cultural Hotspots 📊

GitHub Python pandas NumPy Seaborn Matplotlib haversine License: MIT

Reproducible workflow to compute distances (Haversine) and analyze correlations between host variables and proximity to cultural hotspots

Short-term rentals (hosts/listings) -> distances to cultural hotspots -> correlation matrices (Pearson)


🌟 Overview

This repository implements an analysis workflow to:

  • Compute geographical distances from short-term rental listings to a set of cultural hotspots in a city using the Haversine formula (km).
  • Generate an output dataset with all distances appended and ready for analysis.
  • Compute Pearson correlations between a target host variable (e.g., superhost, listings count, or a host-level derived count) and distances to cultural hotspots.
  • Export results as correlation matrices (PNG/PDF) and print correlation values to the console.

Included in this repository:

🔧 Key Capabilities

  • 📐 Distance computation: Haversine distances (km) from each listing to multiple hotspots.
  • 📊 Correlation analysis: Pearson correlation for existing variables and derived count-based variables.
  • 🖼️ Publication-ready outputs: heatmap correlation matrices exported to PNG and PDF.
  • 💾 Reproducible artifacts: CSV outputs + .tar.gz compression for portability.

🔬 Typical Use Cases

| Area | Question | Output |
|---|---|---|
| Urban analytics 🏙️ | Are high-volume hosts closer to cultural hotspots? | Correlations across distances |
| Tourism studies 🏛️ | Which hotspots show the strongest proximity effects? | Ranked correlation values |
| Spatial data science 🧭 | How do host attributes relate to cultural accessibility? | Heatmaps + summary statistics |

✨ Scope and Features

📐 Distances (Haversine)

  • Reads a listings/hosts dataset with latitude/longitude.
  • Reads a cultural hotspots dataset with name/latitude/longitude.
  • Adds one distance column per hotspot using the d_t_... prefix (km).
  • Filters relevant columns, maps host_is_superhost to 0/1, and removes missing rows.
  • Exports the final CSV and a compressed .tar.gz copy.
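The distance steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in Correlations.py: `haversine_km` is a hand-written stand-in for the haversine library, `add_distance_columns` and `clean_rows` are hypothetical helper names, the sample rows are made up, and the `"t"`/`"f"` encoding of host_is_superhost is an assumption about the input data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)
SUPERHOST_MAP = {"t": 1, "f": 0}  # assumed raw encoding of host_is_superhost


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance_columns(listings, places):
    """Copy each listing row and add one d_t_<place_name> key per hotspot."""
    out = []
    for row in listings:
        new_row = dict(row)
        for place in places:
            new_row["d_t_" + place["place_name"]] = haversine_km(
                row["latitude"], row["longitude"],
                place["latitude"], place["longitude"],
            )
        out.append(new_row)
    return out


def clean_rows(rows):
    """Map host_is_superhost to 0/1 and drop rows with any missing value."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        new_row["host_is_superhost"] = SUPERHOST_MAP.get(row.get("host_is_superhost"))
        if all(value is not None for value in new_row.values()):
            cleaned.append(new_row)
    return cleaned


# Made-up sample data: two listings on the equator, one hotspot one degree east.
listings = [
    {"id": 1, "latitude": 0.0, "longitude": 0.0, "host_is_superhost": "t"},
    {"id": 2, "latitude": 0.0, "longitude": 0.0, "host_is_superhost": None},
]
places = [{"place_name": "museum", "latitude": 0.0, "longitude": 1.0}]

rows = clean_rows(add_distance_columns(listings, places))
```

One degree of longitude on the equator is roughly 111 km, which makes the single surviving row easy to sanity-check by hand.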

📊 Correlations

  • Computes Pearson correlations between:
    • An existing variable (e.g., host_is_superhost, host_total_listings_count) and distances, or
    • A derived variable based on counts (e.g., listing_count by host_id) and distances.
  • Generates and saves a correlation matrix (PNG and PDF) using Seaborn/Matplotlib.
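As a rough illustration of the derived-variable case, the sketch below counts listings per host_id, attaches that count to every row, and computes a Pearson correlation against one distance column. It uses only the standard library; the `pearson` helper, the sample rows, and the choice to broadcast the count back onto each listing are assumptions, and the script itself may build listing_count differently.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up rows: host "a" has two listings near the hotspot, host "b" has one far away.
rows = [
    {"host_id": "a", "d_t_museum": 1.0},
    {"host_id": "a", "d_t_museum": 2.0},
    {"host_id": "b", "d_t_museum": 5.0},
]

# Derived variable: number of listings per host, repeated on each listing row.
counts = Counter(row["host_id"] for row in rows)
listing_count = [counts[row["host_id"]] for row in rows]
distances = [row["d_t_museum"] for row in rows]

rho = pearson(listing_count, distances)
```

In this toy data the correlation is strongly negative: the host with more listings sits closer to the hotspot.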

📦 Installation & Requirements

💻 System Requirements

  • Python 3.x
  • (Optional) Jupyter to run the notebook

📋 Dependencies

pip install numpy pandas seaborn matplotlib haversine

Note: tarfile is part of Python’s standard library (you do not install it via pip).

✅ Installation Verification

python -c "import numpy, pandas, seaborn, matplotlib, haversine; print('Installation successful!')"

🚀 Quick Start

| Step | What to do |
|---|---|
| 1) Prepare data | Extract the CSV files included in Information/: `tar -xzf Information/hosts.tar.gz -C Information` and `tar -xzf Information/cultural_places.tar.gz -C Information` |
| 2) Run | `python Correlations.py` |
| 3) Check outputs | Files are generated under Results/: hosts_with_distances_cultural.csv (and .tar.gz) + matrices Correlation_Matrix_*.png/.pdf. |
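Where the `tar` command is not available, the same extraction can be done with Python's standard tarfile module. The sketch below is an assumption-labeled illustration: `compress_file` and `extract_archive` are hypothetical helper names, and the round trip runs on a temporary made-up hosts.csv rather than the repository's data.

```python
import tarfile
import tempfile
from pathlib import Path


def compress_file(file_path, archive_path):
    """Write a single file into a .tar.gz archive under its bare file name."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(file_path, arcname=Path(file_path).name)


def extract_archive(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir and return its member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safer extraction filters.
            tar.extractall(path=target_dir, filter="data")
        else:
            tar.extractall(path=target_dir)
    return names


# Round trip on a temporary, made-up CSV file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "hosts.csv"
    source.write_text("id,latitude,longitude\n1,19.70,-101.19\n", encoding="utf-8")

    archive = Path(tmp) / "hosts.tar.gz"
    compress_file(source, archive)

    out_dir = Path(tmp) / "extracted"
    out_dir.mkdir()
    names = extract_archive(archive, out_dir)
    restored = (out_dir / "hosts.csv").read_text(encoding="utf-8")
```

For the repository's inputs, the equivalent call would be `extract_archive("Information/hosts.tar.gz", "Information")`.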

📖 Usage Guide

⚙️ Running the pipeline (script)

The default flow in Correlations.py performs:

  1. Distance computation:
    • Inputs: Information/hosts.csv, Information/cultural_places.csv
    • Output: Results/hosts_with_distances_cultural.csv (+ compression)
  2. Three correlation experiments:
    • Test 1: host_is_superhost
    • Test 2: host_total_listings_count
    • Test 3: host_id (transformed into listing_count per host)

🧪 Using as a library (functions)

from Correlations import Distances, correlations_existing_variable, correlations_new_variable

Distances(
    hosts="Information/hosts.csv",
    places="Information/cultural_places.csv",
    result="Results/hosts_with_distances_cultural.csv",
)

correlations_existing_variable(
    filename="Results/hosts_with_distances_cultural.csv",
    variable="host_is_superhost",
    matrix_path="Results/Correlation_Matrix_1",
)

🗄️ Data Formats

🏠 hosts.csv (listings / hosts)

Must include, at minimum:

  • latitude, longitude
  • Host variables to analyze, for example: host_is_superhost, host_total_listings_count, host_id

The script also uses (and/or preserves) identification columns like id, host_url, host_name, etc.

🏛️ cultural_places.csv (cultural hotspots)

Must include:

  • place_name: name used to build d_t_<place_name> columns
  • latitude, longitude
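A quick header check before running the pipeline can catch a misnamed column early. The sketch below is a hypothetical helper, not part of Correlations.py; it takes the required column names from the lists above and uses in-memory sample CSVs that are made up for the example.

```python
import csv
import io

# Required columns as listed in the format description above.
HOSTS_REQUIRED = {"latitude", "longitude"}
PLACES_REQUIRED = {"place_name", "latitude", "longitude"}


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(required - header)


# Made-up samples: the places file lacks longitude, the hosts file is complete.
sample_places = io.StringIO("place_name,latitude\nExample Museum,19.70\n")
sample_hosts = io.StringIO("id,host_id,latitude,longitude\n1,a,19.70,-101.19\n")

missing_in_places = missing_columns(sample_places, PLACES_REQUIRED)
missing_in_hosts = missing_columns(sample_hosts, HOSTS_REQUIRED)
```

With real files, pass an open file handle, for example `open("Information/cultural_places.csv", newline="", encoding="utf-8")`.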

📂 Project Architecture

Cultural_Proximity_Analytics/
├── Correlations.py                     # Pipeline: distances + correlations + exports
├── Correlations.ipynb                  # Explanatory notebook
├── Information/                        # Input datasets (tar.gz containing CSV)
│   ├── hosts.tar.gz
│   └── cultural_places.tar.gz
├── Results/                            # Outputs (CSV, PNG, PDF)
│   ├── hosts_with_distances_cultural.csv.tar.gz
│   ├── Correlation_Matrix_1.png/.pdf
│   ├── Correlation_Matrix_2.png/.pdf
│   └── Correlation_Matrix_3.png/.pdf
├── docs/                               # Project assets (logo, figures, team)
└── LICENSE

📚 Methodology

📐 Geographical distance (Haversine)

For each listing with coordinates $(\phi_1, \lambda_1)$ and cultural hotspot $(\phi_2, \lambda_2)$, the distance in km is computed using the Haversine formula (implemented by the haversine library).
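For reference, the standard Haversine formula in the notation above is written out below. Here $R$ denotes the Earth's mean radius (roughly 6371 km) and all angles are in radians; the exact radius constant used by the library is not stated in this README.

$$
d = 2R \arcsin\left( \sqrt{ \sin^2\left( \frac{\phi_2 - \phi_1}{2} \right) + \cos\phi_1 \cos\phi_2 \, \sin^2\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right)
$$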

📊 Correlations (Pearson)

Let $X$ be a host variable (existing or derived) and $D_k$ the distance to hotspot $k$. We compute:

  • $\rho(X, D_k)$ using Pearson correlation.

The full matrix is visualized as a heatmap and exported to PNG/PDF.
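For reference, the standard Pearson coefficient over $n$ listings is written out below, where $x_i$ is the value of $X$ for listing $i$, $d_{k,i}$ is that listing's distance to hotspot $k$, and $\bar{x}$ and $\bar{d}_k$ are the sample means. The index $i$ and the lowercase symbols are notation introduced here for illustration.

$$
\rho(X, D_k) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(d_{k,i} - \bar{d}_k)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (d_{k,i} - \bar{d}_k)^2}}
$$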


🗄️ Dataset Structure

This repository ships example inputs as compressed archives under Information/:

Information/
├── hosts.tar.gz              # Contains hosts.csv
└── cultural_places.tar.gz    # Contains cultural_places.csv

Outputs are written to Results/ when running the pipeline:

Results/
├── hosts_with_distances_cultural.csv.tar.gz
├── Correlation_Matrix_1.png/.pdf
├── Correlation_Matrix_2.png/.pdf
└── Correlation_Matrix_3.png/.pdf

README assets are stored under docs/ (logo, figures, team photos).


📈 Performance Benchmarks

The pipeline is designed for straightforward batch analysis. Runtime depends on the number of listings and hotspots.

⏱️ Scaling Overview

| Stage | Core operation | Typical scaling |
|---|---|---|
| Distance computation | Haversine per listing × hotspot | ~ proportional to (N listings × M hotspots) |
| Correlations | Pearson correlations on numeric columns | ~ proportional to number of rows and variables |
| Visualization/export | Heatmap rendering + file writes | ~ proportional to matrix size |
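To make the scaling in the table concrete, the sketch below counts the basic operations for a given problem size. It is a back-of-the-envelope estimate, not a measurement: `pipeline_work` is a hypothetical helper, and it assumes the correlation matrix covers one target variable plus the M distance columns.

```python
def pipeline_work(n_listings, m_hotspots):
    """Estimate operation counts for the distance and correlation stages."""
    # One Haversine evaluation per (listing, hotspot) pair.
    distance_calls = n_listings * m_hotspots
    # Assumed matrix: one target variable plus M distance columns.
    n_variables = m_hotspots + 1
    # Distinct off-diagonal variable pairs; each pair needs one pass over the rows.
    correlation_pairs = n_variables * (n_variables - 1) // 2
    return {
        "distance_calls": distance_calls,
        "correlation_pairs": correlation_pairs,
        "matrix_cells": n_variables * n_variables,
    }


# Made-up problem size: 1000 listings and 5 hotspots.
work = pipeline_work(1000, 5)
```

For this example size, the distance stage dominates with 5000 Haversine evaluations, while the correlation matrix has only 36 cells.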

🎥 Results & Visualizations

🖼️ Examples Gallery

Correlation matrices generated by the included example run (stored under docs/figures/ for README rendering):

Test 1
Variable: host_is_superhost

Correlation Matrix 1

PDF · PNG
Test 2
Variable: host_total_listings_count

Correlation Matrix 2

PDF · PNG
Test 3
Derived: listing_count per host_id

Correlation Matrix 3

PDF · PNG

🧑‍🔬 Research Team

🌟 Meet the Team

Researchers and students contributing to this project

👥 Main Researchers

| Researcher | Area | Affiliation | ORCID |
|---|---|---|---|
| Dr. Gerardo Tinoco-Guerrero 🇲🇽 | Geospatial Analytics & Computational Methods | Company: SIIIA MATH; University: UMSNH | 0000-0003-3119-770X |
| Dr. José Alberto Guzmán-Torres 🇲🇽 | Urban Analytics & Data-Driven Modeling | Company: SIIIA MATH; University: UMSNH | 0000-0002-9309-9390 |
| Dr. Narciso Salvador Tinoco-Guerrero 🇲🇽 | Statistical Modeling & Applied Research | University: UMSNH; University: UVAQ | 0000-0003-1209-1184 |

🤝 Collaborators

| Collaborator | Area | Affiliation |
|---|---|---|
| Dr. Francisco Javier Domínguez-Mota 🇲🇽 | Applied Mathematics & Scientific Computing | University: UMSNH; Collaboration: Aula CIMNE-Morelia |
| Dr. Heriberto Árias-Rojas 🇲🇽 | Engineering Applications | University: UMSNH; Collaboration: Aula CIMNE-Morelia |

🎓 Ph.D. Research Students

| Student | Role | Institution |
|---|---|---|
| Gabriela Pedraza-Jiménez | Ph.D. Research Student | University: UMSNH |
| Eli Chagolla-Inzunza | Ph.D. Research Student | University: UMSNH |

🎓 M.Sc. Research Students

| Student | Role | Institution |
|---|---|---|
| Jorge L. González-Figueroa | M.Sc. Research Student | University: UMSNH |
| Christopher N. Magaña-Barocio | M.Sc. Research Student | University: UMSNH |

🎓 Undergraduate Research Students

| Student | Role | Institution |
|---|---|---|
| Maria Goretti Fraga-Lopez | Undergraduate Research Student | University: UMSNH |

🏭 Industry Partners Supporting Innovation

Collaboration between academia and industry to accelerate real-world impact

🏭 SIIIA MATH

Soluciones en Ingeniería, Mexico

🎯 Focus areas:

  • Mathematical modeling & simulation
  • AI/ML engineering solutions
  • Technology transfer and applied R&D

Contact


📝 Citation & License

📄 License

This project is distributed under the MIT License.

🔖 Suggested citation

@software{tinoco2024citc_airbnb,
  title        = {Cultural Proximity Analytics: Distance and correlation analysis (short-term rentals and cultural hotspots)},
  author       = {Tinoco-Guerrero, Gerardo and Guzmán-Torres, José Alberto and Tinoco-Guerrero, Narciso Salvador},
  year         = {2024},
  institution  = {Universidad Michoacana de San Nicolás de Hidalgo},
  note         = {Geographical distance computation (Haversine) and correlation analysis (Pearson) between host variables and proximity to cultural hotspots.}
}

🙏 Acknowledgments

❤️ Special Thanks

We extend our gratitude to the institutions and partners supporting this research

🏛️ Institutional Support

🎓 Universidad Michoacana de San Nicolás de Hidalgo (UMSNH)
Academic institution, Mexico

Website Type: University Support: Infrastructure

Key support
  • Research infrastructure and academic environment
  • Scientific training and supervision

💰 CONAHCyT
National Council of Humanities, Sciences and Technologies, Mexico

Website Type: Government Support: Funding

Key support
  • Research funding and scientific development
  • Support for open research outputs

🌿 Centre Internacional de Mètodes Numèrics en Enginyeria (CIMNE)
Research center, Spain

Website Type: Research Center Support: Collaboration

Key support
  • International collaboration and research environment
  • Academic and applied computing exchange

🏭 SIIIA MATH: Soluciones en Ingeniería
Industry partner, Mexico

Website Type: Industry Partner Support: Technology Transfer

Key support
  • Applied R&D perspective and real-world relevance
  • Industry-academia collaboration

🏡 Research Centers & Collaborations

🌿 Aula CIMNE-Morelia
Research collaboration space

Website Area: Numerical Methods Collaboration: Applied Computing

Collaboration highlights
  • Numerical methods and computational engineering environment
  • Scientific collaboration and training activities

🎓 UMSNH
Academic collaboration

Website Type: University Support: Research Infrastructure

Collaboration highlights
  • Institutional support for research and training
  • Graduate training and supervision in scientific computing

📧 Contact & Support

Contact channels, technical support, and collaboration opportunities

Primary Contact
Research coordination

Dr. Gerardo Tinoco-Guerrero
Morelia, Michoacán, Mexico

Email Company: SIIIA MATH University: UMSNH
Technical Support
Bug reports, questions, and collaboration requests

  • Issues for bugs and feature requests
  • Email for technical inquiries
  • Collaboration for partnerships and joint projects
Collaboration Opportunities
Research and engineering partnerships

🗺️ Geospatial Analytics
distance-to-hotspot computation, spatial feature engineering
📍 Urban & Cultural Data
cultural hotspots, accessibility, spatial indicators
📈 Spatial Statistics
correlation analysis, robustness checks, interpretation
🧰 Reproducible Pipelines
open datasets, scripts, and publication-ready outputs
🏙️ Tourism & Mobility
proximity effects, hospitality patterns, urban studies
Student Opportunities
Projects and training in data science and geospatial analytics

  • Graduate Programs: research opportunities with the team
  • Undergraduate Projects: thesis topics in urban and spatial data science
  • Internships: analytics workflows, visualization, and reproducible research
Institutional Affiliations

SIIIA MATH UMSNH UVAQ Research Group

💬 FAQ

Do I need Jupyter?
No. You can run the full pipeline with python Correlations.py. Jupyter is only needed for the notebook.
Why are the example inputs packaged as .tar.gz files?
To keep example datasets compact and easy to distribute. Extract them into Information/ before running the script.
How do I add a new cultural hotspot?
Add a row to Information/cultural_places.csv with place_name, latitude, longitude. The pipeline will generate a new distance column (d_t_<place_name>) automatically.
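Adding a hotspot can also be scripted. The sketch below appends one row with the standard csv module; `append_hotspot` is a hypothetical helper, the hotspot name and coordinates are made up, and it assumes the file's columns are ordered place_name, latitude, longitude and that the file ends with a newline. The demo writes to a temporary file rather than the repository's data.

```python
import csv
import tempfile
from pathlib import Path


def append_hotspot(csv_path, place_name, latitude, longitude):
    """Append one hotspot row; assumes column order place_name, latitude, longitude."""
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([place_name, latitude, longitude])


# Demo on a temporary file containing only the header row.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cultural_places.csv"
    path.write_text("place_name,latitude,longitude\n", encoding="utf-8")

    append_hotspot(path, "Example Museum", 19.7, -101.19)  # made-up hotspot

    with open(path, newline="", encoding="utf-8") as handle:
        hotspots = list(csv.DictReader(handle))
```

After appending to the real file, re-run `python Correlations.py` so the new distance column is computed.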

Advancing reproducible geospatial analytics through open-source collaboration


If this project helps your research, please consider giving it a star.
