geoluck

How much of relative country outcomes can be predicted from geography, natural endowments, resource development, social structure, and governance — and who outperforms those expectations?

Geoluck is an open-source research project that builds a country-decade panel (1900–2020) and trains machine learning models to predict country-level income, wellbeing, inequality, wealth, and gender outcomes from tiered feature sets. The results are published as an interactive static site.

This is explicitly about predictive association, not causal effect.

View the live site →

What the site shows

The static site models seven outcome metrics, each converted to within-decade percentile ranks:

Outcome	Definition	Source
Income	Log GDP per capita rank	Maddison Project Database 2023
Wealth	Produced capital per capita rank	World Bank Changing Wealth of Nations
Life expectancy	Life expectancy at birth rank	World Bank WDI / UN Population Division
Inequality	Disposable-income Gini rank (higher = more unequal)	SWIID
Gender inequality	UNDP Gender Inequality Index rank (higher = more unequal)	UNDP HDR 2025
Female LFPR	Female labor-force participation rate rank	World Bank WDI / ILO
Women, Business and the Law	Women, Business and the Law index rank	World Bank
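
The within-decade percentile-rank transform can be sketched as follows. This is a minimal illustration, not the project's code: the function name, the row layout, and the midrank tie convention are all assumptions, since the README does not state the exact rank formula.

```python
from collections import defaultdict


def within_decade_percentile_ranks(rows):
    """Map (country, decade, value) rows to {(country, decade): rank in (0, 1)}.

    Hypothetical helper: ranks are computed separately inside each decade,
    so a country is only ever compared with its contemporaries.
    """
    by_decade = defaultdict(list)
    for country, decade, value in rows:
        by_decade[decade].append((country, value))

    ranks = {}
    for decade, entries in by_decade.items():
        values = [v for _, v in entries]
        n = len(values)
        for country, value in entries:
            below = sum(1 for v in values if v < value)
            equal = sum(1 for v in values if v == value)
            # Midrank convention (an assumption): ties share the average position.
            ranks[(country, decade)] = (below + 0.5 * equal) / n
    return ranks


# Made-up values for three placeholder countries.
rows = [
    ("A", 1990, 100.0), ("B", 1990, 200.0), ("C", 1990, 400.0),
    ("A", 2000, 300.0), ("B", 2000, 250.0),
]
ranks = within_decade_percentile_ranks(rows)
```

Because ranking restarts in every decade, the transformed outcome measures relative standing among contemporaries rather than absolute level.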

Predictor features are organized into four independently toggleable tiers:

Nature — Pure geography: latitude, climate normals, terrain, soil, malaria ecology, seismic activity, wind/solar potential, ocean productivity, cyclone exposure.
Infrastructure — Resource development: dams, irrigation, oil/gas/coal/mineral extraction, agricultural land use, energy assets.
Society — Social and historical structure: trade openness, colonial history, legal origins, ethnic/religious fractionalization, gender inequality, demographics.
Governance — State capacity, political order, democracy, fragility, and conflict: WGI, V-Dem, Freedom House, FSI, Polity5, UCDP.

All 15 non-empty tier combinations are modeled independently for each outcome (105 model bundles across the seven maintained targets). The site supports interactive choropleth maps, model comparison, country-level SHAP feature contributions, country-vs-country comparison, feature exploration by data source, full sortable rankings with CSV export, staged loading on slower connections, and shareable deep links.
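
The counts above follow directly from enumerating the non-empty subsets of the four tiers. A short sketch of that arithmetic (variable names are illustrative, not taken from the pipeline):

```python
from itertools import combinations

TIERS = ["Nature", "Infrastructure", "Society", "Governance"]
OUTCOMES = [
    "Income", "Wealth", "Life expectancy", "Inequality",
    "Gender inequality", "Female LFPR", "Women, Business and the Law",
]

# Every non-empty subset of the four tiers: 2**4 - 1 = 15 combinations.
tier_sets = [
    combo
    for size in range(1, len(TIERS) + 1)
    for combo in combinations(TIERS, size)
]

# One model bundle per (outcome, tier combination) pair: 7 * 15 = 105.
bundles = [(outcome, tiers) for outcome in OUTCOMES for tiers in tier_sets]
```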

Repository structure

src/           Python pipeline — ETL, feature building, modeling, export
web/           Static frontend — TypeScript, Vite, Leaflet, Chart.js
docs/          Methodology and payload documentation
web/public/data/   Precomputed JSON payloads consumed by the frontend

Data policy

Raw and intermediate research data are not stored in the public repository. Only compact, precomputed JSON payloads required by the static site are committed under web/public/data/. These are generated by the Python pipeline's export commands.

Bundle analytics use one published display spec per outcome+tier pair: the best-scoring non-baseline spec that also has the full exported analytics bundle the site needs. Scores, feature effects, permutation importance, and country contributions are all taken from that same display spec, and the export fails if any maintained bundle cannot meet this contract.
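
The selection rule described above might look roughly like this. It is a sketch under stated assumptions: SpecResult, its fields, the spec names, and the contents of REQUIRED_ARTIFACTS are hypothetical stand-ins, and the README does not define which artifacts make a bundle "full".

```python
from dataclasses import dataclass

# Assumed definition of a complete analytics bundle (hypothetical).
REQUIRED_ARTIFACTS = frozenset(
    {"scores", "feature_effects", "permutation_importance", "country_contributions"}
)


@dataclass(frozen=True)
class SpecResult:
    spec_name: str
    cv_score: float            # higher is better, e.g. cross-validated R²
    is_baseline: bool = False
    artifacts: frozenset = frozenset()


def select_display_spec(candidates):
    """Pick the best-scoring non-baseline spec with a complete bundle."""
    eligible = [
        c for c in candidates
        if not c.is_baseline and REQUIRED_ARTIFACTS <= c.artifacts
    ]
    if not eligible:
        # Mirrors "export fails if any maintained bundle cannot meet that contract".
        raise ValueError("no non-baseline spec with a complete analytics bundle")
    return max(eligible, key=lambda c: c.cv_score)


# Hypothetical candidates for one outcome+tier pair.
candidates = [
    SpecResult("baseline_mean", 0.00, is_baseline=True, artifacts=REQUIRED_ARTIFACTS),
    SpecResult("spec_a", 0.71, artifacts=frozenset({"scores"})),  # incomplete bundle
    SpecResult("spec_b", 0.64, artifacts=REQUIRED_ARTIFACTS),
    SpecResult("spec_c", 0.58, artifacts=REQUIRED_ARTIFACTS),
]
chosen = select_display_spec(candidates)
```

Here spec_a has the highest score but is skipped because its bundle is incomplete, so every published number comes from spec_b.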

Modeling notes

Models are evaluated out of sample using cross-validated R², RMSE, MAE, and Spearman rank correlation.
User-facing predictions and residuals use cross-validated exports, not in-sample fits.
Feature contributions use SHAP values from fold-trained estimators.
Public bundle analytics ship provenance and availability metadata per tier, including spec_name, model_family, row_count, and whether feature effects, permutation importance, or country contributions are available for the published display spec.
Exported robustness diagnostics currently focus on the flagship income model and include both within-decade holdouts and leave-region-out checks.
Results should be interpreted as predictive structure, not causal effects. A high R² for Nature-only features means geography is a strong statistical predictor — likely because it correlates with deeper causal channels — not that geography causes prosperity.

GitHub Pages deployment

The site is deployed through GitHub Actions, not "Deploy from a branch."

In repository Settings → Pages, set the source to GitHub Actions. The workflow builds the frontend from web/ and publishes the contents of web/dist/.

Local development

# Python pipeline
make sync        # Install/sync Python dependencies
make test        # Run tests

# Frontend
make web-build   # Build the static site (output: web/dist/)
make web-preview-local  # Build from a clean local temp copy and serve a stable preview on localhost

The frontend expects JSON data under web/public/data/. These payloads are committed to the repository and are generated by:

uv run geoluck export-web-data

make test includes a committed-payload contract check, so schema drift or cross-file spec mismatches fail in CI before deployment. make web-build also verifies that web/dist/data and web/public/data share the same export manifest and display-spec selection.

For frontend development with hot reload:

cd web && npm run dev

The exporter writes data_manifest.json alongside metadata.json in web/public/data/. This manifest records the shared export id, payload version, and per-file hashes for the published data bundle.
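
A manifest of this kind, and the consistency check between two data directories, can be sketched as below. The key names (export_id, payload_version, files), the use of SHA-256, and both helper functions are assumptions for illustration; the real schema is whatever the exporter writes to data_manifest.json.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(data_dir, export_id, payload_version):
    """Hash every JSON payload under data_dir (hypothetical manifest layout)."""
    data_dir = Path(data_dir)
    files = {}
    for path in sorted(data_dir.rglob("*.json")):
        if path.name == "data_manifest.json":
            continue  # the manifest does not hash itself
        rel = path.relative_to(data_dir).as_posix()
        files[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "export_id": export_id,
        "payload_version": payload_version,
        "files": files,
    }


def manifests_match(a, b):
    """True when two directories carry the same export and identical payloads."""
    return a == b


# Demo with throwaway directories standing in for web/public/data and web/dist/data.
with tempfile.TemporaryDirectory() as tmp:
    public, dist = Path(tmp) / "public", Path(tmp) / "dist"
    for directory in (public, dist):
        directory.mkdir()
        (directory / "metadata.json").write_text('{"outcomes": 7}', encoding="utf-8")

    m_public = build_manifest(public, "demo-export", 1)
    m_dist = build_manifest(dist, "demo-export", 1)

    # Simulate drift: the built copy no longer matches the committed payload.
    (dist / "metadata.json").write_text('{"outcomes": 8}', encoding="utf-8")
    m_drift = build_manifest(dist, "demo-export", 1)
```

Comparing per-file hashes rather than timestamps or sizes means any byte-level change to a payload is detected, which is the property a build-time consistency check needs.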

Documentation

DATA_SOURCES.md — Source registry and licensing notes
docs/MODEL_SPECS.md — Model families, feature-set variants, evaluation design
docs/UI_DATA_PAYLOADS.md — Frontend JSON payload schemas

License

MIT
