Feature Request: Geo-Aware Bucket Routing #11

@diegoripley

Summary

Add support for routing read requests to different backend storage providers based on the client's geographic location or credentials. This would allow a single virtual bucket (e.g., statistical) to be served from the nearest replica — reducing latency and improving download speeds for users worldwide.

Motivation

For a data portal like dataforcanada.org, datasets are accessed globally. Currently, each virtual bucket maps to exactly one backend. A client in Tokyo downloading from a Source Cooperative bucket in us-west-2 experiences high latency. With geo-routing, that same request would be served from a Tigris bucket in Tokyo or an R2 bucket in Asia-Pacific.

Proposed Behavior

A new configuration section would map (bucket, country/region) pairs to alternative backend configs. When a read request arrives, the proxy checks the client's country (available on Cloudflare Workers as request.cf.country) and routes to the replica configured for that country or its region. Writes would continue to go to the primary backend.

flowchart TD
    JP["Client in Japan 🇯🇵"] -->|"GET /statistical/data.parquet"| Worker["CF Worker - s3.dataforcanada.org"]
    CA["Client in Canada 🇨🇦"] -->|"GET /statistical/data.parquet"| Worker
    DE["Client in Germany 🇩🇪"] -->|"GET /statistical/data.parquet"| Worker
    AU["Client in Australia 🇦🇺"] -->|"GET /statistical/data.parquet"| Worker
    PL["Client in Poland 🇵🇱"] -->|"GET /statistical/data.parquet"| Worker

    Worker --> GeoMW{"Geo-Router - Middleware"}

    GeoMW -->|"🇯🇵 JP → Tigris Tokyo (nrt)"| Tigris["Tigris Data - Tokyo"]
    GeoMW -->|"🇨🇦 CA → Primary"| SC["Source Cooperative - us-west-2"]
    GeoMW -->|"🇩🇪 DE → R2 Western Europe (weur)"| R2WEUR["Cloudflare R2 - Western Europe"]
    GeoMW -->|"🇵🇱 PL → R2 Eastern Europe (eeur)"| R2EEUR["Cloudflare R2 - Eastern Europe"]
    GeoMW -->|"🇦🇺 AU → R2 Oceania (oc)"| R2OC["Cloudflare R2 - Oceania"]

Example Configuration

# Primary bucket (default — North America)
[[buckets]]
name = "statistical"
backend_type = "s3"
anonymous_access = true
backend_prefix = "dataforcanada/d4c-datapkg-statistical/"

[buckets.backend_options]
bucket_name = "us-west-2.opendata.source.coop"
endpoint = "https://s3.us-west-2.amazonaws.com"
region = "us-west-2"
skip_signature = "true"

# Geo overrides for the "statistical" bucket
[geo_routing.statistical]

# Japan → Tigris Tokyo
[geo_routing.statistical.JP]
backend_type = "s3"
endpoint = "https://t3.storage.dev"
bucket_name = "d4c-datapkg-statistical"
region = "nrt"

# Asia-Pacific (fallback for other APAC countries) → R2 Asia-Pacific
[geo_routing.statistical.apac]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-apac"
region = "apac"

# Western Europe → R2 Western Europe
[geo_routing.statistical.weur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-weur"
region = "weur"

# Eastern Europe → R2 Eastern Europe
[geo_routing.statistical.eeur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-eeur"
region = "eeur"

# Oceania → R2 Oceania
[geo_routing.statistical.oc]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-oc"
region = "oc"

Resolution Priority

When a request arrives from a specific country, the geo-router should resolve in this order:

  1. Exact country match — e.g., JP → Tigris Tokyo
  2. Region match — e.g., other APAC countries → R2 Asia-Pacific
  3. Primary backend — fallback to the default bucket config (Source Cooperative)
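
The three-step order above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: `resolve_backend` and `COUNTRY_TO_REGION` are hypothetical names, and the country-to-region table is invented for the example, since the proposal does not yet say how a country is assigned to a region such as apac or weur.

```python
from typing import Optional

# Hypothetical country -> region table; the proposal does not define one.
COUNTRY_TO_REGION = {
    "JP": "apac", "KR": "apac", "SG": "apac",
    "DE": "weur", "FR": "weur",
    "PL": "eeur",
    "AU": "oc", "NZ": "oc",
}

# Sample backends, reduced to a label so the routing decision is easy to read.
PRIMARY = {"name": "source-coop-us-west-2"}
BY_COUNTRY = {"JP": {"name": "tigris-nrt"}}
BY_REGION = {"apac": {"name": "r2-apac"}, "weur": {"name": "r2-weur"}}


def resolve_backend(
    country: Optional[str],
    by_country: dict,
    by_region: dict,
    primary: dict,
) -> dict:
    """Pick a backend: exact country, then region, then primary."""
    if country:
        code = country.upper()
        # 1. Exact country match.
        if code in by_country:
            return by_country[code]
        # 2. Region match, using the assumed country -> region table.
        region = COUNTRY_TO_REGION.get(code)
        if region in by_region:
            return by_region[region]
    # 3. Fall back to the primary backend (also when the country is unknown).
    return primary
```

With these sample tables, a request from JP resolves to the Tigris backend, a request from KR falls through to the apac region backend, and a request from CA or one with no country at all is served by the primary.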

Scope

  • Reads only: GET and HEAD, including LIST, which S3 issues as a GET on the bucket. Writes should always go to the primary backend
  • Cloudflare Workers: the client's country code is available on each request as request.cf.country
  • Server runtime: a GeoIP lookup on source_ip could be added as a future extension
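
The read-only scope could be enforced with a small gate in front of the geo-router. The Python sketch below assumes the proxy sees the raw HTTP method; `is_geo_routable` and `choose_backend` are hypothetical names used only for illustration.

```python
# HTTP methods treated as reads. S3 list operations are GET requests on the
# bucket, so they are covered by "GET".
READ_METHODS = frozenset({"GET", "HEAD"})


def is_geo_routable(method: str) -> bool:
    """Return True only for read requests; writes must reach the primary."""
    return method.upper() in READ_METHODS


def choose_backend(method: str, geo_backend: str, primary: str) -> str:
    """Use the geo-resolved backend for reads, the primary for everything else."""
    return geo_backend if is_geo_routable(method) else primary
```

Treating every method outside the allow-list as a write is the conservative choice: an unrecognized method goes to the primary rather than to a replica.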
