Description
Summary
Add support for routing read requests to different backend storage providers based on the client's geographic location or credentials. This would allow a single virtual bucket (e.g., statistical) to be served from the nearest replica — reducing latency and improving download speeds for users worldwide.
Motivation
For a data portal like dataforcanada.org, datasets are accessed globally. Currently, each virtual bucket maps to exactly one backend. A client in Tokyo downloading from a Source Cooperative bucket in us-west-2 experiences high latency. With geo-routing, that same request would be served from a Tigris bucket in Tokyo or an R2 bucket in Asia-Pacific.
Proposed Behavior
A new configuration section would map (bucket, country/region) pairs to alternative backend configs. When a read request arrives, the proxy checks the client's location (exposed on Cloudflare Workers as request.cf.country) and routes to the nearest configured replica. If no replica matches, the request falls through to the primary backend. Writes would continue to go to the primary backend.
```mermaid
flowchart TD
    JP["Client in Japan 🇯🇵"] -->|"GET /statistical/data.parquet"| Worker["CF Worker - s3.dataforcanada.org"]
    CA["Client in Canada 🇨🇦"] -->|"GET /statistical/data.parquet"| Worker
    DE["Client in Germany 🇩🇪"] -->|"GET /statistical/data.parquet"| Worker
    AU["Client in Australia 🇦🇺"] -->|"GET /statistical/data.parquet"| Worker
    PL["Client in Poland 🇵🇱"] -->|"GET /statistical/data.parquet"| Worker
    Worker --> GeoMW{"Geo-Router - Middleware"}
    GeoMW -->|"🇯🇵 JP → Tigris Tokyo (nrt)"| Tigris["Tigris Data - Tokyo"]
    GeoMW -->|"🇨🇦 CA → Primary"| SC["Source Cooperative - us-west-2"]
    GeoMW -->|"🇩🇪 DE → R2 Western Europe (weur)"| R2WEUR["Cloudflare R2 - Western Europe"]
    GeoMW -->|"🇵🇱 PL → R2 Eastern Europe (eeur)"| R2EEUR["Cloudflare R2 - Eastern Europe"]
    GeoMW -->|"🇦🇺 AU → R2 Oceania (oc)"| R2OC["Cloudflare R2 - Oceania"]
```
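To make the flow in the diagram concrete, here is a minimal TypeScript sketch of the routing decision. It is not taken from the proxy's codebase: `RequestLike`, `pickBackend`, and the replica labels are hypothetical names chosen for illustration, and the table only covers the countries shown in the diagram.

```typescript
// Minimal stand-in for the parts of a Workers request used here.
// On Cloudflare Workers the country is exposed as `request.cf.country`.
interface RequestLike {
  method: string;
  cf?: { country?: string };
}

// Hypothetical labels for the replicas drawn in the diagram.
const REPLICA_BY_COUNTRY: Record<string, string> = {
  JP: "tigris-nrt",
  DE: "r2-weur",
  PL: "r2-eeur",
  AU: "r2-oc",
};
const PRIMARY_LABEL = "source-coop-us-west-2";

function pickBackend(req: RequestLike): string {
  // Only reads are geo-routed; every other method goes to the primary.
  const isRead = req.method === "GET" || req.method === "HEAD";
  if (!isRead) return PRIMARY_LABEL;

  // A missing country code is treated the same as "no override".
  const country = (req.cf?.country ?? "").toUpperCase();
  const known = Object.prototype.hasOwnProperty.call(REPLICA_BY_COUNTRY, country);
  const hit = known ? REPLICA_BY_COUNTRY[country] : undefined;
  return hit ?? PRIMARY_LABEL;
}
```

With this table, a `GET` from Japan resolves to the Tokyo replica, while a `PUT` from Japan or a `GET` from Canada resolves to the primary.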
Example Configuration
```toml
# Primary bucket (default — North America)
[[buckets]]
name = "statistical"
backend_type = "s3"
anonymous_access = true
backend_prefix = "dataforcanada/d4c-datapkg-statistical/"

[buckets.backend_options]
bucket_name = "us-west-2.opendata.source.coop"
endpoint = "https://s3.us-west-2.amazonaws.com"
region = "us-west-2"
skip_signature = "true"

# Geo overrides for the "statistical" bucket
[geo_routing.statistical]

# Japan → Tigris Tokyo
[geo_routing.statistical.JP]
backend_type = "s3"
endpoint = "https://t3.storage.dev"
bucket_name = "d4c-datapkg-statistical"
region = "nrt"

# Asia-Pacific (fallback for other APAC countries) → R2 Asia-Pacific
[geo_routing.statistical.apac]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-apac"
region = "apac"

# Western Europe → R2 Western Europe
[geo_routing.statistical.weur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-weur"
region = "weur"

# Eastern Europe → R2 Eastern Europe
[geo_routing.statistical.eeur]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-eeur"
region = "eeur"

# Oceania → R2 Oceania
[geo_routing.statistical.oc]
backend_type = "s3"
endpoint = "https://<account>.r2.cloudflarestorage.com"
bucket_name = "d4c-datapkg-statistical-oc"
region = "oc"
```

Resolution Priority
When a request arrives from a specific country, the geo-router should resolve in this order:
- Exact country match: e.g., `JP` → Tigris Tokyo
- Region match: e.g., other APAC countries → R2 Asia-Pacific
- Primary backend: fallback to the default bucket config (Source Cooperative)
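The three-step order above could be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: the proposal does not say how a country is mapped to a region such as `apac`, so the `COUNTRY_TO_REGION` table is a hypothetical, deliberately incomplete stand-in, and `resolveBackend`, `BackendConfig`, and `GeoOverrides` are names invented here.

```typescript
// Shape of one override table, mirroring the keys used in the example config.
interface BackendConfig {
  backend_type: string;
  endpoint: string;
  bucket_name: string;
  region: string;
}

// Keys are either country codes ("JP") or region names ("apac"),
// as in the [geo_routing.statistical.*] tables.
type GeoOverrides = Record<string, BackendConfig>;

// ASSUMPTION: the proposal does not define this mapping; entries are examples.
const COUNTRY_TO_REGION: Record<string, string> = {
  JP: "apac", KR: "apac", SG: "apac",
  DE: "weur", FR: "weur",
  PL: "eeur",
  AU: "oc", NZ: "oc",
};

// Own-property lookup so inherited keys such as "constructor" never match.
function own<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

function resolveBackend(
  country: string | undefined,
  overrides: GeoOverrides,
  primary: BackendConfig,
): BackendConfig {
  if (country) {
    const code = country.toUpperCase();
    // 1. Exact country match.
    const exact = own(overrides, code);
    if (exact) return exact;
    // 2. Region match, if the country belongs to a configured region.
    const region = own(COUNTRY_TO_REGION, code);
    const regional = region ? own(overrides, region) : undefined;
    if (regional) return regional;
  }
  // 3. Primary backend.
  return primary;
}
```

Because the exact match is checked first, `JP` resolves to its own override even though it also belongs to `apac`; a country with no entry in either table resolves to the primary.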
Scope
- Reads only (GET, HEAD, LIST): writes should always go to the primary backend
- Cloudflare Workers: the `cf.country` field is readily available on every request
- Server runtime: could use GeoIP lookup on `source_ip` as a future extension
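One way to keep both runtimes in scope is to hide the country source behind a small function, sketched below. Everything here is hypothetical: `ClientInfo`, `GeoIpLookup`, and `clientCountry` are illustrative names, and the GeoIP hook is just a function from IP address to country code, not the API of any particular GeoIP library.

```typescript
// What a runtime might know about the caller; both fields are optional.
interface ClientInfo {
  cfCountry?: string; // from request.cf.country on Cloudflare Workers
  sourceIp?: string;  // available on the server runtime
}

// Hypothetical GeoIP hook: any function from IP address to country code.
type GeoIpLookup = (ip: string) => string | undefined;

function clientCountry(info: ClientInfo, geoip?: GeoIpLookup): string | undefined {
  // Prefer the value Cloudflare already attached to the request.
  if (info.cfCountry) return info.cfCountry.toUpperCase();
  // Otherwise fall back to a GeoIP lookup, when one is configured.
  if (info.sourceIp && geoip) return geoip(info.sourceIp)?.toUpperCase();
  // No location available: the caller should use the primary backend.
  return undefined;
}
```

Returning `undefined` when no location is available lets the router treat "unknown" the same as "no override" and use the primary backend.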