Description
Currently, Druid performs full table scans on every ingestion run. For large Iceberg tables (billions of rows), this makes real-time ingestion impractical:
- Re-scanning a 50 TB table every minute is infeasible
- Compute costs are prohibitive
- SLA requirements (sub-minute latency) cannot be met
Motivation
A financial trading platform needs to ingest new stock trades within 30 seconds of arrival. Their Iceberg table has 50 billion historical rows.
Current Behavior:
ingestion:
  type: iceberg
  table: stock_trades
  schedule: "@every 1m"
Result:
- Every 1 minute: Full table scan of 50 billion rows
- Each scan takes ~45 minutes, missing the 30-second SLA
- Cost: High
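The alternative implied by this motivation is incremental, snapshot-aware ingestion: remember the last Iceberg snapshot that was processed and read only the data files appended after it, rather than rescanning the whole table. The following is a minimal self-contained sketch of that checkpointing idea; the `Snapshot` and `TableLog` classes are illustrative stand-ins, not real Druid or Iceberg APIs.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    """One entry in a table's snapshot log (illustrative)."""
    snapshot_id: int
    added_files: list  # data files appended by this snapshot

@dataclass
class TableLog:
    snapshots: list = field(default_factory=list)

    def files_since(self, last_snapshot_id):
        """Return only files appended after last_snapshot_id,
        instead of every file in the table (the full-scan behavior)."""
        new = []
        for snap in self.snapshots:
            if last_snapshot_id is None or snap.snapshot_id > last_snapshot_id:
                new.extend(snap.added_files)
        return new

# Simulate two ingestion runs against a growing table.
log = TableLog([Snapshot(1, ["f1.parquet", "f2.parquet"])])

checkpoint = None                  # nothing processed yet
run1 = log.files_since(checkpoint)  # first run reads all existing files
checkpoint = log.snapshots[-1].snapshot_id

log.snapshots.append(Snapshot(2, ["f3.parquet"]))  # new trades land
run2 = log.files_since(checkpoint)  # later runs read only the delta
```

Because each run's cost is proportional to newly appended data rather than table size, per-minute scheduling stays cheap even as the table grows to billions of rows.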