Currently, the Druid Iceberg extension reads ALL columns from Iceberg data files regardless of which columns are needed for ingestion. For tables with hundreds of columns, this causes:
- 10-100x more data read from storage than necessary
- Increased memory pressure during ingestion
- Slower query performance
- Higher cloud storage egress costs
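Pruning these columns starts with knowing which ones the ingestion spec actually refers to. The sketch below collects that set from a simplified Druid `dataSchema`. It is only an illustration. It assumes that only `timestampSpec`, `dimensionsSpec`, and `metricsSpec` reference input columns, and it ignores `transformSpec` expressions, filters, and flatten specs. The helpers `required_columns` and `project`, along with the example column names, are hypothetical. In a real reader, the column set would be pushed down to the Parquet/ORC/Avro file reader so that unneeded column data is never fetched. Here, `project` only drops columns from an in-memory batch to show which ones survive.

```python
from __future__ import annotations

from typing import Any, Optional


def required_columns(data_schema: dict[str, Any]) -> Optional[set[str]]:
    """Collect the input columns a simplified Druid dataSchema refers to.

    Hypothetical helper: only timestampSpec, dimensionsSpec and metricsSpec
    are inspected; transforms, filters and flatten specs are ignored here.
    Returns None when no pruning is possible.
    """
    dimensions = data_schema["dimensionsSpec"]["dimensions"]
    if not dimensions:
        # Druid treats an empty dimension list as schemaless discovery,
        # so any column may become a dimension: read everything.
        return None
    cols = {data_schema["timestampSpec"]["column"]}
    for dim in dimensions:
        # Dimensions are either plain names or objects with a "name" key.
        cols.add(dim if isinstance(dim, str) else dim["name"])
    for metric in data_schema.get("metricsSpec", []):
        # Aggregators such as "count" have no input column.
        if "fieldName" in metric:
            cols.add(metric["fieldName"])
    return cols


def project(batch: dict[str, list[Any]], wanted: Optional[set[str]]) -> dict[str, list[Any]]:
    """Keep only the wanted columns of a column-oriented batch (None keeps all)."""
    if wanted is None:
        return dict(batch)
    return {name: values for name, values in batch.items() if name in wanted}


data_schema = {
    "timestampSpec": {"column": "event_time"},
    "dimensionsSpec": {"dimensions": ["country", {"type": "long", "name": "user_id"}]},
    "metricsSpec": [
        {"type": "count", "name": "count"},
        {"type": "doubleSum", "name": "revenue_sum", "fieldName": "revenue"},
    ],
}

batch = {
    "event_time": [1700000000000, 1700000060000],
    "country": ["DE", "US"],
    "user_id": [17, 42],
    "revenue": [9.5, 3.0],
    "user_agent": ["curl/8.0", "Mozilla/5.0"],  # not referenced by the spec
}

wanted = required_columns(data_schema)
print(sorted(wanted))
print(sorted(project(batch, wanted)))
```

Returning `None` for schemaless specs keeps the fallback explicit. When the spec cannot say which columns matter, the safe choice is the current behavior of reading everything.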
An e-commerce analytics team has an Iceberg table with 150 columns but needs only 5 of them (timestamp, product_id, category, price, quantity) for their Druid dashboard. Currently, Druid reads all 150 columns, causing:
- Query time:
- Memory:
- Data transfer:
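The following rough sketch makes the scale of the saving in this example concrete. It counts the bytes a columnar reader would fetch from one data file of this table, with and without projection. For illustration only, it assumes that every column occupies the same 1 MiB compressed. The `bytes_read` helper and the `col_N` filler names are hypothetical. Under that assumption, the reduction is exactly 150 / 5 = 30x. Real files differ: the actual ratio depends on how large the 145 unneeded columns are compared with the 5 that are kept.

```python
from __future__ import annotations

from typing import Optional


def bytes_read(column_sizes: dict[str, int], selected: Optional[set[str]] = None) -> int:
    """Bytes a columnar reader fetches: every column, or only the selected ones."""
    if selected is None:
        return sum(column_sizes.values())
    return sum(size for name, size in column_sizes.items() if name in selected)


MiB = 1024 * 1024

# Hypothetical data file: 145 unused columns plus the 5 the dashboard needs,
# each assumed (for illustration only) to take 1 MiB compressed.
needed = {"timestamp", "product_id", "category", "price", "quantity"}
sizes = {f"col_{i}": MiB for i in range(145)}
sizes.update({name: MiB for name in needed})

full = bytes_read(sizes)               # current behavior: all 150 columns
projected = bytes_read(sizes, needed)  # with column projection: 5 columns
print(full // MiB, projected // MiB, full / projected)
```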