[Druid Iceberg Extension] Add column projection support to reduce I/O and improve query performance #19267

@Shekharrajak

Description

Currently, the Druid Iceberg extension reads ALL columns from Iceberg data files regardless of which columns are needed for ingestion. For tables with hundreds of columns, this causes:

  • 10-100x more data read from storage than necessary
  • Increased memory pressure during ingestion
  • Slower query performance
  • Higher cloud storage egress costs

For example, an e-commerce analytics team has an Iceberg table with 150 columns but needs only 5 of them (timestamp, product_id, category, price, quantity) for its Druid dashboard. Currently, Druid reads all 150 columns, causing:

  • Query time:
  • Memory:
  • Data transfer:
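
To make the scenario concrete, here is a minimal Java sketch of the idea behind column projection: collect the columns an ingestion spec actually references and compare that count with the table width. The class `ProjectionSketch`, its methods, and the split of the five columns into dimensions and metric inputs are hypothetical and chosen only for illustration; the issue does not specify how the extension would implement this. In the Iceberg Java API, a list like this is the kind of input a table scan's `select(...)` accepts, but the wiring into Druid is an assumption here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectionSketch {
    // Hypothetical helper: the union of every column the ingestion spec references.
    // A LinkedHashSet drops duplicates while keeping a stable order.
    public static Set<String> requiredColumns(String timestampColumn,
                                              List<String> dimensions,
                                              List<String> metricFields) {
        Set<String> columns = new LinkedHashSet<>();
        columns.add(timestampColumn);
        columns.addAll(dimensions);
        columns.addAll(metricFields);
        return columns;
    }

    // Ratio of columns read without projection to columns read with it.
    // This counts columns only; actual byte savings depend on column sizes and encoding.
    public static double columnReduction(int totalColumns, int projectedColumns) {
        return (double) totalColumns / projectedColumns;
    }

    public static void main(String[] args) {
        // Assumed split of the five columns named in the example above.
        Set<String> columns = requiredColumns(
                "timestamp",
                List.of("product_id", "category"),
                List.of("price", "quantity"));
        System.out.println("Columns to project: " + columns);
        System.out.println("Column-count reduction: " + columnReduction(150, columns.size()) + "x");
    }
}
```

For the 150-column table, projecting these 5 columns means reading 30 times fewer columns. The reduction in bytes read could be larger or smaller than that, depending on how wide the skipped columns are.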
