Skip to content

[Druid Iceberg Extension] Implement incremental/snapshot-based ingestion to support real-time use cases #19268

@Shekharrajak

Description

@Shekharrajak

Description

Currently, Druid performs full table scans on every ingestion run. For large Iceberg tables (billions of rows), this makes real-time ingestion impractical:

  • Re-scanning 50TB table every minute is impossible
  • Compute costs are prohibitive
  • SLA requirements (sub-minute latency) cannot be met

Motivation

A financial trading platform needs to ingest new stock trades within 30 seconds of arrival. Their Iceberg table has 50 billion historical rows.
Current Behavior:
ingestion:
type: iceberg
table: stock_trades
schedule: "@every 1m"

Result:

- Every 1 minute: Full table scan of 50 billion rows

- Takes 45 minutes (FAILS SLA)

- Cost: High

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions