Timeseries query over UnnestDataSource with non-ALL granularity hangs on wide intervals



### Affected Version

36.0.0

### Description

A timeseries query with a non-ALL granularity (e.g., DAY) over an UnnestDataSource effectively hangs when the query interval is very wide (e.g., ETERNITY). The equivalent groupBy query returns instantly.

This affects SQL queries using FLOOR(__time TO DAY) (or similar time floor expressions like TIME_FLOOR or DATE_TRUNC probably) combined with CROSS JOIN UNNEST, as the SQL planner converts these into timeseries queries with DAY granularity.

An LLM found the following root cause for the issue (this should be verified, I do not have enough knowledge to judge if it's correct):

CursorGranularizer.create() clips the query interval using timeBoundaryInspector when available. For unnest datasources where the inspector is absent or the segment boundaries are wide, the ETERNITY interval
   is not clipped, causing granularity.getIterable() to generate billions of day-sized buckets. The engine then iterates over all of them, even though skipEmptyBuckets is set -- it still calls advanceToBucket() for each one.

The SQL planner's canUseQueryGranularity() check (DruidQuery.java) is meant to prevent this for non-table datasources by rejecting ETERNITY intervals. However, ExecutionVertex.isTableBased() traverses through UnnestDataSource down to the underlying TableDataSource, returning true and bypassing the safety check.

Reproduction

Submit this native query to /druid/v2 -- no datasource creation needed:

```
  {
    "queryType": "timeseries",
    "dataSource": {
      "type": "unnest",
      "base": {
        "type": "inline",
        "columnNames": ["__time", "vals"],
        "columnTypes": ["LONG", "ARRAY<LONG>"],
        "rows": [
          [1704067200000, [10, 20, 30]],
          [1704153600000, [40, 50, 60]]
        ]
      },
      "virtualColumn": {
        "type": "expression",
        "name": "unnested",
        "expression": "\"vals\"",
        "outputType": "ARRAY<LONG>"
      }
    },
    "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
    "granularity": "DAY",
    "aggregations": [
      { "type": "count", "name": "cnt" }
    ],
    "context": {
      "skipEmptyBuckets": true,
      "timeout": 10000
    }
  }
```

Expected: Returns results (2 day buckets, 3 rows each).
Actual: Times out.

Narrowing intervals to "2024-01-01/2024-01-03" returns instantly. The equivalent groupBy query also returns instantly.

SQL Reproduction

```
  -- Times out: planner chooses timeseries with DAY granularity
  SELECT
    FLOOR(__time TO DAY) AS aggregation,
    AVG(CAST(val AS DOUBLE)) AS avg_val
  FROM "any_datasource_with_array_column"
  CROSS JOIN UNNEST(array_column) AS d (val)
  GROUP BY FLOOR(__time TO DAY)
  ORDER BY aggregation
```

```
  -- Works: planner chooses groupBy
  SELECT
    __time AS aggregation,
    AVG(CAST(val AS DOUBLE)) AS avg_val
  FROM "any_datasource_with_array_column"
  CROSS JOIN UNNEST(array_column) AS d (val)
  GROUP BY __time
  ORDER BY aggregation
```

The only difference is FLOOR(__time TO DAY) vs raw __time. The former triggers a timeseries plan with DAY granularity; the latter triggers a groupBy plan that completes immediately.

The same applies to TIME_FLOOR(__time, 'P1D') and DATE_TRUNC('day', __time) -- any expression the planner recognizes as a time floor on __time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeseries query over UnnestDataSource with non-ALL granularity hangs on wide intervals #19270

Affected Version

Description

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Timeseries query over UnnestDataSource with non-ALL granularity hangs on wide intervals #19270

Description

Affected Version

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions