
feat: add disk-utilization-aware segment loading #19288

Draft
jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:disk-util-aware-segment-movement

Conversation

Contributor

@jtuglu1 jtuglu1 commented Apr 9, 2026

Description

Problem

We have seen the following issue in our production clusters:

Tier X exhibits a persistent bimodal disk distribution: ~35 servers near 70% full (Group A) and ~35 servers at 99%+ (Group B). Two root causes prevent the coordinator's segment balancing from correcting this:

  1. Round-robin initial placement distributes new segments across all servers, continuously loading Group B servers that are already near-full.

  2. CostBalancerStrategy is purely temporal — disk utilization plays no role in selecting a move destination server. This causes two problems:

    • Moves from Group B are never scheduled: the balancer adds the source server back to the candidate pool as a "stay in place" option, and the temporal cost function frequently scores it as optimal, marking the segment as "Optimally placed."
    • Moves to Group B are attempted and fail at runtime: the coordinator's snapshot of available disk space lags behind the historical's live state, so the balancer schedules a load to a server it believes has capacity, but the historical rejects it because the disk is full.

Together these create a feedback loop: new segments land on Group B, moves to drain Group B are never scheduled, moves to Group B fail, and the imbalance compounds over time (disks in Group B remain pinned at ~100%).

Solution

The core issues we are trying to solve are:

  1. Minimize server disk-utilization variance within a tier, subject to segment temporal locality.
  2. Prevent the cluster from entering a state of severe disk imbalance.
  3. Provide an "off-ramp" in case the cluster does reach a persistent state of imbalance.
  4. Create a tunable way to deterministically force data redistribution, while still allowing oversubscription in worst-case scenarios (e.g. delayed auto-scaling).

Considerations

I considered two options:

  1. Add a static/dynamic threshold to candidate server selection (e.g. don't assign to servers with >{threshold}% utilization).
  2. Change CostBalancerStrategy to penalize high disk utilization in the cost function.

The core tradeoff is whether to allow Druid to oversubscribe a disk or not. Option #2 would permit oversubscription based on a heuristic — if the segment fits AND temporal locality gain outweighs the disk penalty — meaning it is less deterministic and has pathological cases where temporal value still outweighs the utilization penalty.

Option #1 is a harder limit that is more deterministic. I opted for a preference-with-fallback approach: prefer servers below a configurable utilization threshold, then fall back to the current behavior if all servers exceed the threshold. This keeps temporal-locality optimization within the set of "valid" servers, and avoids blocking segment loads entirely during oversubscription events (e.g. slow auto-scaling) where no server is below the threshold.

Additionally, when selecting a move destination, the source server is only re-added to the candidate pool if it is itself below the threshold. This prevents the balancer from declaring a segment on a 99%-full server as "Optimally placed" and suppressing the drain move.
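The preference-with-fallback selection described above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the ServerInfo class, the filterCandidates method, and the 0.90 threshold are all hypothetical names/values standing in for Druid's internal server-holder and balancer types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a candidate historical server.
class ServerInfo {
    final String name;
    final double diskUtilization; // fraction of disk in use, 0.0 - 1.0
    ServerInfo(String name, double diskUtilization) {
        this.name = name;
        this.diskUtilization = diskUtilization;
    }
}

class FillThresholdFilter {
    // Prefer servers below the fill threshold; if none qualify (e.g. during
    // an oversubscription event where every server is near-full), fall back
    // to the full candidate set so segment loads are never blocked entirely.
    static List<ServerInfo> filterCandidates(List<ServerInfo> servers, double threshold) {
        List<ServerInfo> belowThreshold = new ArrayList<>();
        for (ServerInfo s : servers) {
            if (s.diskUtilization < threshold) {
                belowThreshold.add(s);
            }
        }
        return belowThreshold.isEmpty() ? servers : belowThreshold;
    }
}
```

With this shape, the temporal cost function still picks the final destination, but only from the filtered pool; the source server of a move would be re-added to that pool only if it, too, passes the threshold check.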

Configuration

Static default
druid.coordinator.segmentLoading.defaultServerFillThreshold
Default: 1.0 (disabled — preserves current behavior).

Dynamic per-tier overrides

{ "tierServerFillThreshold": { "temp": 0.90 } }

The per-tier override takes precedence over the static default. If no override exists for a tier, the static default applies.
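For illustration, the static default could be set in the Coordinator's runtime.properties, with the tierServerFillThreshold dynamic config above tightening it for the "temp" tier. The 0.95 value here is hypothetical; 1.0 keeps the feature disabled.

```properties
# Hypothetical static default: prefer servers below 95% disk utilization
# in every tier that has no dynamic per-tier override.
druid.coordinator.segmentLoading.defaultServerFillThreshold=0.95
```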

Release note

Added a disk-utilization-aware segment loading threshold to help balance segment load evenly across servers within a tier.


This PR has:

  • been self-reviewed.
  • using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.
