docs: Recommend Overlord-based auto-compaction and mark useIncrementalCache production ready #19252
cecemei wants to merge 11 commits into apache:master
The default for useSupervisors should be true in the cluster compaction config if we are recommending it going forward. That way, all new deployments will get the recommended config.
Updated the default to true, PTAL!
|`druid.manager.rules.pollDuration`|The duration between polls the Coordinator does for updates to the set of active rules. Generally defines the amount of lag time it can take for the Coordinator to notice rules.|`PT1M`|
|`druid.manager.rules.defaultRule`|The default rule for the cluster|`_default`|
|`druid.manager.rules.alertThreshold`|The duration after a failed poll upon which an alert should be emitted.|`PT10M`|
|Property|Description| Default |
what's up with all the unrelated formatting changes?
 */
@ThreadSafe
- public class CoordinatorRunStats
+ public class DruidRunStats
I can't help but wonder if there is a better name for this, maybe DutyRunStats or something that means 'thing to collect stuff to emit later from regularly occurring internal chores'? I guess 'duty' isn't quite right because that is basically only used to refer to periodic coordinator tasks, not supervisor stuff. The javadoc still only mentions coordinator run/duties, which should be fixed.
Also, it is kind of weird having the name DruidRunStats but the thing it has is still called CoordinatorStat, it seems like that naming should be changed to reflect the change here.
Stepping back, what exactly is the motivation for renaming? I guess it's that compaction uses this and now runs as a supervisor, so it isn't really specific to the coordinator? While this is still used quite heavily by all of the things the coordinator does, it seems reasonable to give it a more generic name of some sort. I was just wondering if this one is a bit too generic, but maybe it is fine too as long as the javadoc clarifies its purpose?
Some additional thoughts: I believe some of us would like to eventually merge the coordinator and overlord into a single service. Since they both now basically need a heavy segment timeline and so have similar footprint requirements, there aren't a lot of compelling reasons to keep them separate anymore. In my mind 'coordinator' would be the remaining service, with all of the overlord's functionality merged into it (though this hasn't really been discussed, so maybe other people have other opinions). If that were true, this would basically become something only used by the coordinator again, heh. There is a lot of work to do for something like this, so it is not really a short-term goal afaik and needs more official discussion at some point; just adding it here as additional stuff to think about.
* Can use either the native compaction engine or the [MSQ task engine](#use-msq-for-auto-compaction)
* More reactive and submits tasks as soon as a compaction slot is available
* Tracked compaction task status to avoid re-compacting an interval repeatedly
* Uses new Indexing State Fingerprinting mechanisms to store less data per segment in metadata storage
I know this isn't new, but by default we still store compaction state afaict (ClusterCompactionConfig.storeCompactionStatePerSegment defaults to true), so this should be reworded to something like 'can be configured to store only fingerprints' or whatever.
- **MSQ compaction engine**: Set `engine` to `msq` in the compaction dynamic config or in the supervisor spec.
- **Incremental segment metadata caching**: Set `druid.manager.segments.useIncrementalCache` to `always` or `ifSynced` in your Overlord and Coordinator runtime properties. See [Segment metadata caching](../configuration/index.md#metadata-retrieval).
- **At least two compaction task slots**: The MSQ task engine requires at least two tasks (one controller, one worker).
I think we need to leave part of this, no? The default engine is still native, so people still need to set engine to msq, and you need 2 compaction slots since it's using the msq engine.
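For reference, the settings discussed in this thread could be combined roughly as follows. This is a minimal sketch of a cluster compaction config payload, not the PR's actual docs text: `useSupervisors` and `engine` are the field names mentioned above, while the exact JSON shape and the endpoint it is submitted to are assumptions that should be checked against the compaction API docs. The incremental cache is configured separately, through `druid.manager.segments.useIncrementalCache` in the Overlord and Coordinator runtime properties, and the MSQ engine still needs at least two compaction task slots.

```json
{
  "useSupervisors": true,
  "engine": "msq"
}
```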
Description
CoordinatorRunStats to DruidRunStats
Release note
Automatic Compaction
Incremental cache
useIncrementalCache) is no longer experimental and defaults to ifSynced.