Member
davidmrdavid
left a comment
I'm almost ready to approve; I have just one question.
Comment on lines -576 to -581
    // if we are processing events that count as activity, our latency category is at least "low"
    if (markPartitionAsActive)
    {
        this.loadInfo.MarkActive();
    }
Member
After this deleted piece of code, the method may return early if the partition is shutting down. Could we lose information by no longer recording the latency right before a shutdown?
Member
Author
I think that would be OK: once we shut down, we no longer report the partition load anyway. The partition will be started somewhere else and will report load from there.
This PR fixes the following issues we observed:
Issue 1. The partition load table does not show activity that originates from compactions or checkpoints. This is not ideal because (a) issues such as infinite checkpointing or compaction loops are not immediately visible (e.g. see #409), and (b) the scale controller is not aware of the true amount of work being performed by partitions.
To fix this, we add the activity level (L) to the partition load monitor whenever a checkpoint or compaction is in progress. If a compaction takes particularly long, we add the (M) or (H) indicator.
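The mapping from background work to an indicator could look roughly like the sketch below. It is not the code from this PR: the type and method names are hypothetical, reading (M) and (H) as medium and high is an assumption, and the two duration thresholds are made up because the description does not say what counts as "particularly long".

```csharp
using System;

// Hypothetical names throughout; not taken from the actual code in this PR.
public enum ActivityLevel { None, Low, Medium, High }

public static class LoadIndicator
{
    // Assumed thresholds for a "particularly long" compaction; the PR does not state them.
    static readonly TimeSpan MediumAfter = TimeSpan.FromSeconds(10);
    static readonly TimeSpan HighAfter = TimeSpan.FromSeconds(60);

    // compactionRunningFor is null when no compaction is in progress.
    public static ActivityLevel FromBackgroundWork(bool checkpointInProgress, TimeSpan? compactionRunningFor)
    {
        if (compactionRunningFor is TimeSpan elapsed)
        {
            // A running compaction always counts as at least Low,
            // and is escalated the longer it has been running.
            if (elapsed >= HighAfter) return ActivityLevel.High;
            if (elapsed >= MediumAfter) return ActivityLevel.Medium;
            return ActivityLevel.Low;
        }

        // A checkpoint on its own counts as Low activity.
        return checkpointInProgress ? ActivityLevel.Low : ActivityLevel.None;
    }
}
```

With these assumed thresholds, a partition that is only checkpointing reports Low, while a compaction that has been running for a minute or more reports High.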
Issue 2. Compactions were only triggered during idle periods, and only after some time delay. This is a problem if there are no idle periods, or if compactions need to run more frequently, such as when we have a continuous influx of requests.
To fix this, we now check whether a compaction should be performed regardless of whether the partition is idle and regardless of how much time has passed.
Issue 3. The new, more efficient compaction algorithm in https://github.com//pull/408 warrants different parameter tuning: compaction is now much less expensive than the checkpointing that is triggered along with it, so we should do fewer and larger compactions.
To do this, we adjust the max compaction area size (50000 -> 200000) and the minimum expected reduction at which we start compacting (5000 -> 10000).
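As a rough illustration of how the two parameters could interact with the idle-independent check from Issue 2, here is a minimal sketch. Only the two constants come from the description above. The class, the method, and its inputs are hypothetical, and estimating the expected reduction as a garbage fraction of the compaction area is an assumption rather than the estimator the code actually uses. The description does not state the units, so none are assumed here.

```csharp
using System;

// Hypothetical policy class; only the two constants come from the PR description.
public static class CompactionPolicy
{
    public const long MaxCompactionAreaSize = 200000;  // previously 50000
    public const long MinExpectedReduction = 10000;    // previously 5000

    // compactableSize and estimatedGarbageFraction are assumed inputs.
    // Note that neither idleness nor elapsed time is a parameter:
    // the decision depends only on the expected benefit.
    public static bool ShouldCompact(long compactableSize, double estimatedGarbageFraction, out long areaSize)
    {
        // Never compact more than the maximum area in one pass.
        areaSize = Math.Min(compactableSize, MaxCompactionAreaSize);

        // Assumed estimator: the share of the area that compaction would reclaim.
        double expectedReduction = areaSize * estimatedGarbageFraction;

        return expectedReduction >= MinExpectedReduction;
    }
}
```

Raising both values means each compaction covers a larger area but starts only when more can be reclaimed, which matches the stated goal of fewer and larger compactions.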