feat(table): add bin-pack compaction strategy #850
Open
laskoviymishka wants to merge 4 commits into apache:main from
Branch updated: 442605a to 2eb4e80
zeroshade requested changes on Apr 7, 2026
Comment on lines +28 to +29 of table/compaction_strategy.go (outdated):
// CompactionConfig holds tunable thresholds for bin-pack compaction.
type CompactionConfig struct {
zeroshade (Member):
I wonder if compaction should be a subpackage of table instead. The table package is getting quite large, and I'd like to either refactor/shrink it or at least avoid adding more to it if possible.
What do you think?
laskoviymishka (Contributor, Author):
Totally makes sense.
Branch updated: 2eb4e80 to 0d46bc1
zeroshade approved these changes on Apr 7, 2026
zeroshade (Member) left a comment:
Looks good to me, just two questions. One is below. The other is whether we should change PackEnd so it no longer modifies the caller's slice, since that is a footgun (if we decide to, it can be done in a follow-up rather than here).
zeroshade reviewed on Apr 8, 2026
zeroshade reviewed on Apr 8, 2026
zeroshade reviewed on Apr 8, 2026
Add a table/compaction package with Config and PlanCompaction(), which groups FileScanTasks by partition, classifies files as compaction candidates based on size thresholds and delete-file counts, and bin-packs the candidates into Groups using the existing SlicePacker.

- Oversized files are skipped unless their delete count exceeds the threshold
- Config validation (target between min and max, positive thresholds)
- Ceiling division for output file estimation
- EstOutputBytes is an upper bound (the actual output is smaller after delete removal and because larger Parquet files compress better)
- Returns Plan by value to avoid an unnecessary heap allocation
- Uses map[string]partitionBucket (value type, not pointer)
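A minimal sketch of the classification, validation, and estimation rules listed above. The field names (`MinFileSizeBytes`, `TargetFileSizeBytes`, `MaxFileSizeBytes`, `DeleteFileThreshold`) and the helpers `isCandidate` and `estOutputFiles` are hypothetical, not the PR's actual Config API. The delete check uses `>=` ("at least this many deletes"); that is an assumption about how "exceeds the threshold" is compared in the real code. Sizes are in arbitrary units.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for compaction.Config; field names are illustrative.
type Config struct {
	MinFileSizeBytes    int64 // files smaller than this are candidates
	TargetFileSizeBytes int64 // desired size of each rewritten file
	MaxFileSizeBytes    int64 // files larger than this are normally skipped
	DeleteFileThreshold int   // delete-file count that forces a rewrite (assumed >=)
}

// Validate enforces strictly positive thresholds and min <= target <= max.
func (c Config) Validate() error {
	if c.MinFileSizeBytes <= 0 || c.TargetFileSizeBytes <= 0 ||
		c.MaxFileSizeBytes <= 0 || c.DeleteFileThreshold <= 0 {
		return errors.New("compaction: thresholds must be positive")
	}
	if c.TargetFileSizeBytes < c.MinFileSizeBytes || c.TargetFileSizeBytes > c.MaxFileSizeBytes {
		return errors.New("compaction: target must be between min and max")
	}
	return nil
}

// isCandidate sketches the classification rule from the commit message.
func (c Config) isCandidate(sizeBytes int64, deleteCount int) bool {
	if deleteCount >= c.DeleteFileThreshold {
		return true // enough deletes justify a rewrite regardless of size
	}
	if sizeBytes > c.MaxFileSizeBytes {
		return false // oversized with few deletes: skip
	}
	return sizeBytes < c.MinFileSizeBytes // undersized files get merged
}

// estOutputFiles uses ceiling division so a partial last file still counts.
func estOutputFiles(totalBytes, targetBytes int64) int64 {
	return (totalBytes + targetBytes - 1) / targetBytes
}

func main() {
	cfg := Config{MinFileSizeBytes: 96, TargetFileSizeBytes: 128, MaxFileSizeBytes: 192, DeleteFileThreshold: 5}
	fmt.Println(cfg.Validate())           // <nil>
	fmt.Println(cfg.isCandidate(10, 0))   // true: small file
	fmt.Println(cfg.isCandidate(500, 0))  // false: oversized, no deletes
	fmt.Println(cfg.isCandidate(500, 7))  // true: oversized but many deletes
	fmt.Println(cfg.isCandidate(128, 0))  // false: already well sized
	fmt.Println(estOutputFiles(300, 128)) // 3
}
```

Plain integer division would report 2 files for 300 units at a target of 128, undercounting the partial third file; the `+ target - 1` form rounds up without floating point.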
Branch updated: 19cbb42 to 6c122ce
Add CompactionConfig and Plan(), which groups FileScanTasks by partition, classifies files as candidates based on size thresholds and delete-file counts, and bin-packs the candidates into CompactionGroups using the existing SlicePacker. This is the planning layer for RewriteDataFiles (#832).
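A rough sketch of the planning flow described here: bucket candidate tasks by partition in a `map[string]partitionBucket` (stored by value), then pack each bucket into groups that never span partitions. `Task`, `Group`, `Plan`, and `planSketch` are hypothetical simplifications; the real code operates on FileScanTasks and uses SlicePacker, whereas this sketch substitutes a plain greedy packer and assumes the partition is already serialized to a string key.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a hypothetical, simplified stand-in for a FileScanTask.
type Task struct {
	Path         string
	PartitionKey string // assumed pre-serialized partition value
	SizeBytes    int64
}

// Group is one unit of rewrite work inside a single partition.
type Group struct {
	PartitionKey   string
	Tasks          []Task
	EstOutputBytes int64 // upper bound: sum of input sizes
}

// Plan is returned by value.
type Plan struct {
	Groups []Group
}

// partitionBucket is stored by value in the map.
type partitionBucket struct {
	tasks []Task
}

// planSketch buckets candidates by partition, then greedily packs each bucket
// into groups of at most maxGroupBytes (a stand-in for SlicePacker).
func planSketch(candidates []Task, maxGroupBytes int64) Plan {
	buckets := make(map[string]partitionBucket)
	for _, t := range candidates {
		b := buckets[t.PartitionKey] // zero value if the key is absent
		b.tasks = append(b.tasks, t)
		buckets[t.PartitionKey] = b // value type: write the updated copy back
	}

	// Sort keys so the output order is deterministic.
	keys := make([]string, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var plan Plan
	for _, k := range keys {
		var cur Group
		for _, t := range buckets[k].tasks {
			if len(cur.Tasks) > 0 && cur.EstOutputBytes+t.SizeBytes > maxGroupBytes {
				plan.Groups = append(plan.Groups, cur)
				cur = Group{}
			}
			cur.PartitionKey = k
			cur.Tasks = append(cur.Tasks, t)
			cur.EstOutputBytes += t.SizeBytes
		}
		if len(cur.Tasks) > 0 {
			plan.Groups = append(plan.Groups, cur)
		}
	}
	return plan
}

func main() {
	tasks := []Task{
		{"a.parquet", "dt=2026-04-01", 40},
		{"b.parquet", "dt=2026-04-02", 30},
		{"c.parquet", "dt=2026-04-01", 50},
		{"d.parquet", "dt=2026-04-01", 20},
	}
	for _, g := range planSketch(tasks, 100).Groups {
		fmt.Println(g.PartitionKey, len(g.Tasks), g.EstOutputBytes)
	}
	// dt=2026-04-01 2 90
	// dt=2026-04-01 1 20
	// dt=2026-04-02 1 30
}
```

Because `partitionBucket` is a value type, the read-modify-write through the map is required; forgetting the write-back would silently drop the appended task.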