fix: include ssdproxy unaccounted DPDK memory on ssd_proxy_includes_dpdk_memory feature flag (OP-272) #2405
Pull request overview
This PR adds a new image feature flag to control how SSDProxy hugepages memory is translated into the --memory/MEMORY value, so that (when enabled) SSDProxy includes previously unaccounted DPDK memory.
Changes:
- Add `ssd_proxy_includes_dpdk_memory` to the image feature flags model.
- Plumb feature flags into `resources.PodFactory` and `ensurePod`, and use them when computing hugepages memory details.
- Update `NewPodFactory` call sites in tests to match the new signature.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/pkg/domain/feature_flags.go | Adds the new feature flag field to the serialized flags struct. |
| internal/controllers/wekacontainer/funcs_pod_ensure.go | Fetches feature flags (except for the feature-flags ad-hoc mode) and passes them into PodFactory. |
| internal/controllers/resources/pod.go | Extends PodFactory to accept optional feature flags and adjusts hugepages memory calculation accordingly. |
| internal/controllers/wekacontainer/funcs_pod_ensure_test.go | Updates tests for the new NewPodFactory signature. |
| internal/controllers/operations/load_drivers_test.go | Updates tests for the new NewPodFactory signature. |
    var ff *domain.FeatureFlags
    if container.Spec.Mode != weka.WekaContainerModeAdhocOpWC {
        // Ad-hoc container is the mechanism for fetching flags, so skip it
        ff, err = r.GetFeatureFlags(ctx)
        if err != nil {
            return errors.Wrap(err, "failed to get feature flags")
        }
    }
Feature flags are fetched for every container mode except AdhocOpWC, but they are currently only used to influence hugepages/--memory behavior. This introduces extra reconciliation work and can block pod creation while an ad-hoc feature-flags fetch runs. Consider fetching/passing feature flags only when needed (e.g., SSDProxy mode with 2Mi hugepages), otherwise pass nil to preserve the existing fast path.
Feature flags are cached, so that's not an issue.
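To make the caching argument concrete, here is a minimal sketch of a memoized flag lookup. It is not the operator's actual cache: `flagCache`, `newFlagCache`, the per-image key, and the `fetch` callback are all assumptions for illustration, and `FeatureFlags` is a stand-in for `domain.FeatureFlags` with a single field.

```go
package main

import (
	"fmt"
	"sync"
)

// FeatureFlags is a stand-in for domain.FeatureFlags; only one field is modeled.
type FeatureFlags struct {
	SsdProxyIncludesDpdkMemory bool
}

// flagCache memoizes flags per image so repeated reconciles do not refetch.
// The type and the per-image key are assumptions, not the operator's design.
type flagCache struct {
	mu    sync.Mutex
	byImg map[string]*FeatureFlags
	fetch func(image string) (*FeatureFlags, error) // hypothetical slow path
}

func newFlagCache(fetch func(string) (*FeatureFlags, error)) *flagCache {
	return &flagCache{byImg: map[string]*FeatureFlags{}, fetch: fetch}
}

// Get returns cached flags when present and calls fetch otherwise.
func (c *flagCache) Get(image string) (*FeatureFlags, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ff, ok := c.byImg[image]; ok {
		return ff, nil
	}
	ff, err := c.fetch(image)
	if err != nil {
		return nil, err // errors are not cached, so the next reconcile retries
	}
	c.byImg[image] = ff
	return ff, nil
}

func main() {
	calls := 0
	cache := newFlagCache(func(image string) (*FeatureFlags, error) {
		calls++
		return &FeatureFlags{SsdProxyIncludesDpdkMemory: true}, nil
	})
	for i := 0; i < 3; i++ {
		if _, err := cache.Get("example-image:1.0"); err != nil {
			panic(err)
		}
	}
	fmt.Println("fetches:", calls)
}
```

With a cache of this shape only the first reconcile for a given image pays the fetch cost, which is the scenario the reply above relies on. The sketch holds the mutex during the fetch for simplicity.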
939c9c8 to d562610
Claude finished @kristina-solovyova's task in 2m 9s.

Code Review

The implementation is clean and correct. A few notes below.

Correctness of the Copilot comments

Index numbering comment ( …

Mode check ( …

    if ff != nil && ff.SsdProxyIncludesDpdkMemory && container.Spec.Mode == weka.WekaContainerModeSSDProxy {

This comment is stale/wrong.

Feature flag fetching for all modes ( …

    if container.Spec.Mode == weka.WekaContainerModeSSDProxy {
        ff, err = r.GetFeatureFlags(ctx)
        ...
    }

All three Copilot comments are addressed or incorrect.

Issues Found

1. Inconsistent comment wording between Go and Python (minor)

    // flag 8 is not yet added
    # flag 8 is not used by the operator

"Not yet added" implies a future addition, while "not used by the operator" is a factual statement about the current state. The Python wording is more accurate since flag 8 exists in the spec but the operator simply doesn't need it. Suggest aligning the Go comment to match: …

2. No unit tests for the new memory calculation path (moderate)

There are no test files in …

What's Good

Summary: Two concerns: a trivial comment wording inconsistency and the absence of unit tests for the new memory calculation logic. The core implementation is correct and the Copilot review comments are either already addressed or wrong.
d562610 to c186afd
Claude finished @kristina-solovyova's task in 39s.

No code changes since the previous review on 2026-03-27. The branch tip is still …

Previous review findings remain open: …
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
    Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB + config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
    HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
    HugepagesSize: "2Mi",
buildProxyContainerSpec now sets Spec.Hugepages to include SsdProxyDpdkMemoryMiB and also introduces Spec.HugepagesOffset, but ensureProxyContainer() returns early when a proxy CR already exists. That means existing SSD proxy containers created by older operator versions (which likely lack HugepagesOffset and didn’t include the DPDK addend) won’t be updated, and the new GetHugePagesDetails() offset logic can compute an incorrect --memory value for those pods after upgrade. Consider patching the existing proxy container spec (Hugepages/HugepagesOffset) when it differs from the desired values, or deleting/recreating it safely.
Suggested change:

    - Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB + config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
    - HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
    - HugepagesSize: "2Mi",
    + Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB,
    + HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
    + HugepagesSize: "2Mi",
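The "patch the existing proxy container spec when it differs" option from the comment above could look roughly like the sketch below. `proxySpec` and `syncProxySpec` are hypothetical names that model only the two fields under discussion, and the MiB values are made up; the real `ensureProxyContainer()` and the CR update call are not shown.

```go
package main

import "fmt"

// proxySpec models only the fields discussed above; the real spec has more.
type proxySpec struct {
	Hugepages       int // MiB
	HugepagesOffset int // MiB
}

// syncProxySpec copies the desired hugepages values onto an existing spec
// and reports whether anything changed, so the caller knows to issue an update.
func syncProxySpec(existing *proxySpec, desired proxySpec) bool {
	changed := false
	if existing.Hugepages != desired.Hugepages {
		existing.Hugepages = desired.Hugepages
		changed = true
	}
	if existing.HugepagesOffset != desired.HugepagesOffset {
		existing.HugepagesOffset = desired.HugepagesOffset
		changed = true
	}
	return changed
}

func main() {
	// A legacy spec created before HugepagesOffset existed (illustrative numbers).
	legacy := proxySpec{Hugepages: 1024}
	desired := proxySpec{Hugepages: 1224, HugepagesOffset: 200}
	if syncProxySpec(&legacy, desired) {
		fmt.Printf("patched: %+v\n", legacy)
	}
}
```

Returning a boolean keeps the early-return path cheap: an already up-to-date proxy CR triggers no write.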
          }
      case weka.WekaContainerModeSSDProxy:
    -     offset = config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB
    +     offset = config.Consts.SsdProxyHugepagesOffsetMiB
For SSD proxy mode, the default offset now comes from config.Consts.SsdProxyHugepagesOffsetMiB instead of the configured Config.DriveSharing.SsdProxyHugepagesOffsetMiB. This prevents the SSD_PROXY_HUGEPAGES_OFFSET_MIB env/config from affecting SSD proxy containers that have Spec.HugepagesOffset == 0 (including legacy proxy CRs created before this PR). That can change the computed --memory behavior after upgrade. Consider using config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB here (and only falling back to Consts as its default), or otherwise explicitly handling legacy SSD proxy containers.
Suggested change:

    - offset = config.Consts.SsdProxyHugepagesOffsetMiB
    + // Prefer configured SSD proxy hugepages offset, fall back to const default
    + if cfgOffset := config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB; cfgOffset > 0 {
    +     offset = cfgOffset
    + } else {
    +     offset = config.Consts.SsdProxyHugepagesOffsetMiB
    + }
    - nodeInfo *discovery.DiscoveryNodeInfo
    + container *weka.WekaContainer
    + nodeInfo *discovery.DiscoveryNodeInfo
    + featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container)
The field comment says featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container), but for SSD proxy mode ff == nil is treated as SsdProxyIncludesDpdkMemory == false (and adds SsdProxyDpdkMemoryMiB to the offset), which is not necessarily the pre-change behavior for legacy SSD proxy specs. Suggest clarifying this comment to reflect the actual semantics (e.g., “nil means assume FF disabled / backward-compatible default”).
Suggested change:

    - featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container)
    + featureFlags *domain.FeatureFlags // nil → assume feature flags disabled / backward-compatible defaults (e.g. ad-hoc container)
all fine, as "nil means assume FF disabled"
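One way to make the "nil means assume FF disabled" rule hard to get wrong is a nil-safe accessor on the flags pointer. The sketch below is only an illustration: `IncludesDpdkMemory` is a hypothetical helper that does not appear in this PR, and `FeatureFlags` is a one-field stand-in for `domain.FeatureFlags`.

```go
package main

import "fmt"

// FeatureFlags is a stand-in for domain.FeatureFlags; only one field is modeled.
type FeatureFlags struct {
	SsdProxyIncludesDpdkMemory bool
}

// IncludesDpdkMemory is a hypothetical helper. Go allows calling a method on a
// nil pointer receiver, so a nil *FeatureFlags simply reads as "flag disabled".
func (ff *FeatureFlags) IncludesDpdkMemory() bool {
	return ff != nil && ff.SsdProxyIncludesDpdkMemory
}

func main() {
	var none *FeatureFlags // e.g. the ad-hoc container path
	fmt.Println(none.IncludesDpdkMemory())

	enabled := &FeatureFlags{SsdProxyIncludesDpdkMemory: true}
	fmt.Println(enabled.IncludesDpdkMemory())
}
```

Call sites then avoid repeating the `ff != nil &&` guard, and the nil semantics live in one place.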
      // GetHugePagesDetails returns hugepages details for a container based on its spec.
    - func GetHugePagesDetails(container *weka.WekaContainer) HugePagesDetails {
    + // ff is optional; for SSD proxy containers it controls DPDK memory accounting:
    + // - without SsdProxyIncludesDpdkMemory: DPDK memory (ConstSsdProxyDpdkMemoryMiB) is excluded
Doc comment refers to ConstSsdProxyDpdkMemoryMiB, but the code uses config.Consts.SsdProxyDpdkMemoryMiB. This mismatch makes it harder to grep/maintain the code; please align the comment with the actual constant/field name.
Suggested change:

    - // - without SsdProxyIncludesDpdkMemory: DPDK memory (ConstSsdProxyDpdkMemoryMiB) is excluded
    + // - without SsdProxyIncludesDpdkMemory: DPDK memory (config.Consts.SsdProxyDpdkMemoryMiB) is excluded

This change introduces a new feature flag, `SsdProxyIncludesDpdkMemory`, that modifies how hugepages memory is calculated for containers. When enabled, the full hugepages amount is passed to the `--memory` parameter instead of subtracting the DPDK offset, as the SSD proxy now accounts for DPDK memory internally.

The `PodFactory` constructor now accepts an optional `FeatureFlags` parameter, which is passed through to `GetHugePagesDetails()` to determine the appropriate memory calculation behavior. For ad-hoc containers used to fetch feature flags, `nil` is passed to maintain backward compatibility.
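As a rough illustration of the calculation described here, the sketch below derives a `--memory` value for an SSD proxy container with and without the flag. It is an approximation pieced together from the review thread, not the real `GetHugePagesDetails()`: the function name `ssdProxyMemoryMiB`, the constant values, and the assumption that a base offset is still subtracted when the flag is on are all illustrative.

```go
package main

import "fmt"

// FeatureFlags is a stand-in for domain.FeatureFlags; only one field is modeled.
type FeatureFlags struct {
	SsdProxyIncludesDpdkMemory bool
}

// Illustrative values only; the real constants live in the operator config.
const (
	dpdkMemoryMiB = 200 // stands in for config.Consts.SsdProxyDpdkMemoryMiB
	baseOffsetMiB = 100 // stands in for the SSD proxy hugepages offset
)

// ssdProxyMemoryMiB returns the value that would be passed to --memory.
// Without the flag (or with nil flags) the DPDK memory is added to the offset
// and therefore excluded; with the flag it stays inside the --memory value.
func ssdProxyMemoryMiB(hugepagesMiB int, ff *FeatureFlags) int {
	offset := baseOffsetMiB
	if ff == nil || !ff.SsdProxyIncludesDpdkMemory {
		offset += dpdkMemoryMiB
	}
	return hugepagesMiB - offset
}

func main() {
	cases := []struct {
		name string
		ff   *FeatureFlags
	}{
		{"nil flags", nil},
		{"flag off", &FeatureFlags{}},
		{"flag on", &FeatureFlags{SsdProxyIncludesDpdkMemory: true}},
	}
	for _, c := range cases {
		fmt.Printf("%-10s --memory=%d\n", c.name, ssdProxyMemoryMiB(2048, c.ff))
	}
}
```

With these made-up numbers, enabling the flag raises the `--memory` value by exactly the DPDK amount, and the nil and flag-off cases agree. A table of cases like the one in `main` would also translate directly into a table-driven unit test for the new path.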