fix: include ssdproxy unaccounted DPDK memory on ssd_proxy_includes_dpdk_memory feature flag (OP-272)#2405

Open
kristina-solovyova wants to merge 1 commit into main from 03-27-fix_include_ssdproxy_unaccounted_dpdk_memory_on_ssd_proxy_includes_dpdk_memory_feature_flag_op-272_

Conversation

@kristina-solovyova
Collaborator

@kristina-solovyova kristina-solovyova commented Mar 27, 2026

This change introduces a new feature flag SsdProxyIncludesDpdkMemory that modifies how hugepages memory is calculated for containers. When enabled, the full hugepages amount is passed to the --memory parameter instead of subtracting the DPDK offset, as the SSD proxy now accounts for DPDK memory internally.

The PodFactory constructor now accepts an optional FeatureFlags parameter, which is passed through to GetHugePagesDetails() to determine the appropriate memory calculation behavior. For ad-hoc containers used to fetch feature flags, nil is passed to maintain backward compatibility.
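As a rough illustration of the behavior described above, the sketch below shows how an optional flags pointer could drive the value passed to --memory. The reduced `FeatureFlags` and `PodFactory` types, the `MemoryMiB` method, and the numeric values are assumptions made for this example and are not the operator's actual definitions.

```go
package main

import "fmt"

// FeatureFlags is a simplified stand-in for domain.FeatureFlags (assumed shape).
type FeatureFlags struct {
	SsdProxyIncludesDpdkMemory bool
}

// PodFactory is a hypothetical, reduced version of resources.PodFactory.
type PodFactory struct {
	hugepagesMiB int
	offsetMiB    int
	featureFlags *FeatureFlags // nil keeps the pre-flag behavior
}

// NewPodFactory mirrors the idea of an optional feature-flags parameter.
func NewPodFactory(hugepagesMiB, offsetMiB int, ff *FeatureFlags) *PodFactory {
	return &PodFactory{hugepagesMiB: hugepagesMiB, offsetMiB: offsetMiB, featureFlags: ff}
}

// MemoryMiB returns the value that would be passed to --memory.
func (f *PodFactory) MemoryMiB() int {
	if f.featureFlags != nil && f.featureFlags.SsdProxyIncludesDpdkMemory {
		// The SSD proxy accounts for DPDK memory itself: pass the full amount.
		return f.hugepagesMiB
	}
	// No flags, or flag disabled: subtract the DPDK offset as before.
	return f.hugepagesMiB - f.offsetMiB
}

func main() {
	adhoc := NewPodFactory(4096, 512, nil)
	flagged := NewPodFactory(4096, 512, &FeatureFlags{SsdProxyIncludesDpdkMemory: true})
	fmt.Println(adhoc.MemoryMiB(), flagged.MemoryMiB())
}
```

Passing nil rather than a zero-valued struct keeps "flags unknown" distinguishable from "flag explicitly disabled", even though both take the same branch in this sketch.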

@github-actions

github-actions bot commented Mar 27, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA c186afd.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

@kristina-solovyova kristina-solovyova marked this pull request as ready for review March 27, 2026 12:17
@kristina-solovyova kristina-solovyova requested a review from a team as a code owner March 27, 2026 12:17
Copilot AI review requested due to automatic review settings March 27, 2026 12:17
Collaborator Author


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@graphite-app

graphite-app bot commented Mar 27, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (03/27/26)

2 reviewers were added to this PR based on Anton Bykov's automation.


Copilot AI left a comment


Pull request overview

This PR adds a new image feature flag to control how SSDProxy hugepages memory is translated into the --memory/MEMORY value, so that (when enabled) SSDProxy includes previously unaccounted DPDK memory.

Changes:

  • Add ssd_proxy_includes_dpdk_memory to the image feature flags model.
  • Plumb feature flags into resources.PodFactory and ensurePod, and use them when computing hugepages memory details.
  • Update NewPodFactory call sites in tests to match the new signature.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/pkg/domain/feature_flags.go Adds the new feature flag field to the serialized flags struct.
internal/controllers/wekacontainer/funcs_pod_ensure.go Fetches feature flags (except for the feature-flags ad-hoc mode) and passes them into PodFactory.
internal/controllers/resources/pod.go Extends PodFactory to accept optional feature flags and adjusts hugepages memory calculation accordingly.
internal/controllers/wekacontainer/funcs_pod_ensure_test.go Updates tests for the new NewPodFactory signature.
internal/controllers/operations/load_drivers_test.go Updates tests for the new NewPodFactory signature.


Comment on lines +131 to +138
var ff *domain.FeatureFlags
if container.Spec.Mode != weka.WekaContainerModeAdhocOpWC {
    // Ad-hoc container is the mechanism for fetching flags, so skip it
    ff, err = r.GetFeatureFlags(ctx)
    if err != nil {
        return errors.Wrap(err, "failed to get feature flags")
    }
}

Copilot AI Mar 27, 2026


Feature flags are fetched for every container mode except AdhocOpWC, but they are currently only used to influence hugepages/--memory behavior. This introduces extra reconciliation work and can block pod creation while an ad-hoc feature-flags fetch runs. Consider fetching/passing feature flags only when needed (e.g., SSDProxy mode with 2Mi hugepages), otherwise pass nil to preserve the existing fast path.

Collaborator Author


feature flags are cached so that's not an issue

@kristina-solovyova kristina-solovyova force-pushed the 03-27-fix_include_ssdproxy_unaccounted_dpdk_memory_on_ssd_proxy_includes_dpdk_memory_feature_flag_op-272_ branch from 939c9c8 to d562610 Compare March 27, 2026 12:32
@weka weka deleted a comment from claude bot Mar 27, 2026
@claude

claude bot commented Mar 27, 2026

Claude finished @kristina-solovyova's task in 2m 9s —— View job


Code Review

  • Gather context from PR
  • Read changed files
  • Analyze code quality and correctness
  • Post review feedback

The implementation is clean and correct. A few notes below.


Correctness of the Copilot comments

Index numbering comment (feature_flags.go): Copilot suggested renaming // 9 to // 8. This is incorrect — the comment index is the bit position in the bitmap, not a sequential counter. ssd_proxy_includes_dpdk_memory lives at bit 9 in the spec, and the Python FeaturesFlags class confirms this. The current numbering is right; the "flag 8 is not yet added" comment correctly documents the gap.
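To illustrate why the index is a bit position rather than a sequential counter, here is a minimal sketch of decoding a flag from a bitmap. Only bit 9 comes from the discussion above; the `uint64` representation, the constant name, and the `decodeFlags` helper are assumptions for illustration and may not match how the operator actually parses the flags.

```go
package main

import "fmt"

// bitSsdProxyIncludesDpdkMemory is the bit position named in the review.
// The constant name and the decoding scheme below are hypothetical.
const bitSsdProxyIncludesDpdkMemory = 9

// FeatureFlags is a reduced stand-in for domain.FeatureFlags.
type FeatureFlags struct {
	SsdProxyIncludesDpdkMemory bool
}

// decodeFlags tests the bit at the flag's fixed position. A gap at bit 8
// does not shift the positions of later flags.
func decodeFlags(bitmap uint64) FeatureFlags {
	return FeatureFlags{
		SsdProxyIncludesDpdkMemory: bitmap&(1<<bitSsdProxyIncludesDpdkMemory) != 0,
	}
}

func main() {
	fmt.Println(decodeFlags(1 << 9).SsdProxyIncludesDpdkMemory) // bit 9 set
	fmt.Println(decodeFlags(1 << 8).SsdProxyIncludesDpdkMemory) // only bit 8 set
}
```

Renumbering the comment to 8 would suggest the field reads a different bit than the one the spec assigns to it.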

Mode check (pod.go): Copilot claimed the SSDProxy mode guard was missing in GetHugePagesDetails. It's actually there:

if ff != nil && ff.SsdProxyIncludesDpdkMemory && container.Spec.Mode == weka.WekaContainerModeSSDProxy {

This comment is stale/wrong.

Feature flag fetching for all modes (funcs_pod_ensure.go): Copilot said flags are fetched for all non-adhoc modes. The actual implementation correctly scopes the fetch to SSDProxy only:

if container.Spec.Mode == weka.WekaContainerModeSSDProxy {
    ff, err = r.GetFeatureFlags(ctx)
    ...
}

All three Copilot comments are addressed or incorrect.


Issues Found

1. Inconsistent comment wording between Go and Python (minor)

feature_flags.go:15:

// flag 8 is not yet added

weka_runtime.py:292:

# flag 8 is not used by the operator

"Not yet added" implies a future addition, while "not used by the operator" is a factual statement about the current state. The Python wording is more accurate since flag 8 exists in the spec but the operator simply doesn't need it. Suggest aligning the Go comment to match: // flag 8 is not used by the operator.

2. No unit tests for the new memory calculation path (moderate)

There are no test files in internal/controllers/resources/ covering GetHugePagesDetails. The new conditional branch (SsdProxyIncludesDpdkMemory == true → pass full hugepages, false → subtract offset) is untested. Given that a wrong value here directly affects pod resource limits and DPDK memory usage, a table-driven test covering the four cases (flag nil, flag false, flag true+SSDProxy, flag true+non-SSDProxy) would provide good protection.
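A table-driven check for those four cases could look roughly like the sketch below. It exercises a hypothetical `memoryMiB` helper that reduces the branch to plain integers; the real test would call GetHugePagesDetails with a WekaContainer and use the `testing` package. The sizes and the shared offset across modes are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// FeatureFlags is a reduced stand-in for domain.FeatureFlags.
type FeatureFlags struct{ SsdProxyIncludesDpdkMemory bool }

// memoryMiB is a hypothetical reduction of the new branch in GetHugePagesDetails.
func memoryMiB(hugepagesMiB, offsetMiB int, isSSDProxy bool, ff *FeatureFlags) int {
	if ff != nil && ff.SsdProxyIncludesDpdkMemory && isSSDProxy {
		return hugepagesMiB
	}
	return hugepagesMiB - offsetMiB
}

type testCase struct {
	name       string
	ff         *FeatureFlags
	isSSDProxy bool
	want       int
}

// cases lists the four combinations named in the review (values are assumed).
func cases() []testCase {
	on := &FeatureFlags{SsdProxyIncludesDpdkMemory: true}
	off := &FeatureFlags{SsdProxyIncludesDpdkMemory: false}
	return []testCase{
		{"flag nil", nil, true, 3584},
		{"flag false", off, true, 3584},
		{"flag true + SSDProxy", on, true, 4096},
		{"flag true + non-SSDProxy", on, false, 3584},
	}
}

// runCases returns one message per mismatching case.
func runCases() []string {
	var failures []string
	for _, tc := range cases() {
		if got := memoryMiB(4096, 512, tc.isSSDProxy, tc.ff); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	if failures := runCases(); len(failures) > 0 {
		fmt.Println(failures)
		return
	}
	fmt.Println("all cases pass")
}
```

In a real `_test.go` file the loop body would typically be wrapped in `t.Run(tc.name, ...)` so each case is reported separately.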


What's Good

  • Computing hgDetails once at the top of Create() and threading it through to setResources() is a clean refactor — avoids repeated computation and makes the flow explicit.
  • Removing the getHugePagesDetails() and getHugePagesOffset() private wrapper methods reduces indirection.
  • Nil-check on ff before accessing the flag makes backward compatibility explicit and safe (nil → old behavior).
  • Limiting GetFeatureFlags calls to SSDProxy mode only is the right performance call — no extra API calls for other container modes.
  • The defensive mode check in GetHugePagesDetails (container.Spec.Mode == weka.WekaContainerModeSSDProxy) is a good guard even though ff will only be non-nil for SSDProxy at current call sites.

Summary: Two concerns — a trivial comment wording inconsistency and the absence of unit tests for the new memory calculation logic. The core implementation is correct and the Copilot review comments are either already addressed or wrong.

Copilot AI review requested due to automatic review settings March 31, 2026 06:05
@kristina-solovyova kristina-solovyova force-pushed the 03-27-fix_include_ssdproxy_unaccounted_dpdk_memory_on_ssd_proxy_includes_dpdk_memory_feature_flag_op-272_ branch from d562610 to c186afd Compare March 31, 2026 06:05
@claude

claude bot commented Mar 31, 2026

Claude finished @kristina-solovyova's task in 39s —— View job


No code changes since the previous review on 2026-03-27. The branch tip is still c186afd6.

Previous review findings remain open:

  1. Comment wording (feature_flags.go:15): // flag 8 is not yet added should be // flag 8 is not used by the operator to match weka_runtime.py.
  2. Missing unit tests: GetHugePagesDetails has no test coverage for the four cases (ff=nil, ff.SsdProxyIncludesDpdkMemory=false, true+SSDProxy, true+non-SSDProxy).


Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.



Comment on lines +135 to +137
Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB + config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
HugepagesSize: "2Mi",

Copilot AI Mar 31, 2026


buildProxyContainerSpec now sets Spec.Hugepages to include SsdProxyDpdkMemoryMiB and also introduces Spec.HugepagesOffset, but ensureProxyContainer() returns early when a proxy CR already exists. That means existing SSD proxy containers created by older operator versions (which likely lack HugepagesOffset and didn’t include the DPDK addend) won’t be updated, and the new GetHugePagesDetails() offset logic can compute an incorrect --memory value for those pods after upgrade. Consider patching the existing proxy container spec (Hugepages/HugepagesOffset) when it differs from the desired values, or deleting/recreating it safely.

Suggested change
- Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB + config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
- HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
- HugepagesSize: "2Mi",
+ Hugepages: hugepagesMiB + config.Consts.SsdProxyDpdkMemoryMiB,
+ HugepagesOffset: config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB,
+ HugepagesSize: "2Mi",

  }
  case weka.WekaContainerModeSSDProxy:
-     offset = config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB
+     offset = config.Consts.SsdProxyHugepagesOffsetMiB

Copilot AI Mar 31, 2026


For SSD proxy mode, the default offset now comes from config.Consts.SsdProxyHugepagesOffsetMiB instead of the configured Config.DriveSharing.SsdProxyHugepagesOffsetMiB. This prevents the SSD_PROXY_HUGEPAGES_OFFSET_MIB env/config from affecting SSD proxy containers that have Spec.HugepagesOffset == 0 (including legacy proxy CRs created before this PR). That can change the computed --memory behavior after upgrade. Consider using config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB here (and only falling back to Consts as its default), or otherwise explicitly handling legacy SSD proxy containers.

Suggested change
- offset = config.Consts.SsdProxyHugepagesOffsetMiB
+ // Prefer configured SSD proxy hugepages offset, fall back to const default
+ if cfgOffset := config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB; cfgOffset > 0 {
+     offset = cfgOffset
+ } else {
+     offset = config.Consts.SsdProxyHugepagesOffsetMiB
+ }

nodeInfo *discovery.DiscoveryNodeInfo
container *weka.WekaContainer
nodeInfo *discovery.DiscoveryNodeInfo
featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container)

Copilot AI Mar 31, 2026


The field comment says featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container), but for SSD proxy mode ff == nil is treated as SsdProxyIncludesDpdkMemory == false (and adds SsdProxyDpdkMemoryMiB to the offset), which is not necessarily the pre-change behavior for legacy SSD proxy specs. Suggest clarifying this comment to reflect the actual semantics (e.g., “nil means assume FF disabled / backward-compatible default”).

Suggested change
- featureFlags *domain.FeatureFlags // nil → old behavior (e.g. ad-hoc container)
+ featureFlags *domain.FeatureFlags // nil → assume feature flags disabled / backward-compatible defaults (e.g. ad-hoc container)

Collaborator Author


all fine, as "nil means assume FF disabled"

// GetHugePagesDetails returns hugepages details for a container based on its spec.
func GetHugePagesDetails(container *weka.WekaContainer) HugePagesDetails {
// ff is optional; for SSD proxy containers it controls DPDK memory accounting:
// - without SsdProxyIncludesDpdkMemory: DPDK memory (ConstSsdProxyDpdkMemoryMiB) is excluded

Copilot AI Mar 31, 2026


Doc comment refers to ConstSsdProxyDpdkMemoryMiB, but the code uses config.Consts.SsdProxyDpdkMemoryMiB. This mismatch makes it harder to grep/maintain the code; please align the comment with the actual constant/field name.

Suggested change
- // - without SsdProxyIncludesDpdkMemory: DPDK memory (ConstSsdProxyDpdkMemoryMiB) is excluded
+ // - without SsdProxyIncludesDpdkMemory: DPDK memory (config.Consts.SsdProxyDpdkMemoryMiB) is excluded
