fix: extend pod hugepages based on dpdk_base_memory_mb by assafgi · Pull Request #2399 · weka/weka-operator

assafgi · 2026-03-25T17:10:15Z

No description provided.

assafgi · 2026-03-25T17:10:38Z

fix: extend pod hugepages based on dpdk_base_memory_mb #2399 👈 (View in Graphite)
main

How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions · 2026-03-25T17:11:57Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 3b81136.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

graphite-app · 2026-03-25T17:13:33Z

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (03/25/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

Copilot

Pull request overview

Adjusts the hugepages “offset” used to compute the MEMORY env var for Weka pods, aiming to better account for DPDK base memory needs on frontend containers.

Changes:

Update GetHugePagesOffset default logic to scale frontend offset with NumCores (64 MiB per core) instead of a fixed 200 MiB.
Keep the previous 200 MiB default offset for non-frontend containers in the default mode branch.


Copilot · 2026-03-25T17:15:35Z

internal/controllers/resources/pod.go

+				offset = 64 * container.Spec.NumCores // 64 is same as the value we put int dpdk_base_memory_mb in resources.json
+			} else {


The inline comment has a typo (“put int”). Also, the reference to dpdk_base_memory_mb in resources.json is hard to validate/trace from this codebase, and the raw 64 is an unexplained magic number. Please fix the typo and consider replacing 64 with a named constant (or config-driven value) and update the comment to point to the actual source of truth for this offset.

Copilot · 2026-03-25T17:15:35Z

internal/controllers/resources/pod.go

 		default:
-			offset = 200
+			if container.HasFrontend() {
+				offset = 64 * container.Spec.NumCores // 64 is same as the value we put int dpdk_base_memory_mb in resources.json


Typo in comment: “put int” should be “put in”.

Suggested change

-				offset = 64 * container.Spec.NumCores // 64 is same as the value we put int dpdk_base_memory_mb in resources.json
+				offset = 64 * container.Spec.NumCores // 64 is same as the value we put in dpdk_base_memory_mb in resources.json

Copilot · 2026-03-25T17:15:35Z

internal/controllers/resources/pod.go

+			if container.HasFrontend() {
+				offset = 64 * container.Spec.NumCores // 64 is same as the value we put int dpdk_base_memory_mb in resources.json
+			} else {
+				offset = 200
+			}


This new frontend-specific offset logic only runs when container.Spec.HugepagesOffset == 0. In the common WekaCluster/WekaClient flows, HugepagesOffset is always set (e.g., via template defaults), so this change may not actually affect the pods you’re trying to fix. Consider updating the default hugepages offset where it’s populated (templates/factory) or intentionally leaving HugepagesOffset unset so this function’s defaults take effect.
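To make the precedence concrete, here is a minimal sketch of the `default:` branch as this review describes it. The `containerSpec` type, the constant names, and `defaultBranchOffsetMiB` are hypothetical stand-ins, not the operator's real types, and the drive and SSD-proxy branches are left out. It also uses named constants in place of the raw `64`, as suggested in the earlier comment.

```go
package main

import "fmt"

// containerSpec is a hypothetical, trimmed-down stand-in for the operator's
// container spec; only the fields this sketch needs are included.
type containerSpec struct {
	HugepagesOffset int  // explicit offset in MiB; 0 is treated as "unset"
	NumCores        int  // cores assigned to the container
	HasFrontend     bool // stand-in for container.HasFrontend()
}

const (
	dpdkBaseMemoryPerCoreMiB = 64  // per-core value quoted in the diff comment
	fixedDefaultOffsetMiB    = 200 // previous fixed default offset
)

// defaultBranchOffsetMiB mirrors only the `default:` mode branch discussed
// above: an explicit offset always wins, and the per-core scaling applies
// only when no offset was set.
func defaultBranchOffsetMiB(s containerSpec) int {
	if s.HugepagesOffset != 0 {
		return s.HugepagesOffset
	}
	if s.HasFrontend {
		return dpdkBaseMemoryPerCoreMiB * s.NumCores
	}
	return fixedDefaultOffsetMiB
}

func main() {
	// Frontend container, offset unset: 64 MiB per core.
	fmt.Println(defaultBranchOffsetMiB(containerSpec{NumCores: 4, HasFrontend: true})) // 256
	// Non-frontend container, offset unset: fixed default.
	fmt.Println(defaultBranchOffsetMiB(containerSpec{NumCores: 4})) // 200
	// Offset already populated (e.g. by a template): the new logic never runs.
	fmt.Println(defaultBranchOffsetMiB(containerSpec{HugepagesOffset: 200, NumCores: 4, HasFrontend: true})) // 200
}
```

The third call is the case the comment warns about: once `HugepagesOffset` is populated upstream, the frontend-specific default has no effect.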

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


Copilot · 2026-03-29T16:59:36Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.Drive
+		dpdkBaseMemoryMbOverride = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.Drive
 	case "compute":


dpdkBaseMemoryMbOverride is initialized to 64, but each role branch overwrites it with cluster.Spec.GetOverrides().DpdkBaseMemoryMb.<Role> even when that value is 0 (unset). That effectively disables the intended default and can under-allocate hugepages for DPDK. Keep the 64 default unless the override is >0 (e.g., use util.GetNonZeroOrDefault(..., 64) or an explicit if > 0 check) in every role branch.
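A minimal sketch of the "keep the default unless the override is non-zero" pattern follows. The generic `nonZeroOrDefault` helper is written here for illustration only; the repository's `util.GetNonZeroOrDefault` may have a different signature. The `roleOverrides` struct is likewise a hypothetical stand-in that covers just two roles.

```go
package main

import "fmt"

const defaultDpdkBaseMemoryMb = 64

// nonZeroOrDefault returns value unless it equals the zero value of T,
// in which case it returns fallback. Hypothetical helper for this sketch.
func nonZeroOrDefault[T comparable](value, fallback T) T {
	var zero T
	if value == zero {
		return fallback
	}
	return value
}

// roleOverrides is a hypothetical stand-in for the per-role override struct.
type roleOverrides struct {
	Drive   int
	Compute int
}

// dpdkBaseMemoryForRole applies the override only when it is set, so an
// unset (0) override never replaces the 64 MiB default.
func dpdkBaseMemoryForRole(role string, o roleOverrides) int {
	switch role {
	case "drive":
		return nonZeroOrDefault(o.Drive, defaultDpdkBaseMemoryMb)
	case "compute":
		return nonZeroOrDefault(o.Compute, defaultDpdkBaseMemoryMb)
	}
	return defaultDpdkBaseMemoryMb
}

func main() {
	fmt.Println(dpdkBaseMemoryForRole("drive", roleOverrides{}))               // 64
	fmt.Println(dpdkBaseMemoryForRole("compute", roleOverrides{Compute: 128})) // 128
}
```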

Copilot · 2026-03-29T16:59:36Z

internal/controllers/allocator/templates.go

+	// Add DPDK memory to hugepages based on role and cores
+	hp.Hugepages += dpdkBaseMemoryMbOverride * numCores
+


GetContainerHugepages now adds DPDK memory (hp.Hugepages += ...) but there are no unit tests asserting the default (+64 MiB per core) behavior or that per-role DpdkBaseMemoryMb overrides are applied. Since templates_test.go already has extensive coverage for hugepages calculations, add cases that validate the DPDK increment for at least one role and for both default and overridden values.

Copilot · 2026-03-29T16:59:36Z

internal/controllers/resources/pod.go

+						{
+							Name:  "DPDK_BASE_MEMORY_MB",
+							Value: strconv.Itoa(GetDpdkBaseMemoryMb(f.container)),
+						},


DPDK_BASE_MEMORY_MB is sourced from GetDpdkBaseMemoryMb(f.container), but containers created via factory.NewWekaContainerForWekaCluster never populate container.Spec.Overrides.DpdkBaseMemoryMb (no assignments found), while hugepages allocation uses cluster.Spec.GetOverrides().DpdkBaseMemoryMb.<role>. As a result, a cluster-level override will change requested hugepages but the runtime will still see the default 64 MiB, causing config/hugepages mismatch. Propagate the selected per-role DPDK base memory from the cluster spec into each container spec (or otherwise derive it here) so the env var matches the allocator logic.
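One way to avoid this mismatch is to resolve the DPDK value once and use it for both the hugepages sizing and the env var. The sketch below assumes a simplified `containerSpec` with a `DpdkBaseMemoryMb` field and a hypothetical `newContainerSpec` constructor; the real factory and spec types in the operator differ.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultDpdkBaseMemoryMb = 64

// containerSpec and envVar are hypothetical stand-ins for the operator types.
type containerSpec struct {
	NumCores         int
	Hugepages        int // MiB
	DpdkBaseMemoryMb int // effective per-core DPDK memory recorded on the spec
}

type envVar struct {
	Name  string
	Value string
}

// resolveDpdkBaseMemoryMb picks the cluster-level override when set,
// otherwise the default.
func resolveDpdkBaseMemoryMb(override int) int {
	if override > 0 {
		return override
	}
	return defaultDpdkBaseMemoryMb
}

// newContainerSpec sizes hugepages and records the same DPDK value on the
// spec, so later consumers do not have to re-derive it.
func newContainerSpec(baseHugepages, numCores, override int) containerSpec {
	dpdk := resolveDpdkBaseMemoryMb(override)
	return containerSpec{
		NumCores:         numCores,
		Hugepages:        baseHugepages + dpdk*numCores,
		DpdkBaseMemoryMb: dpdk,
	}
}

// dpdkEnvVar builds the env var from the value stored on the spec.
func dpdkEnvVar(s containerSpec) envVar {
	return envVar{Name: "DPDK_BASE_MEMORY_MB", Value: strconv.Itoa(s.DpdkBaseMemoryMb)}
}

func main() {
	spec := newContainerSpec(3000, 2, 128)
	fmt.Println(spec.Hugepages, dpdkEnvVar(spec).Value) // 3256 128
}
```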

Copilot · 2026-03-29T16:59:37Z

internal/controllers/resources/pod.go

 		hugePagesK8sSuffix = "2Mi"
-		offset := GetHugePagesOffset(container)
-		wekaMemoryString = fmt.Sprintf("%dMiB", container.Spec.Hugepages-offset)
+		wekaMemoryString = fmt.Sprintf("%dMiB", container.Spec.Hugepages-baseOffset-dpdkMemory)


wekaMemoryString is now computed as Hugepages - baseOffset - dpdkMemory and can become negative if the hugepages spec/overrides are misconfigured (or if allocator and runtime DPDK values diverge). Consider guarding against negative results (e.g., clamp to 0 or return a validation error earlier) to avoid emitting an invalid MEMORY env var like -64MiB.

Suggested change

-		wekaMemoryString = fmt.Sprintf("%dMiB", container.Spec.Hugepages-baseOffset-dpdkMemory)
+		availableMemoryMiB := container.Spec.Hugepages - baseOffset - dpdkMemory
+		if availableMemoryMiB < 0 {
+			availableMemoryMiB = 0
+		}
+		wekaMemoryString = fmt.Sprintf("%dMiB", availableMemoryMiB)
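The suggestion above clamps to zero. The other option the comment mentions, returning a validation error, could look roughly like the sketch below. The function name `wekaMemoryMiB` and its parameters are hypothetical; how the operator would surface such an error is not covered here.

```go
package main

import "fmt"

// wekaMemoryMiB returns the memory left for Weka after subtracting the base
// offset and the DPDK memory from the hugepages total (all values in MiB).
// It reports an error instead of producing a negative value.
func wekaMemoryMiB(hugepages, baseOffset, dpdkMemory int) (int, error) {
	available := hugepages - baseOffset - dpdkMemory
	if available < 0 {
		return 0, fmt.Errorf(
			"hugepages %d MiB is smaller than offset %d MiB + DPDK memory %d MiB",
			hugepages, baseOffset, dpdkMemory,
		)
	}
	return available, nil
}

func main() {
	if mem, err := wekaMemoryMiB(3000, 200, 128); err == nil {
		fmt.Printf("%dMiB\n", mem) // 2672MiB
	}
	if _, err := wekaMemoryMiB(100, 100, 64); err != nil {
		fmt.Println("invalid spec:", err)
	}
}
```

Clamping keeps the pod spec renderable but hides the misconfiguration; an error makes it visible at the cost of failing reconciliation.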

Copilot · 2026-03-29T16:59:37Z

internal/controllers/wekaclient/client_reconciler_loop.go

+	if c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb != 0 {
+		dpdkBaseMemoryMb = c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb


This code reads c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb directly, while other override accesses in this repo consistently go through Spec.GetOverrides() (e.g., wekaClient.Spec.GetOverrides().DriversBuildId) to apply defaults / avoid nil override structs. Consider using c.wekaClient.Spec.GetOverrides().DpdkBaseMemoryMb here for consistency and safety.

Suggested change

-	if c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb != 0 {
-		dpdkBaseMemoryMb = c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb
+	if c.wekaClient.Spec.GetOverrides().DpdkBaseMemoryMb != 0 {
+		dpdkBaseMemoryMb = c.wekaClient.Spec.GetOverrides().DpdkBaseMemoryMb
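For readers unfamiliar with the accessor pattern referenced here, this is a sketch of what a nil-safe `GetOverrides()` typically looks like. It assumes `Overrides` is a pointer field, which this thread does not confirm; `clientSpec` and `clientOverrides` are hypothetical stand-ins for the real API types.

```go
package main

import "fmt"

// clientOverrides and clientSpec are hypothetical stand-ins; Overrides is
// assumed to be an optional pointer field.
type clientOverrides struct {
	DpdkBaseMemoryMb int
}

type clientSpec struct {
	Overrides *clientOverrides
}

// GetOverrides never returns nil, so callers can read fields without a
// nil check; missing overrides read as zero values.
func (s *clientSpec) GetOverrides() *clientOverrides {
	if s == nil || s.Overrides == nil {
		return &clientOverrides{}
	}
	return s.Overrides
}

// effectiveDpdkBaseMemoryMb applies the 64 MiB default when no override is set.
func effectiveDpdkBaseMemoryMb(s *clientSpec) int {
	dpdkBaseMemoryMb := 64
	if v := s.GetOverrides().DpdkBaseMemoryMb; v != 0 {
		dpdkBaseMemoryMb = v
	}
	return dpdkBaseMemoryMb
}

func main() {
	fmt.Println(effectiveDpdkBaseMemoryMb(&clientSpec{}))                                                    // 64
	fmt.Println(effectiveDpdkBaseMemoryMb(&clientSpec{Overrides: &clientOverrides{DpdkBaseMemoryMb: 128}})) // 128
}
```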

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.


Copilot · 2026-03-30T08:06:59Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.Drive
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.Drive
 	case "compute":


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot · 2026-03-30T08:06:59Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.Compute
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.Compute
 	case "s3":


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot · 2026-03-30T08:06:59Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.S3
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.S3
 	case "nfs":


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot · 2026-03-30T08:06:59Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.Nfs
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.Nfs
 	case "smbw":


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot · 2026-03-30T08:07:00Z

internal/controllers/wekaclient/client_reconciler_loop.go

+	dpdkBaseMemoryMb := 64
+	if c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb != 0 {
+		dpdkBaseMemoryMb = c.wekaClient.Spec.Overrides.DpdkBaseMemoryMb
+	}
+	return c.getClientCores() * (1500 + dpdkBaseMemoryMb)


This function increases hugepages by dpdkBaseMemoryMb per core, but the client WekaContainer spec created in this reconciler does not persist the effective DPDK base memory anywhere (e.g., WekaContainerSpec.DpdkBaseMemoryMb). Downstream code (pod env vars and MEMORY calculation) currently reads container.Spec.DpdkBaseMemoryMb, which will remain 0 and diverge from the hugepages sizing here. Persist the chosen DPDK base memory into the container spec so the runtime and MEMORY computation stay consistent with the hugepages reservation.
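As a worked example of the sizing formula in the diff above, the sketch below evaluates `cores * (1500 + dpdkBaseMemoryMb)` for the default and for an override. The function name and parameters are hypothetical; only the constants 1500 and 64 come from the diff.

```go
package main

import "fmt"

const (
	clientBaseHugepagesPerCoreMiB = 1500 // per-core base from the diff
	defaultDpdkBaseMemoryMb       = 64   // default from the diff
)

// clientHugepagesMiB reproduces the formula from the diff:
// cores * (1500 + dpdkBaseMemoryMb), with 0 meaning "no override".
func clientHugepagesMiB(cores, dpdkOverrideMb int) int {
	dpdkBaseMemoryMb := defaultDpdkBaseMemoryMb
	if dpdkOverrideMb != 0 {
		dpdkBaseMemoryMb = dpdkOverrideMb
	}
	return cores * (clientBaseHugepagesPerCoreMiB + dpdkBaseMemoryMb)
}

func main() {
	fmt.Println(clientHugepagesMiB(2, 0))   // 2 * (1500 + 64)  = 3128
	fmt.Println(clientHugepagesMiB(2, 128)) // 2 * (1500 + 128) = 3256
}
```

If the container spec still reports a DPDK value of 0 while hugepages were sized with 64 or 128, the MEMORY calculation downstream subtracts a different amount than was reserved, which is the divergence described above.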

Copilot · 2026-03-30T08:07:00Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.Smbw
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.Smbw
 	case "data-services":


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot · 2026-03-30T08:07:00Z

internal/controllers/allocator/templates.go

+		numCores = template.Cores.DataServices
+		dpdkBaseMemoryMb = cluster.Spec.GetOverrides().DpdkBaseMemoryMb.DataServices
 	}


This overwrites the dpdkBaseMemoryMb default (64) with the override value even when the override is unset (0), which effectively disables the intended default. Only override the default when the per-role override is non-zero (e.g., use a non-zero-or-default helper or an explicit if != 0 check).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot · 2026-03-30T09:01:36Z

internal/controllers/wekaclient/client_reconciler_loop.go

+	}
+	if container.Spec.DpdkBaseMemoryMb != dpdkBaseMemoryMb {
+		container.Spec.DpdkBaseMemoryMb = dpdkBaseMemoryMb
+		changed = true


When DpdkBaseMemoryMb changes, the client container's Hugepages may also need to be updated to keep total hugepages in sync with the computed formula (since DPDK memory is part of the default hugepages sizing). Right now only container.Spec.DpdkBaseMemoryMb is updated, which can leave an existing container under-provisioned for hugepages. Consider recomputing/updating container.Spec.Hugepages as well when DpdkBaseMemoryMb changes (at least when wekaClient.Spec.HugePages == 0 so hugepages are operator-computed).

Suggested change

-		changed = true
+		changed = true
+		// When hugepages are operator-computed (wekaClient.Spec.HugePages == 0),
+		// a change in DPDK base memory should also be reflected in the container's
+		// hugepages allocation, since the sizing formula includes DPDK memory.
+		if c.wekaClient.Spec.HugePages == 0 {
+			if container.Spec.Hugepages != newClientSpec.Hugepages {
+				container.Spec.Hugepages = newClientSpec.Hugepages
+				changed = true
+			}
+		}
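A self-contained sketch of the same idea, under simplifying assumptions: `containerSpec` and `syncContainerSpec` are hypothetical, the hugepages formula is the `cores * (1500 + dpdk)` expression from the earlier diff, and, unlike the suggestion above, the hugepages check here runs independently of whether the DPDK value changed.

```go
package main

import "fmt"

// containerSpec is a hypothetical stand-in for the client container spec.
type containerSpec struct {
	Hugepages        int // MiB
	DpdkBaseMemoryMb int // MiB per core
}

// computedHugepagesMiB uses the client sizing formula from the earlier diff.
func computedHugepagesMiB(cores, dpdkBaseMemoryMb int) int {
	return cores * (1500 + dpdkBaseMemoryMb)
}

// syncContainerSpec updates the DPDK value and, when hugepages are
// operator-computed (userHugePages == 0), the hugepages total as well.
// It reports whether anything changed.
func syncContainerSpec(c *containerSpec, userHugePages, cores, dpdkBaseMemoryMb int) bool {
	changed := false
	if c.DpdkBaseMemoryMb != dpdkBaseMemoryMb {
		c.DpdkBaseMemoryMb = dpdkBaseMemoryMb
		changed = true
	}
	if userHugePages == 0 {
		if want := computedHugepagesMiB(cores, dpdkBaseMemoryMb); c.Hugepages != want {
			c.Hugepages = want
			changed = true
		}
	}
	return changed
}

func main() {
	c := containerSpec{Hugepages: 3128, DpdkBaseMemoryMb: 64}
	fmt.Println(syncContainerSpec(&c, 0, 2, 128), c.Hugepages) // true 3256
	fmt.Println(syncContainerSpec(&c, 0, 2, 128), c.Hugepages) // false 3256
}
```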

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

internal/controllers/resources/pod.go:1065

The comment says this function includes DPDK base memory, but GetHugePagesOffset currently only returns container.Spec.HugepagesOffset (or a mode-based default) and does not incorporate container.Spec.DpdkBaseMemoryMb. Either update the implementation to add the DPDK portion (e.g., DpdkBaseMemoryMb * NumCores where applicable) or adjust the comment to match the actual behavior.

// GetHugePagesOffset returns the hugepages offset for a container (for Weka allocation), including DPDK base memory.
func GetHugePagesOffset(container *weka.WekaContainer) int {
	offset := container.Spec.HugepagesOffset
	// get default if not set
	if offset == 0 {
		switch container.Spec.Mode {
		case weka.WekaContainerModeDrive:
			if container.UsesDriveSharing() {
				offset = 200 * container.Spec.NumCores
			} else {
				offset = 200 * container.Spec.NumDrives
			}
		case weka.WekaContainerModeSSDProxy:
			offset = config.Config.DriveSharing.SsdProxyHugepagesOffsetMiB
		default:
			offset = 200
		}
	}
	return offset


Copilot · 2026-03-30T12:56:13Z

internal/controllers/allocator/templates.go

+	// Add DPDK base memory to both hugepages and offset
+	dpdkTotalMemory := dpdkBaseMemoryMb * numCores
+	hp.Hugepages += dpdkTotalMemory
+	hp.HugepagesOffset += dpdkTotalMemory


DPDK base memory is now added to both Hugepages and HugepagesOffset, but the unit tests in templates_test.go only assert Hugepages. Add assertions for HugepagesOffset to ensure the offset adjustment remains correct and to prevent regressions in MEMORY sizing (Hugepages - offset).

Suggested change

-	// Add DPDK base memory to both hugepages and offset
-	dpdkTotalMemory := dpdkBaseMemoryMb * numCores
-	hp.Hugepages += dpdkTotalMemory
-	hp.HugepagesOffset += dpdkTotalMemory
+	// Add DPDK base memory to hugepages so that MEMORY (Hugepages - HugepagesOffset) includes it
+	dpdkTotalMemory := dpdkBaseMemoryMb * numCores
+	hp.Hugepages += dpdkTotalMemory
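The kind of assertion this comment asks for can be sketched as follows. The code models the version under review, where DPDK memory is added to both fields, and checks that MEMORY (`Hugepages - HugepagesOffset`) is unchanged as a result. `hugepagesResult`, `addDpdkMemory`, and `memoryMiB` are hypothetical names, not the allocator's real API.

```go
package main

import "fmt"

// hugepagesResult is a hypothetical stand-in for the allocator's result type.
type hugepagesResult struct {
	Hugepages       int // MiB
	HugepagesOffset int // MiB
}

// addDpdkMemory adds the per-core DPDK memory to both fields, as in the
// diff under review.
func addDpdkMemory(hp hugepagesResult, dpdkBaseMemoryMb, numCores int) hugepagesResult {
	dpdkTotalMemory := dpdkBaseMemoryMb * numCores
	hp.Hugepages += dpdkTotalMemory
	hp.HugepagesOffset += dpdkTotalMemory
	return hp
}

// memoryMiB is the value that ends up in the MEMORY env var.
func memoryMiB(hp hugepagesResult) int {
	return hp.Hugepages - hp.HugepagesOffset
}

func main() {
	before := hugepagesResult{Hugepages: 3000, HugepagesOffset: 200}
	after := addDpdkMemory(before, 64, 2)

	fmt.Println(after.Hugepages, after.HugepagesOffset) // 3128 328
	fmt.Println(memoryMiB(before), memoryMiB(after))    // 2800 2800
}
```

A test that asserts only `Hugepages` would still pass if the offset increment were dropped, even though MEMORY would then grow by `dpdkBaseMemoryMb * numCores`.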

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

internal/controllers/resources/pod.go:1056

The doc comment says GetHugePagesOffset includes DPDK base memory, but the function currently only returns container.Spec.HugepagesOffset (or the mode-based default) and does not reference DpdkBaseMemoryMb. Either update the comment to reflect the actual behavior (offset is already expected to include any DPDK adjustment in Spec.HugepagesOffset), or incorporate DPDK into this helper to match the comment.

// GetHugePagesOffset returns the hugepages offset for a container (for Weka allocation), including DPDK base memory.
func GetHugePagesOffset(container *weka.WekaContainer) int {
	offset := container.Spec.HugepagesOffset
	// get default if not set
	if offset == 0 {
		switch container.Spec.Mode {
		case weka.WekaContainerModeDrive:
			if container.UsesDriveSharing() {
				offset = 200 * container.Spec.NumCores
			} else {


assafgi marked this pull request as ready for review March 25, 2026 17:10

assafgi requested a review from a team as a code owner March 25, 2026 17:10

Copilot AI review requested due to automatic review settings March 25, 2026 17:10

Copilot started reviewing on behalf of assafgi March 25, 2026 17:10

graphite-app bot requested review from kristina-solovyova and tigrawap March 25, 2026 17:11

kristina-solovyova approved these changes Mar 25, 2026


Copilot AI reviewed Mar 25, 2026


assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 2e1f2f0 to b63609b Compare March 26, 2026 15:35

Copilot AI review requested due to automatic review settings March 29, 2026 16:53

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from b63609b to 429633e Compare March 29, 2026 16:53

Copilot started reviewing on behalf of assafgi March 29, 2026 16:53

Copilot AI reviewed Mar 29, 2026

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 429633e to 4910260 Compare March 29, 2026 18:05

Copilot AI review requested due to automatic review settings March 30, 2026 07:57

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 4910260 to 0c2a623 Compare March 30, 2026 07:57

Copilot started reviewing on behalf of assafgi March 30, 2026 07:57

Copilot AI reviewed Mar 30, 2026

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 0c2a623 to 33cc9cb Compare March 30, 2026 08:08

Copilot AI review requested due to automatic review settings March 30, 2026 08:52

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 33cc9cb to 677ac07 Compare March 30, 2026 08:52

Copilot started reviewing on behalf of assafgi March 30, 2026 08:53

Copilot AI reviewed Mar 30, 2026

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 677ac07 to 8587d07 Compare March 30, 2026 11:12

Copilot AI review requested due to automatic review settings March 30, 2026 12:50

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from 8587d07 to ec50516 Compare March 30, 2026 12:50

Copilot started reviewing on behalf of assafgi March 30, 2026 12:50

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from ec50516 to b781661 Compare March 30, 2026 12:55

Copilot AI reviewed Mar 30, 2026


assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from b781661 to ebec559 Compare March 30, 2026 16:31

kristina-solovyova added the claude-review Force Claude review label Mar 30, 2026 — with Graphite App

fix: extend pod hugepages based on dpdk_base_memory_mb

3b81136

Copilot AI review requested due to automatic review settings March 30, 2026 18:02

assafgi force-pushed the 03-25-fix_extend_pod_hugepages_based_on_dpdk_base_memory_mb branch from ebec559 to 3b81136 Compare March 30, 2026 18:02

Copilot started reviewing on behalf of assafgi March 30, 2026 18:02

Copilot AI reviewed Mar 30, 2026

