Skip to content

PR for the next 0.68.x release#1979

Open
AndrewChubatiuk wants to merge 5 commits intorelease-0.68from
release-0.68-next-release
Open

PR for the next 0.68.x release#1979
AndrewChubatiuk wants to merge 5 commits intorelease-0.68from
release-0.68-next-release

Conversation

@AndrewChubatiuk
Copy link
Contributor

@AndrewChubatiuk AndrewChubatiuk commented Mar 17, 2026

PR for next 0.68.x release


Summary by cubic

Make PVC handling safer by waiting for claims to bind and fully resize before proceeding. Refactored probes and security into CommonAppsParams, added Proxy Protocol–aware health checks, unified readiness timeouts via base config, and preserved HPA-controlled replicas during reconciles.

  • New Features

    • Added PVC readiness env vars: VM_PVC_WAIT_READY_INTERVAL, VM_PVC_WAIT_READY_TIMEOUT; added VM CR status polling: VM_WAIT_READY_INTERVAL; documented in docs/env.md.
    • Introduced per-app default TerminationGracePeriodSeconds via env (e.g. VM_VLOGSDEFAULT_TERMINATION_GRACE_PERIOD_SECONDS).
    • Consolidated deployment, probe, and security settings into CommonAppsParams across VM/VL/VT CRDs; probes respect httpListenAddr.useProxyProtocol.
  • Bug Fixes

    • PVC reconcile and StatefulSet PVC expansion now wait for PVCs to be Bound, capacity >= requested, and not Resizing; added conflict-retry and post-create wait.
    • Preserve HPA-managed replicas for Deployments/StatefulSets during reconcile; clean up HPA on VMAgent delete.
    • Shared probe builder fixes Proxy Protocol health checks for all components; added tests (including Proxy Protocol parsing).
    • Standardized wait timeouts via reconcile.Init (uses base config); normalized Service defaults to avoid unnecessary updates.
    • Prevent infinite rolling updates by forcing maxUnavailable=1 when set to 0 and replicas=1; added coverage.
    • Aligned shardCount to int32 in APIs/CRDs to match Kubernetes expectations.

Written for commit 0f3a5af. Summary will update on new commits.

@AndrewChubatiuk AndrewChubatiuk changed the base branch from master to release-0.68 March 17, 2026 13:37
@AndrewChubatiuk AndrewChubatiuk changed the title Release 0.68 next release PR for next 0.68.x release Mar 17, 2026
@AndrewChubatiuk AndrewChubatiuk changed the title PR for next 0.68.x release PR for the next 0.68.x release Mar 17, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 15 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/env.md">

<violation number="1" location="docs/env.md:237">
P3: Narrow this description: `VM_WAIT_READY_INTERVAL` does not apply to all VM CRs, only to the `waitForStatus` loop used for VMAgent, VMCluster, and VMAuth.</violation>
</file>

<file name="internal/controller/operator/factory/reconcile/statefulset_pvc_expand_test.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/statefulset_pvc_expand_test.go:163">
P2: This helper switch makes PVC expansion tests auto-complete by mutating `Status.Capacity` during `Update`, so the resize/wait path is no longer tested realistically.</violation>
</file>

<file name="internal/controller/operator/factory/reconcile/statefulset_pvc_expand.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/statefulset_pvc_expand.go:127">
P1: This wait uses the pre-update PVC size, so resized claims can be treated as ready before expansion has completed.</violation>
</file>

<file name="internal/controller/operator/factory/reconcile/pvc.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/pvc.go:67">
P2: `waitForPVCReady` treats an unprovisioned PVC as ready by returning success when `status.capacity` is empty. Keep polling instead, otherwise new PVCs bypass the new readiness wait entirely.</violation>
</file>

<file name="internal/controller/operator/factory/k8stools/interceptors.go">

<violation number="1" location="internal/controller/operator/factory/k8stools/interceptors.go:49">
P2: Creating VMAuth/VMCluster/VMAgent no longer persists the mocked status, so tests that read them back after `Create` will see empty status.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

| VM_PODWAITREADYTIMEOUT: `80s` <a href="#variables-vm-podwaitreadytimeout" id="variables-vm-podwaitreadytimeout">#</a><br>Defines single pod deadline to wait for transition to ready state |
| VM_PVC_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-pvc-wait-ready-interval" id="variables-vm-pvc-wait-ready-interval">#</a><br>Defines poll interval for PVC ready check |
| VM_PVC_WAIT_READY_TIMEOUT: `80s` <a href="#variables-vm-pvc-wait-ready-timeout" id="variables-vm-pvc-wait-ready-timeout">#</a><br>Defines poll timeout for PVC ready check |
| VM_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-wait-ready-interval" id="variables-vm-wait-ready-interval">#</a><br>Defines poll interval for VM CRs |
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Narrow this description: VM_WAIT_READY_INTERVAL does not apply to all VM CRs, only to the waitForStatus loop used for VMAgent, VMCluster, and VMAuth.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/env.md, line 237:

<comment>Narrow this description: `VM_WAIT_READY_INTERVAL` does not apply to all VM CRs, only to the `waitForStatus` loop used for VMAgent, VMCluster, and VMAuth.</comment>

<file context>
@@ -230,7 +230,10 @@
+| VM_PODWAITREADYTIMEOUT: `80s` <a href="#variables-vm-podwaitreadytimeout" id="variables-vm-podwaitreadytimeout">#</a><br>Defines single pod deadline to wait for transition to ready state |
+| VM_PVC_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-pvc-wait-ready-interval" id="variables-vm-pvc-wait-ready-interval">#</a><br>Defines poll interval for PVC ready check |
+| VM_PVC_WAIT_READY_TIMEOUT: `80s` <a href="#variables-vm-pvc-wait-ready-timeout" id="variables-vm-pvc-wait-ready-timeout">#</a><br>Defines poll timeout for PVC ready check |
+| VM_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-wait-ready-interval" id="variables-vm-wait-ready-interval">#</a><br>Defines poll interval for VM CRs |
 | VM_FORCERESYNCINTERVAL: `60s` <a href="#variables-vm-forceresyncinterval" id="variables-vm-forceresyncinterval">#</a><br>configures force resync interval for VMAgent, VMAlert, VMAlertmanager and VMAuth. |
 | VM_ENABLESTRICTSECURITY: `false` <a href="#variables-vm-enablestrictsecurity" id="variables-vm-enablestrictsecurity">#</a><br>EnableStrictSecurity will add default `securityContext` to pods and containers created by operator Default PodSecurityContext include: 1. RunAsNonRoot: true 2. RunAsUser/RunAsGroup/FSGroup: 65534 '65534' refers to 'nobody' in all the used default images like alpine, busybox. If you're using customize image, please make sure '65534' is a valid uid in there or specify SecurityContext. 3. FSGroupChangePolicy: &onRootMismatch If KubeVersion>=1.20, use `FSGroupChangePolicy="onRootMismatch"` to skip the recursive permission change when the root of the volume already has the correct permissions 4. SeccompProfile:      type: RuntimeDefault Use `RuntimeDefault` seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. Default container SecurityContext include: 1. AllowPrivilegeEscalation: false 2. ReadOnlyRootFilesystem: true 3. Capabilities:      drop:        - all turn off `EnableStrictSecurity` by default, see https://github.com/VictoriaMetrics/operator/issues/749 for details |
</file context>
Suggested change
| VM_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-wait-ready-interval" id="variables-vm-wait-ready-interval">#</a><br>Defines poll interval for VM CRs |
| VM_WAIT_READY_INTERVAL: `5s` <a href="#variables-vm-wait-ready-interval" id="variables-vm-wait-ready-interval">#</a><br>Defines poll interval for status checks of VMAgent, VMCluster and VMAuth CRs |
Fix with Cubic

@AndrewChubatiuk AndrewChubatiuk force-pushed the release-0.68-next-release branch from 6afecdc to 1efa39b Compare March 17, 2026 14:13
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="test/e2e/vmsingle_test.go">

<violation number="1" location="test/e2e/vmsingle_test.go:514">
P2: The new proxy-protocol e2e case has an empty verify block, so it doesn't actually validate that `httpListenAddr.useProxyProtocol=true` is applied.</violation>
</file>

<file name="test/e2e/vlsingle_test.go">

<violation number="1" location="test/e2e/vlsingle_test.go:177">
P2: This e2e case doesn't assert anything about proxy protocol, so it will pass even if the operator ignores `httpListenAddr.useProxyProtocol`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

"httpListenAddr.useProxyProtocol": "true",
}
},
verify: func(cr *vmv1beta1.VMSingle) {},
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The new proxy-protocol e2e case has an empty verify block, so it doesn't actually validate that httpListenAddr.useProxyProtocol=true is applied.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/e2e/vmsingle_test.go, line 514:

<comment>The new proxy-protocol e2e case has an empty verify block, so it doesn't actually validate that `httpListenAddr.useProxyProtocol=true` is applied.</comment>

<file context>
@@ -503,6 +503,17 @@ var _ = Describe("test vmsingle Controller", Label("vm", "single"), func() {
+								"httpListenAddr.useProxyProtocol": "true",
+							}
+						},
+						verify: func(cr *vmv1beta1.VMSingle) {},
+					},
+				),
</file context>
Suggested change
verify: func(cr *vmv1beta1.VMSingle) {},
verify: func(cr *vmv1beta1.VMSingle) {
var createdDeploy appsv1.Deployment
Expect(k8sClient.Get(ctx, types.NamespacedName{Namespace: namespace, Name: cr.PrefixedName()}, &createdDeploy)).ToNot(HaveOccurred())
Expect(createdDeploy.Spec.Template.Spec.Containers).ToNot(BeEmpty())
Expect(createdDeploy.Spec.Template.Spec.Containers[0].Args).To(ContainElement("-httpListenAddr.useProxyProtocol=true"))
},
Fix with Cubic

RetentionPeriod: "1",
},
},
func(cr *vmv1.VLSingle) {},
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: This e2e case doesn't assert anything about proxy protocol, so it will pass even if the operator ignores httpListenAddr.useProxyProtocol.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/e2e/vlsingle_test.go, line 177:

<comment>This e2e case doesn't assert anything about proxy protocol, so it will pass even if the operator ignores `httpListenAddr.useProxyProtocol`.</comment>

<file context>
@@ -159,6 +159,23 @@ var _ = Describe("test vlsingle Controller", Label("vl", "single", "vlsingle"),
+							RetentionPeriod: "1",
+						},
+					},
+					func(cr *vmv1.VLSingle) {},
+				),
 			)
</file context>
Suggested change
func(cr *vmv1.VLSingle) {},
func(cr *vmv1.VLSingle) {
createdChildObjects := types.NamespacedName{Namespace: namespace, Name: cr.PrefixedName()}
var createdDeploy appsv1.Deployment
Expect(k8sClient.Get(ctx, createdChildObjects, &createdDeploy)).ToNot(HaveOccurred())
Expect(createdDeploy.Spec.Template.Spec.Containers).To(HaveLen(1))
Expect(createdDeploy.Spec.Template.Spec.Containers[0].Args).To(ContainElement("-httpListenAddr.useProxyProtocol=true"))
},
Fix with Cubic

Ensure that we configure correct healthchecks for components with proxy protocol enabled
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 101 files (changes from recent commits).

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="go.mod">

<violation number="1" location="go.mod:129">
P2: Avoid replacing a core dependency with a personal fork in release code; this creates supply-chain and long-term maintenance risk for reproducible builds.</violation>
</file>

<file name="internal/controller/operator/factory/vmalertmanager/vmalertmanager_reconcile_test.go">

<violation number="1" location="internal/controller/operator/factory/vmalertmanager/vmalertmanager_reconcile_test.go:181">
P2: This does not simulate a status-only change; it switches the test into last-applied-spec reconciliation instead.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


replace github.com/VictoriaMetrics/operator/api => ./api

replace github.com/caarlos0/env/v11 => github.com/AndrewChubatiuk/env/v11 v11.0.0-20260302065400-14d0354881b6
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid replacing a core dependency with a personal fork in release code; this creates supply-chain and long-term maintenance risk for reproducible builds.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At go.mod, line 129:

<comment>Avoid replacing a core dependency with a personal fork in release code; this creates supply-chain and long-term maintenance risk for reproducible builds.</comment>

<file context>
@@ -125,3 +125,5 @@ require (
 
 replace github.com/VictoriaMetrics/operator/api => ./api
+
+replace github.com/caarlos0/env/v11 => github.com/AndrewChubatiuk/env/v11 v11.0.0-20260302065400-14d0354881b6
</file context>
Fix with Cubic

Co-authored-by: Vadim Rutkovsky <vadim@vrutkovs.eu>
@AndrewChubatiuk AndrewChubatiuk force-pushed the release-0.68-next-release branch from 9675f15 to de3c010 Compare March 18, 2026 23:24
@AndrewChubatiuk AndrewChubatiuk force-pushed the release-0.68-next-release branch from e0ff1f4 to 4b75b15 Compare March 20, 2026 07:04
@AndrewChubatiuk AndrewChubatiuk force-pushed the release-0.68-next-release branch from 7b3a399 to 0f3a5af Compare March 20, 2026 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants