
support HPA in vmagent#1980

Merged: vrutkovs merged 2 commits into master from distributed-agent-hpa on Mar 20, 2026

@AndrewChubatiuk (Contributor) commented Mar 17, 2026

fixes #1961
fixes #1101


Summary by cubic

Adds HPA support for VMAgent and customizable StatefulSet rolling updates for VMAgent and VMDistributedZoneAgent. The operator now reconciles HorizontalPodAutoscaler for VMAgent, preserves HPA-controlled replicas on updates, aligns stateful HPA with shardCount, and removes HPA on delete; fixes #1961 and #1101.

  • New Features

    • spec.hpa (EmbeddedHPA) in VMAgentSpec and VMDistributedZoneAgentSpec; in stateful mode, HPA replicas map to spec.shardCount. Validation blocks hpa with daemonSetMode. HPA is created/updated and cleaned up on VMAgent delete.
    • spec.statefulRollingUpdateStrategyBehavior for VMAgent and VMDistributedZoneAgent (applies with OnDelete). Rolling-update behavior is respected across StatefulSets.
    • HPA-managed replicas are preserved on updates for Deployments/StatefulSets across VMAuth, VLCluster (insert/select/storage), VTCluster (insert/select/storage), and VMCluster.
  • Refactors

    • Switched shardCount to int32 across APIs, controllers, and tests (VMAgent, VMAnomaly); updated sharding helpers.
    • Reworked reconcile APIs: renamed STSOptions to StatefulSetOpts; added DeploymentOpts with a PatchSpec hook to retain replicas when HPA controls scaling.
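
The replica-preserving hook described in the summary can be sketched roughly as follows. This is an illustration only, not the operator's code: `DeploymentSpec` is a trimmed stand-in for `appsv1.DeploymentSpec`, and `keepExistingReplicas`, `applyOpts`, and `int32Ptr` are hypothetical helpers. Only the `PatchSpec` field and its signature come from the PR.

```go
package main

import "fmt"

// DeploymentSpec is a hypothetical stand-in for appsv1.DeploymentSpec,
// reduced to the single field this sketch needs.
type DeploymentSpec struct {
	Replicas *int32
}

// DeploymentOpts mirrors the shape named in the PR; only PatchSpec is modeled.
type DeploymentOpts struct {
	PatchSpec func(existingSpec, newSpec *DeploymentSpec)
}

// keepExistingReplicas is a hypothetical hook: it copies the live replica
// count into the desired spec so an HPA-driven value is not overwritten.
func keepExistingReplicas(existingSpec, newSpec *DeploymentSpec) {
	if existingSpec == nil || newSpec == nil || existingSpec.Replicas == nil {
		return
	}
	r := *existingSpec.Replicas
	newSpec.Replicas = &r
}

// applyOpts runs the hook, if one is set, before the desired spec is written.
func applyOpts(existing, desired *DeploymentSpec, opts *DeploymentOpts) {
	if opts != nil && opts.PatchSpec != nil {
		opts.PatchSpec(existing, desired)
	}
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	live := &DeploymentSpec{Replicas: int32Ptr(5)}    // scaled up by the HPA
	desired := &DeploymentSpec{Replicas: int32Ptr(2)} // value rendered from the CR
	applyOpts(live, desired, &DeploymentOpts{PatchSpec: keepExistingReplicas})
	fmt.Println(*desired.Replicas) // prints 5
}
```

Passing `nil` options leaves the desired spec untouched, which matches callers that have no HPA to protect.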

Written for commit d9d6b70. Summary will update on new commits.

@cubic-dev-ai (bot, Contributor) left a comment

3 issues found across 10 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/controller/operator/factory/vmagent/vmagent.go">

<violation number="1" location="internal/controller/operator/factory/vmagent/vmagent.go:151">
P1: This adds an HPA, but vmagent reconciliation still overwrites `.spec.replicas`, so HPA scaling will be reverted on the next sync.</violation>

<violation number="2" location="internal/controller/operator/factory/vmagent/vmagent.go:1322">
P1: HPA targets the base vmagent name, which breaks sharded vmagent because the real Deployment/StatefulSet names are shard-suffixed.</violation>
</file>

<file name="config/crd/overlay/crd.descriptionless.yaml">

<violation number="1" location="config/crd/overlay/crd.descriptionless.yaml:4683">
P1: Require `hpa.maxReplicas` in this schema. As added, users can omit it, and the controller will build an invalid HPA with `maxReplicas: 0`.</violation>
</file>
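
A minimal sketch of the two checks mentioned so far (rejecting `hpa` together with `daemonSetMode`, and rejecting a missing `maxReplicas`) might look like this. `AgentSpec`, `EmbeddedHPA`, and `validateHPA` are hypothetical, trimmed stand-ins written for illustration; the operator's real types and error messages may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddedHPA is a hypothetical, trimmed stand-in for the operator's type.
type EmbeddedHPA struct {
	MaxReplicas int32
}

// AgentSpec is a hypothetical subset of VMAgentSpec.
type AgentSpec struct {
	DaemonSetMode bool
	HPA           *EmbeddedHPA
}

// validateHPA rejects specs that would yield an unusable HPA.
func validateHPA(spec AgentSpec) error {
	if spec.HPA == nil {
		return nil // no HPA requested, nothing to validate
	}
	if spec.DaemonSetMode {
		return errors.New("hpa cannot be used with daemonSetMode")
	}
	if spec.HPA.MaxReplicas < 1 {
		// An omitted maxReplicas decodes to 0, which is not a valid HPA.
		return errors.New("hpa.maxReplicas must be at least 1")
	}
	return nil
}

func main() {
	fmt.Println(validateHPA(AgentSpec{HPA: &EmbeddedHPA{}}))
	fmt.Println(validateHPA(AgentSpec{HPA: &EmbeddedHPA{MaxReplicas: 4}}))
}
```

Marking `maxReplicas` as required in the CRD schema would catch the same mistake earlier, at admission time.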


@AndrewChubatiuk force-pushed the distributed-agent-hpa branch 5 times, most recently from 3c3cf20 to d4bf72f, on March 17, 2026 19:37
@cubic-dev-ai (bot, Contributor) left a comment

6 issues found across 17 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/CHANGELOG.md">

<violation number="1" location="docs/CHANGELOG.md:27">
P1: Custom agent: **Changelog Review Agent**

This changelog entry is missing the required user-centric before/now/impact explanation, so it does not meet the mandatory changelog structure.</violation>

<violation number="2" location="docs/CHANGELOG.md:27">
P2: This changelog entry describes a different feature than the HPA changes in this PR, so the release notes become misleading.</violation>
</file>

<file name="api/operator/v1alpha1/vmdistributed_types.go">

<violation number="1" location="api/operator/v1alpha1/vmdistributed_types.go:204">
P2: Validate `statefulRollingUpdateStrategyBehavior` for `VMDistributed` zone agents before reconciling. Right now invalid `maxUnavailable` values are admitted and only surface later as reconcile errors.</violation>
</file>

<file name="api/operator/v1beta1/vmagent_types.go">

<violation number="1" location="api/operator/v1beta1/vmagent_types.go:103">
P2: Validate `statefulRollingUpdateStrategyBehavior` when this field is set. Right now invalid or empty values are admitted and only surface as StatefulSet reconcile failures.</violation>
</file>

<file name="config/crd/overlay/crd.yaml">

<violation number="1">
P2: Require `maxUnavailable` in this new object, otherwise `statefulRollingUpdateStrategyBehavior: {}` is accepted by the CRD and later fails reconciliation with a nil `IntOrString`.</violation>
</file>
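
The `maxUnavailable` validation requested in the three comments above could be sketched as below. Everything here is an assumption made for illustration: `IntOrString` is a simplified stand-in for the apimachinery `intstr.IntOrString` type, `RollingUpdateBehavior` models only the one field, and the accepted range (a positive integer or a percentage from 1% to 100%) is a guess at reasonable bounds rather than the operator's actual rule.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// IntOrString is a simplified, hypothetical stand-in for intstr.IntOrString.
type IntOrString struct {
	IsString bool
	IntVal   int32
	StrVal   string
}

// RollingUpdateBehavior is a hypothetical shape for
// statefulRollingUpdateStrategyBehavior; only maxUnavailable is modeled.
type RollingUpdateBehavior struct {
	MaxUnavailable *IntOrString
}

// validateBehavior fails early instead of letting a nil or malformed
// maxUnavailable surface later as a reconcile error.
func validateBehavior(b *RollingUpdateBehavior) error {
	if b == nil {
		return nil // field not set, nothing to check
	}
	mu := b.MaxUnavailable
	if mu == nil {
		return errors.New("maxUnavailable is required")
	}
	if !mu.IsString {
		if mu.IntVal < 1 {
			return errors.New("maxUnavailable must be at least 1")
		}
		return nil
	}
	if !strings.HasSuffix(mu.StrVal, "%") {
		return fmt.Errorf("maxUnavailable %q must be an integer or a percentage", mu.StrVal)
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(mu.StrVal, "%"))
	if err != nil || pct < 1 || pct > 100 {
		return fmt.Errorf("maxUnavailable %q must be between 1%% and 100%%", mu.StrVal)
	}
	return nil
}

func main() {
	fmt.Println(validateBehavior(&RollingUpdateBehavior{}))
	fmt.Println(validateBehavior(&RollingUpdateBehavior{
		MaxUnavailable: &IntOrString{IsString: true, StrVal: "25%"},
	}))
}
```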

<file name="internal/controller/operator/factory/reconcile/statefulset.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/statefulset.go:108">
P1: This refactor drops the built-in protection for HPA-managed StatefulSets. Callers without a custom `Patch` will overwrite the HPA's current replica count on the next reconcile.</violation>
</file>


@AndrewChubatiuk force-pushed the distributed-agent-hpa branch 9 times, most recently from b9a74a6 to 7630af5, on March 20, 2026 08:41
)

type DeploymentOpts struct {
PatchSpec func(existingSpec, newSpec *appsv1.DeploymentSpec)
Collaborator

Let's add `PatchMeta` to get rid of `owner *metav1.OwnerReference` too?

@AndrewChubatiuk (Contributor, Author) Mar 20, 2026

Not sure we need this: `owner` is only used in `mergeMeta`, which would then have to become public and be propagated to every place where `reconcile.Deployment` is invoked. It might make sense to add `owner` to the opts struct to simplify the signature, but the other reconcile functions take it as a parameter too.

Collaborator

Ah, okay, let's keep it as is then.

}
}
- if err := reconcile.Deployment(ctx, rclient, lbDep, prevLB, false, &owner); err != nil {
+ if err := reconcile.Deployment(ctx, rclient, lbDep, prevLB, &owner, nil); err != nil {
Collaborator

HPA for vmauth replicas maybe? I'm fine with a follow-up PR too

Contributor Author

currently there's no HPA in VM/VL/VTCluster

@AndrewChubatiuk (Contributor, Author)

@vrutkovs
addressed your comments, could you please take a look again?

@vrutkovs merged commit a26dd60 into master on Mar 20, 2026
6 checks passed
@vrutkovs deleted the distributed-agent-hpa branch on March 20, 2026 13:49
AndrewChubatiuk added a commit that referenced this pull request Mar 20, 2026
AndrewChubatiuk added a commit that referenced this pull request Mar 20, 2026
Development

Successfully merging this pull request may close these issues.

VMDistributed: allow the enabling of HPA on vmagent
vmagent: add HPA support