Skip to content

feat(helm): add metrics endpoint and ServiceMonitor support#1435

Open
optimus-fulcria wants to merge 5 commits intokagent-dev:mainfrom
optimus-fulcria:feat/helm-metrics-servicemonitor
Open

feat(helm): add metrics endpoint and ServiceMonitor support#1435
optimus-fulcria wants to merge 5 commits intokagent-dev:mainfrom
optimus-fulcria:feat/helm-metrics-servicemonitor

Conversation

@optimus-fulcria
Copy link

Summary

Closes #1369

Adds first-class Helm values for enabling the controller metrics endpoint and optionally deploying a Prometheus Operator ServiceMonitor.

Changes

  • values.yaml: New controller.metrics section with enabled, port, secure, and serviceMonitor sub-values
  • controller-configmap.yaml: Sets METRICS_BIND_ADDRESS and METRICS_SECURE env vars when metrics are enabled
  • controller-deployment.yaml: Adds metrics container port when enabled
  • controller-service.yaml: Adds metrics service port when enabled
  • controller-servicemonitor.yaml (new): Optional ServiceMonitor with configurable labels, interval, scrapeTimeout, and relabeling

Usage

controller:
  metrics:
    enabled: true
    port: 9093
    secure: false
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: prometheus
      interval: 30s
      scrapeTimeout: 10s

Test plan

  • Helm unit tests added for deployment, service, and ServiceMonitor templates
  • Verified default values don't change existing behavior (metrics disabled by default)
  • Deploy to Kind cluster with metrics enabled and verify Prometheus scraping

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 5, 2026 11:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class Helm configuration for exposing the controller’s Prometheus metrics endpoint and optionally deploying a Prometheus Operator ServiceMonitor to scrape it (closes #1369).

Changes:

  • Introduces controller.metrics values (enablement, port, secure mode, and serviceMonitor config).
  • Updates controller Deployment/Service/ConfigMap templates to expose the metrics port and set METRICS_* env vars when enabled.
  • Adds a new controller-servicemonitor.yaml template plus Helm unit tests for the new behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
helm/kagent/values.yaml Adds controller.metrics and controller.metrics.serviceMonitor values to configure metrics + ServiceMonitor.
helm/kagent/templates/controller-configmap.yaml Conditionally sets METRICS_BIND_ADDRESS / METRICS_SECURE when metrics are enabled.
helm/kagent/templates/controller-deployment.yaml Conditionally exposes a metrics container port.
helm/kagent/templates/controller-service.yaml Conditionally exposes a metrics Service port.
helm/kagent/templates/controller-servicemonitor.yaml Adds optional ServiceMonitor resource for Prometheus Operator scraping.
helm/kagent/tests/controller-deployment_test.yaml Adds unit tests for deployment port + configmap env vars behavior.
helm/kagent/tests/controller-service_test.yaml Adds unit tests for conditional Service metrics port exposure.
helm/kagent/tests/controller-servicemonitor_test.yaml Adds unit tests for ServiceMonitor rendering and configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1,33 @@
{{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceMonitor rendering is only gated by values. This chart already guards optional CRD-based resources with .Capabilities.APIVersions.Has (e.g., openshift-route.yaml) to avoid install/upgrade failures when the CRD is not installed. Consider also gating this template on .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so controller.metrics.serviceMonitor.enabled=true doesn’t hard-fail on clusters without Prometheus Operator CRDs.

Suggested change
{{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }}
{{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +24
insecureSkipVerify: true
{{- end }}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When controller.metrics.secure is true, the controller enables controller-runtime’s authn/authz filter on the metrics endpoint. This ServiceMonitor config sets scheme: https but does not configure any bearer token/auth settings, so Prometheus scrapes are likely to be rejected. Add an endpoint auth configuration (e.g., bearerTokenFile) when secure metrics are enabled (and/or allow configuring the full endpoint auth block via values).

Suggested change
insecureSkipVerify: true
{{- end }}
insecureSkipVerify: true
{{- with .Values.controller.metrics.serviceMonitor.auth }}
{{- toYaml . | nindent 6 }}
{{- else }}
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
{{- end }}
{{- end }}

Copilot uses AI. Check for mistakes.
optimus-fulcria and others added 2 commits March 5, 2026 21:02
…rviceMonitor

Closes kagent-dev#1369

Add first-class Helm values for metrics configuration:
- controller.metrics.enabled: enables the metrics endpoint
- controller.metrics.port: configurable metrics port (default 9093)
- controller.metrics.secure: HTTPS vs HTTP toggle
- controller.metrics.serviceMonitor.*: Prometheus Operator ServiceMonitor

When enabled, the chart:
- Sets METRICS_BIND_ADDRESS and METRICS_SECURE env vars in the configmap
- Adds a metrics port to the controller deployment and service
- Optionally creates a ServiceMonitor with configurable labels, interval,
  scrapeTimeout, and relabeling configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
…metrics

Address review feedback:
- Gate ServiceMonitor rendering on .Capabilities.APIVersions.Has to avoid
  failures on clusters without Prometheus Operator CRDs
- Add bearerTokenFile for service account auth when secure metrics enabled
- Add test for CRD unavailability scenario
- Add test for bearerTokenFile in secure mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
@optimus-fulcria optimus-fulcria force-pushed the feat/helm-metrics-servicemonitor branch from 71c2bf6 to 686d15f Compare March 5, 2026 21:03
Comment on lines +15 to +20
{{- if .Values.controller.metrics.enabled }}
- port: {{ .Values.controller.metrics.port }}
targetPort: {{ .Values.controller.metrics.port }}
protocol: TCP
name: metrics
{{- end }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose these on the service? The controller service can potentially be a LoadBalancer because of the MCP + A2A server so I think metrics should either be scraped from the pod directly, or there should be a different service

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Moved the metrics port to a dedicated ClusterIP service (controller-metrics-service.yaml) so it stays internal regardless of the main service type. Updated the ServiceMonitor selector to target the new metrics service.

Per review feedback, metrics should not be exposed on the main
controller service which may be a LoadBalancer. This creates a
separate ClusterIP service for metrics scraping.

- Remove metrics port from controller-service.yaml
- Add controller-metrics-service.yaml (ClusterIP, metrics only)
- Update ServiceMonitor selector to target the metrics service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Helm Chart: Add support for enabling metrics endpoint and optional ServiceMonitor

3 participants