feat(helm): add metrics endpoint and ServiceMonitor support#1435
feat(helm): add metrics endpoint and ServiceMonitor support#1435optimus-fulcria wants to merge 5 commits intokagent-dev:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds first-class Helm configuration for exposing the controller’s Prometheus metrics endpoint and optionally deploying a Prometheus Operator ServiceMonitor to scrape it (closes #1369).
Changes:
- Introduces
controller.metricsvalues (enablement, port, secure mode, andserviceMonitorconfig). - Updates controller Deployment/Service/ConfigMap templates to expose the metrics port and set
METRICS_*env vars when enabled. - Adds a new
controller-servicemonitor.yamltemplate plus Helm unit tests for the new behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| helm/kagent/values.yaml | Adds controller.metrics and controller.metrics.serviceMonitor values to configure metrics + ServiceMonitor. |
| helm/kagent/templates/controller-configmap.yaml | Conditionally sets METRICS_BIND_ADDRESS / METRICS_SECURE when metrics are enabled. |
| helm/kagent/templates/controller-deployment.yaml | Conditionally exposes a metrics container port. |
| helm/kagent/templates/controller-service.yaml | Conditionally exposes a metrics Service port. |
| helm/kagent/templates/controller-servicemonitor.yaml | Adds optional ServiceMonitor resource for Prometheus Operator scraping. |
| helm/kagent/tests/controller-deployment_test.yaml | Adds unit tests for deployment port + configmap env vars behavior. |
| helm/kagent/tests/controller-service_test.yaml | Adds unit tests for conditional Service metrics port exposure. |
| helm/kagent/tests/controller-servicemonitor_test.yaml | Adds unit tests for ServiceMonitor rendering and configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,33 @@ | |||
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }} | |||
There was a problem hiding this comment.
ServiceMonitor rendering is only gated by values. This chart already guards optional CRD-based resources with .Capabilities.APIVersions.Has (e.g., openshift-route.yaml) to avoid install/upgrade failures when the CRD is not installed. Consider also gating this template on .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" so controller.metrics.serviceMonitor.enabled=true doesn’t hard-fail on clusters without Prometheus Operator CRDs.
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled }} | |
| {{- if and .Values.controller.metrics.enabled .Values.controller.metrics.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }} |
| insecureSkipVerify: true | ||
| {{- end }} |
There was a problem hiding this comment.
When controller.metrics.secure is true, the controller enables controller-runtime’s authn/authz filter on the metrics endpoint. This ServiceMonitor config sets scheme: https but does not configure any bearer token/auth settings, so Prometheus scrapes are likely to be rejected. Add an endpoint auth configuration (e.g., bearerTokenFile) when secure metrics are enabled (and/or allow configuring the full endpoint auth block via values).
| insecureSkipVerify: true | |
| {{- end }} | |
| insecureSkipVerify: true | |
| {{- with .Values.controller.metrics.serviceMonitor.auth }} | |
| {{- toYaml . | nindent 6 }} | |
| {{- else }} | |
| bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token | |
| {{- end }} | |
| {{- end }} |
…rviceMonitor Closes kagent-dev#1369 Add first-class Helm values for metrics configuration: - controller.metrics.enabled: enables the metrics endpoint - controller.metrics.port: configurable metrics port (default 9093) - controller.metrics.secure: HTTPS vs HTTP toggle - controller.metrics.serviceMonitor.*: Prometheus Operator ServiceMonitor When enabled, the chart: - Sets METRICS_BIND_ADDRESS and METRICS_SECURE env vars in the configmap - Adds a metrics port to the controller deployment and service - Optionally creates a ServiceMonitor with configurable labels, interval, scrapeTimeout, and relabeling configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
…metrics Address review feedback: - Gate ServiceMonitor rendering on .Capabilities.APIVersions.Has to avoid failures on clusters without Prometheus Operator CRDs - Add bearerTokenFile for service account auth when secure metrics enabled - Add test for CRD unavailability scenario - Add test for bearerTokenFile in secure mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Optimus (AI Agent) <agent@fulcria.com>
71c2bf6 to
686d15f
Compare
| {{- if .Values.controller.metrics.enabled }} | ||
| - port: {{ .Values.controller.metrics.port }} | ||
| targetPort: {{ .Values.controller.metrics.port }} | ||
| protocol: TCP | ||
| name: metrics | ||
| {{- end }} |
There was a problem hiding this comment.
Do we need to expose these on the service? The controller service can potentially be a LoadBalancer because of the MCP + A2A server so I think metrics should either be scraped from the pod directly, or there should be a different service
There was a problem hiding this comment.
Good point. Moved the metrics port to a dedicated ClusterIP service (controller-metrics-service.yaml) so it stays internal regardless of the main service type. Updated the ServiceMonitor selector to target the new metrics service.
Per review feedback, metrics should not be exposed on the main controller service which may be a LoadBalancer. This creates a separate ClusterIP service for metrics scraping. - Remove metrics port from controller-service.yaml - Add controller-metrics-service.yaml (ClusterIP, metrics only) - Update ServiceMonitor selector to target the metrics service Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Closes #1369
Adds first-class Helm values for enabling the controller metrics endpoint and optionally deploying a Prometheus Operator
ServiceMonitor.Changes
values.yaml: Newcontroller.metricssection withenabled,port,secure, andserviceMonitorsub-valuescontroller-configmap.yaml: SetsMETRICS_BIND_ADDRESSandMETRICS_SECUREenv vars when metrics are enabledcontroller-deployment.yaml: Adds metrics container port when enabledcontroller-service.yaml: Adds metrics service port when enabledcontroller-servicemonitor.yaml(new): Optional ServiceMonitor with configurable labels, interval, scrapeTimeout, and relabelingUsage
Test plan
🤖 Generated with Claude Code