HYPERFLEET-762: Add OpenTelemetry configuration to standard environme…#89
HYPERFLEET-762: Add OpenTelemetry configuration to standard environme…#89ldornele wants to merge 9 commits intoopenshift-hyperfleet:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @ldornele. Thanks for your PR. I'm waiting for a openshift-hyperfleet member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
WalkthroughThe PR removes the OpenTelemetry sampling_rate configuration surface (code, CLI flag, Helm value, and tests); changes default OTel enabled to true in Helm/config; introduces TRACING_ENABLED environment variable which, when set and valid, overrides configured tracing enablement (with deprecated HYPERFLEET_LOGGING_OTEL_ENABLED detection); updates startup to resolve tracingEnabled from env/config and pass it into API server routing; reworks telemetry initialization to use OTEL_* env vars for service name, exporter (OTLP HTTP/gRPC or stdout), and sampler selection; adds telemetry unit tests and logging field constants; and updates docs, Helm templates, flags, loader bindings, and go.mod deps. Sequence Diagram(s)sequenceDiagram
participant Startup as Server Startup
participant Env as Environment (TRACING_ENABLED, OTEL_*)
participant Config as Application Config
participant Telemetry as telemetry.InitTraceProvider
participant Exporter as OTLP/Stdout Exporter
Startup->>Env: Read TRACING_ENABLED, OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_*, OTEL_TRACES_*
Startup->>Config: Load config.logging.otel.enabled
Startup->>Startup: Resolve tracingEnabled (TRACING_ENABLED overrides config if set)
Startup->>Config: Optionally set resolved tracingEnabled into runtime config
Startup->>Telemetry: InitTraceProvider(ctx, serviceName, serviceVersion)
Telemetry->>Env: Read OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_TRACES_SAMPLER, OTEL_TRACES_SAMPLER_ARG, OTEL_RESOURCE_ATTRIBUTES
Telemetry->>Exporter: Create OTLP exporter (HTTP or gRPC) if endpoint set, else stdouttrace exporter
Telemetry->>Telemetry: Parse and configure sampler from OTEL_TRACES_SAMPLER / ARG
Telemetry->>Telemetry: Build resource (including FromEnv) and set TextMapPropagator (TraceContext + Baggage)
Telemetry->>Startup: Return TracerProvider (or error)
sequenceDiagram
participant Server as API Server
participant Router as Routes Setup
participant Middleware as OTel Middleware
Server->>Router: NewAPIServer(tracingEnabled)
Router->>Router: routes(tracingEnabled)
alt tracingEnabled == true
Router->>Middleware: Register OpenTelemetry middleware
else tracingEnabled == false
Router->>Router: Skip OTel middleware
end
Router->>Server: Return configured router
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
pkg/logger/fields.go (1)
45-45: Inconsistent field naming convention.
FieldTracingEnableduses uppercase"TRACING_ENABLED"while all other field constants in this file usesnake_case(e.g.,FieldOTelEnabled = "otel_enabled",FieldSamplingRate = "sampling_rate"). This inconsistency could cause confusion in log queries.♻️ Suggested fix for consistency
- FieldTracingEnabled = "TRACING_ENABLED" + FieldTracingEnabled = "tracing_enabled"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/logger/fields.go` at line 45, FieldTracingEnabled uses uppercase "TRACING_ENABLED" which is inconsistent with the snake_case convention used by other constants (e.g., FieldOTelEnabled, FieldSamplingRate); rename or change the constant value to snake_case—replace the value of FieldTracingEnabled with "tracing_enabled" (or rename the identifier to match project convention) so it matches the pattern used by other fields in pkg/logger/fields.go.pkg/telemetry/otel_test.go (2)
140-145: Minor: Error message omits the actual error.The shutdown error message doesn't include the error value, making debugging harder.
♻️ Suggested fix
defer func(ctx context.Context, tp *trace.TracerProvider) { err := Shutdown(ctx, tp) if err != nil { - t.Errorf("Failed to shutdown trace provider") + t.Errorf("Failed to shutdown trace provider: %v", err) } }(ctx, tp)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 140 - 145, The defer's error path in the anonymous cleanup function calls t.Errorf("Failed to shutdown trace provider") without including the returned error; update the t.Errorf call inside the deferred function that wraps Shutdown(ctx, tp) to include the err value (e.g., t.Errorf("Failed to shutdown trace provider: %v", err)) so the Shutdown error from trace.TracerProvider is logged for debugging.
151-160: Incomplete sampling assertion forexpectedSample=truecases.When
expectedSampleis true, the test only verifiesspan.SpanContext().IsValid(). This doesn't confirm that the span was actually sampled—a span context can be valid but not sampled. Consider also checkingIsSampled()for consistency with theexpectedSample=falsebranch.♻️ Suggested fix
if tt.expectedSample { - if !span.SpanContext().IsValid() { - t.Error("Expected valid span context for sampling=true") + if !span.SpanContext().IsValid() || !span.SpanContext().TraceFlags().IsSampled() { + t.Error("Expected valid and sampled span context for sampling=true") } } else {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 151 - 160, The test's sampling assertion is incomplete: when expectedSample is true you only check span.SpanContext().IsValid(), which doesn't ensure the span was actually sampled; update the true-branch to also assert span.SpanContext().TraceFlags().IsSampled() (mirroring the false-branch check), so when expectedSample is true you fail the test if TraceFlags().IsSampled() is false while keeping the existing IsValid() check for span.SpanContext().docs/logging.md (1)
377-399: Code snippet differs from actual implementation.The initialization snippet at lines 377-399 simplifies error handling compared to the actual implementation in
cmd/hyperfleet-api/servecmd/cmd.go. Notably:
- Line 380 ignores the
strconv.ParseBoolerror, whereas the actual code logs a warning and falls back to config.- Line 395 uses
defer tp.Shutdown(...)directly, whereas the actual code usestelemetry.Shutdown()with proper timeout context.This is acceptable for a conceptual illustration, but consider adding a note that this is a simplified example.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/logging.md` around lines 377 - 399, The snippet in the docs simplifies error handling compared to the real implementation: update the text to note it’s a simplified example and call out the behavioral differences — specifically that strconv.ParseBool errors are logged via logger.WithError and fall back to environments.Environment().Config.Logging.OTel.Enabled instead of being ignored, and that the real code calls telemetry.Shutdown(ctx, tp) with a timeout context rather than deferring tp.Shutdown(...) directly after telemetry.InitTraceProvider; mention the symbols strconv.ParseBool, logger.WithError, telemetry.InitTraceProvider, tp.Shutdown and telemetry.Shutdown so readers can map the doc snippet to the actual implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/config.md`:
- Line 691: Update the checklist entry that currently lists
HYPERFLEET_CLUSTER_ADAPTERS and HYPERFLEET_NODEPOOL_ADAPTERS to use the
canonical environment variable names HYPERFLEET_ADAPTERS_REQUIRED_CLUSTER and
HYPERFLEET_ADAPTERS_REQUIRED_NODEPOOL so it matches the Complete Reference (the
entries referenced at lines 466-467); locate the checklist line in
docs/config.md and replace the two variable names accordingly to ensure
documentation consistency.
---
Nitpick comments:
In `@docs/logging.md`:
- Around line 377-399: The snippet in the docs simplifies error handling
compared to the real implementation: update the text to note it’s a simplified
example and call out the behavioral differences — specifically that
strconv.ParseBool errors are logged via logger.WithError and fall back to
environments.Environment().Config.Logging.OTel.Enabled instead of being ignored,
and that the real code calls telemetry.Shutdown(ctx, tp) with a timeout context
rather than deferring tp.Shutdown(...) directly after
telemetry.InitTraceProvider; mention the symbols strconv.ParseBool,
logger.WithError, telemetry.InitTraceProvider, tp.Shutdown and
telemetry.Shutdown so readers can map the doc snippet to the actual
implementation.
In `@pkg/logger/fields.go`:
- Line 45: FieldTracingEnabled uses uppercase "TRACING_ENABLED" which is
inconsistent with the snake_case convention used by other constants (e.g.,
FieldOTelEnabled, FieldSamplingRate); rename or change the constant value to
snake_case—replace the value of FieldTracingEnabled with "tracing_enabled" (or
rename the identifier to match project convention) so it matches the pattern
used by other fields in pkg/logger/fields.go.
In `@pkg/telemetry/otel_test.go`:
- Around line 140-145: The defer's error path in the anonymous cleanup function
calls t.Errorf("Failed to shutdown trace provider") without including the
returned error; update the t.Errorf call inside the deferred function that wraps
Shutdown(ctx, tp) to include the err value (e.g., t.Errorf("Failed to shutdown
trace provider: %v", err)) so the Shutdown error from trace.TracerProvider is
logged for debugging.
- Around line 151-160: The test's sampling assertion is incomplete: when
expectedSample is true you only check span.SpanContext().IsValid(), which
doesn't ensure the span was actually sampled; update the true-branch to also
assert span.SpanContext().TraceFlags().IsSampled() (mirroring the false-branch
check), so when expectedSample is true you fail the test if
TraceFlags().IsSampled() is false while keeping the existing IsValid() check for
span.SpanContext().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9550d68f-106e-4b28-81a8-9e6a21fdcecc
📒 Files selected for processing (12)
charts/values.yamlcmd/hyperfleet-api/servecmd/cmd.godocs/config.mddocs/logging.mdpkg/config/flags.gopkg/config/loader.gopkg/config/loader_test.gopkg/config/logging.gopkg/config/logging_test.gopkg/logger/fields.gopkg/telemetry/otel.gopkg/telemetry/otel_test.go
💤 Files with no reviewable changes (3)
- pkg/config/loader.go
- pkg/config/logging_test.go
- pkg/config/loader_test.go
|
/ok-to-test |
|
@ldornele: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
…lugin from Claude Code
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/telemetry/otel_test.go (1)
38-73: Consider addingt.Parallel()isolation or usingt.Setenv.The OTLP exporter test manipulates global environment variables. While tests in this file run sequentially by default, using
t.Setenv(Go 1.17+) would automatically handle cleanup and prevent interference if parallelism is enabled later.// Instead of: os.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "...") defer os.Unsetenv("OTEL_EXPORTER_OTLP_ENDPOINT") // Consider: t.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://fake-otel-collector:4317")This is a minor improvement for maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 38 - 73, The test TestInitTraceProvider_OTLPExporter mutates global env vars with os.Setenv/os.Unsetenv; replace that pattern with t.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://fake-otel-collector:4317") to automatically handle cleanup and avoid cross-test interference, and optionally add t.Parallel() at the top of TestInitTraceProvider_OTLPExporter if you want the test to run concurrently; ensure references to InitTraceProvider and Shutdown remain unchanged and that the tracer assertions still follow the t.Setenv call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/telemetry/otel_test.go`:
- Around line 38-73: The test TestInitTraceProvider_OTLPExporter mutates global
env vars with os.Setenv/os.Unsetenv; replace that pattern with
t.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://fake-otel-collector:4317") to
automatically handle cleanup and avoid cross-test interference, and optionally
add t.Parallel() at the top of TestInitTraceProvider_OTLPExporter if you want
the test to run concurrently; ensure references to InitTraceProvider and
Shutdown remain unchanged and that the tracer assertions still follow the
t.Setenv call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d7e5e533-cfbb-43f3-8928-71044a158f42
📒 Files selected for processing (5)
cmd/hyperfleet-api/servecmd/cmd.godocs/config.mddocs/logging.mdpkg/logger/fields.gopkg/telemetry/otel_test.go
✅ Files skipped from review due to trivial changes (2)
- pkg/logger/fields.go
- docs/logging.md
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel_test.go`:
- Around line 52-62: The test currently calls Shutdown(ctx, tp) only after
assertions, which can leak the trace provider if an assertion fails; immediately
defer the cleanup right after creating the trace provider (tp) so Shutdown(ctx,
tp) is always executed—i.e., place a defer func() { _ = Shutdown(ctx, tp) }()
(or check/handle error inside defer) immediately after tp is
initialized/returned and before calling otel.Tracer("test") so the tracer, tp,
Shutdown, and ctx references are preserved and cleanup always runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b3955823-9df9-4051-9ce6-ee7b96f04b8f
📒 Files selected for processing (1)
pkg/telemetry/otel_test.go
|
/ok-to-test |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Around line 35-41: Update the vulnerable dependency
go.opentelemetry.io/otel/sdk from v1.38.0 to at least v1.40.0 in go.mod to
mitigate CVE-2026-24051; specifically change the module version string for
go.opentelemetry.io/otel/sdk to v1.40.0 (or later) and run `go mod tidy`/`go
get` to refresh the lockfile and ensure the build picks up the patched release
that uses the absolute /usr/sbin/ioreg path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 105558a0-d208-456d-90df-dd5e89435b26
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
charts/templates/configmap.yamlcharts/values.yamlgo.mod
💤 Files with no reviewable changes (1)
- charts/templates/configmap.yaml
✅ Files skipped from review due to trivial changes (1)
- charts/values.yaml
| // Determine if tracing is enabled | ||
| // Precedence: TRACING_ENABLED (tracing standard) > config (env/flags) > default | ||
| var tracingEnabled bool | ||
| if tracingEnv := os.Getenv("TRACING_ENABLED"); tracingEnv != "" { |
There was a problem hiding this comment.
Can we align on using a single variable for enabling tracing?
I see the standard says TRACING_ENABLED and this may come from before standarising environment variables with prefix HYPERFLEET_
Then, the docs mention HYPERFLEET_LOGGING_OTEL_ENABLED
I think we should consolidate in a single one and keep it consistent across repositories.
AFAIK OTEL_* variables do not have the HYPERFLEET_ prefix since they are used directly by the OTEL libraries, but any other variable that we introduce should be prefixed by our standard convention.
There was a problem hiding this comment.
Got it! I’ll remove HYPERFLEET_LOGGING_OTEL_ENABLED then. I’ll keep the OTEL_ variables* for consistency with hyperfleet-sentinel. Does that make sense?
There was a problem hiding this comment.
Also to clarify, I meant to use HYPERFLEET_TRACING_ENABLED instead of simple TRACING_ENABLED
There was a problem hiding this comment.
HYPERFLEET_TRACING_ENABLED is not compliant with the tracing standards — we should use TRACING_ENABLED instead. See: tracing standards
There was a problem hiding this comment.
I am afraid the standards need to be updated to env with HYPERFLEET_ prefix. It comes from configuration.md
There was a problem hiding this comment.
I got your point. In that case, should we update the tracing standard first so it aligns with the configuration standard as well?
There was a problem hiding this comment.
I opened a follow-up ticket for this: https://redhat.atlassian.net/browse/HYPERFLEET-816
| serviceName := "hyperfleet-api" | ||
| if svcName := os.Getenv("OTEL_SERVICE_NAME"); svcName != "" { | ||
| serviceName = svcName | ||
| } | ||
|
|
||
| traceProvider, err := telemetry.InitTraceProvider(ctx, serviceName, api.Version) |
There was a problem hiding this comment.
Both sentinel and adapter have a name field
I wonder if we should add it to the API to avoid having this constant "hyperfleet-api" here
Then, here we are reading a variable that is otel specific.... I wonder if that should be done inside the telemetry.InitTraceProvider function, so main can become completely unaware of this setting
Updates OpenTelemetry SDK and exporters from v1.38.0 to v1.40.0, addressing
potential security vulnerabilities and fixing CI test-integration failures
caused by missing dependency entries in go.sum.
Key updates:
- go.opentelemetry.io/otel: v1.38.0 → v1.40.0
- go.opentelemetry.io/otel/sdk: v1.38.0 → v1.40.0
- go.opentelemetry.io/otel/exporters/otlp/*: v1.38.0 → v1.40.0
- go.opentelemetry.io/otel/exporters/stdout/stdouttrace: v1.38.0 → v1.40.0
- google.golang.org/grpc: v1.75.0 → v1.78.0
Includes transitive dependency updates for protobuf, grpc-gateway, and
golang.org/x/{crypto,net,sys,text}.
All 551 unit tests pass.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
xueli181114
left a comment
There was a problem hiding this comment.
Developer review: inline follow-ups on consistency and operator-facing correctness.
xueli181114
left a comment
There was a problem hiding this comment.
Good direction — moving to standard OTEL env vars is the right call. A few things I noticed beyond what rh-amarin and rafabene already flagged.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 151: Update the transitive gRPC dependency to a patched release by
bumping google.golang.org/grpc from v1.78.0 to a secure version (e.g., the
latest patched 1.8x+ release) in go.mod and run go mod tidy to lock it; ensure
the change resolves the CVE affecting :path header validation so the
OpenTelemetry gRPC exporter (otlptracegrpc) usages in the trace
export/initialization code are using the upgraded grpc library at runtime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: feaf5d3b-45b3-4ade-b100-735d5a8bd5ea
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (15)
charts/templates/configmap.yamlcharts/values.yamlcmd/hyperfleet-api/servecmd/cmd.gocmd/hyperfleet-api/server/api_server.gocmd/hyperfleet-api/server/routes.godocs/config.mddocs/logging.mdgo.modpkg/config/flags.gopkg/config/loader.gopkg/config/logging.gopkg/config/logging_test.gopkg/telemetry/otel.gopkg/telemetry/otel_test.gotest/helper.go
💤 Files with no reviewable changes (3)
- charts/templates/configmap.yaml
- pkg/config/loader.go
- pkg/config/flags.go
✅ Files skipped from review due to trivial changes (1)
- pkg/telemetry/otel_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- charts/values.yaml
- pkg/config/logging_test.go
- docs/logging.md
- pkg/config/logging.go
- docs/config.md
- pkg/telemetry/otel.go
| golang.org/x/text v0.33.0 // indirect | ||
| google.golang.org/genproto/googleapis/api v0.0.0-20260128011058-8636f8732409 // indirect | ||
| google.golang.org/genproto/googleapis/rpc v0.0.0-20260128011058-8636f8732409 // indirect | ||
| google.golang.org/grpc v1.78.0 // indirect |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify grpc vulnerability and check latest available version
echo "=== Current grpc version in go.mod ==="
rg -n 'google\.golang\.org/grpc' go.mod
echo ""
echo "=== Check for grpc usage in codebase ==="
rg -n 'google\.golang\.org/grpc' --type go -g '!go.mod' -g '!go.sum' | head -20
echo ""
echo "=== Check otlptracegrpc usage (creates gRPC connections) ==="
rg -n 'otlptracegrpc' --type go
echo ""
echo "=== Query OSV for grpc v1.78.0 advisories ==="
curl -s -X POST https://api.osv.dev/v1/query \
-H "Content-Type: application/json" \
-d '{"package": {"ecosystem": "Go", "name": "google.golang.org/grpc"}, "version": "1.78.0"}' | \
python3 -c "import sys,json; d=json.load(sys.stdin); vulns=d.get('vulns',[]); print(f'Found {len(vulns)} vulnerability(ies)'); [print(f\" - {v.get('id')}: {v.get('summary','N/A')}\") for v in vulns]"Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 776
Upgrade google.golang.org/grpc to address a critical authorization bypass vulnerability.
The current version v1.78.0 has a critical authorization bypass (CVE identifiers: GO-2026-4762, GHSA-p77j-4mvh-x3m3) involving missing leading slash validation in :path headers. This affects gRPC services with authorization interceptors.
The codebase uses gRPC through otlptracegrpc for OpenTelemetry trace export (see pkg/telemetry/otel.go lines 55, 59), making this vulnerability applicable. Upgrade to a patched version immediately.
🧰 Tools
🪛 OSV Scanner (2.3.5)
[CRITICAL] 151-151: google.golang.org/grpc 1.78.0: Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc
(GO-2026-4762)
[CRITICAL] 151-151: google.golang.org/grpc 1.78.0: gRPC-Go has an authorization bypass via missing leading slash in :path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` at line 151, Update the transitive gRPC dependency to a patched
release by bumping google.golang.org/grpc from v1.78.0 to a secure version
(e.g., the latest patched 1.8x+ release) in go.mod and run go mod tidy to lock
it; ensure the change resolves the CVE affecting :path header validation so the
OpenTelemetry gRPC exporter (otlptracegrpc) usages in the trace
export/initialization code are using the upgraded grpc library at runtime.
| logger.With(ctx, | ||
| "deprecated_variable", "HYPERFLEET_LOGGING_OTEL_ENABLED", | ||
| "replacement", "TRACING_ENABLED", | ||
| ).Warn("HYPERFLEET_LOGGING_OTEL_ENABLED is deprecated and ignored. Please use TRACING_ENABLED instead.") |
There was a problem hiding this comment.
The PR description says HYPERFLEET_LOGGING_OTEL_ENABLED still works via Viper, but the Viper
bindings for logging.otel.enabled were removed from loader.go (both bindAllEnvVars and
bindFlags). This means the env var is truly dead — it won't flow into the config struct.
The deprecation warning correctly says "ignored", but existing deployments that used
HYPERFLEET_LOGGING_OTEL_ENABLED=false to disable tracing will break silently, since the
default flipped to true.
Please, Update the PR description to clearly state HYPERFLEET_LOGGING_OTEL_ENABLED is no
longer supported.
|
The JIRA ticket HYPERFLEET-762 lists "CloudEvent Tracing Integration" as one of the missing Is this intentionally deferred to a follow-up ticket? If so, consider noting it in the PR |
|
|
||
| | Property | Environment Variable | Type | Default | Description | | ||
| |----------|---------------------|------|---------|-------------| | ||
| | `logging.otel.enabled` | `TRACING_ENABLED` | bool | `true` | Enable OpenTelemetry tracing (HyperFleet standard) | |
There was a problem hiding this comment.
We need to update to use the HYPERFLEET_ prefix here.
| } | ||
| } | ||
|
|
||
| func TestInitTraceProvider_OTLPExporter(t *testing.T) { |
There was a problem hiding this comment.
The http/protobuf exporter path (line 53 of otel.go) has no test coverage.
TestInitTraceProvider_OTLPExporter only tests the default gRPC case, and
TestInitTraceProvider_InvalidProtocol tests invalid values falling back to gRPC.
Consider adding a test case for http/protobuf:
func TestInitTraceProvider_OTLPHttpExporter(t *testing.T) {
ctx := context.Background()
t.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://fake-otel-collector:4318")
t.Setenv("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
tp, err := InitTraceProvider(ctx, "test-service", "v1.0.0")
if err != nil {
t.Fatalf("Failed to initialize trace provider with HTTP OTLP: %v", err)
}
defer func() {
if err := Shutdown(ctx, tp); err != nil {
t.Errorf("Failed to shutdown: %v", err)
}
}()
}| // - OTEL_TRACES_SAMPLER: sampler type (default: "parentbased_traceidratio") | ||
| // - OTEL_TRACES_SAMPLER_ARG: sampling rate 0.0-1.0 (default: 1.0) | ||
| // - OTEL_RESOURCE_ATTRIBUTES: additional resource attributes (k=v,k2=v2 format) | ||
| func InitTraceProvider(ctx context.Context, serviceName, serviceVersion string) (*trace.TracerProvider, error) { |
There was a problem hiding this comment.
InitTraceProvider is now ~80 lines with three distinct concerns: exporter creation, resource
setup, and sampler selection. Consider extracting helpers to reduce complexity:
func createExporter(ctx context.Context) (trace.SpanExporter, error) {
// OTLP vs stdout logic
}
func selectSampler(ctx context.Context) trace.Sampler {
// sampler type switch + parseSamplingRate
}This would make InitTraceProvider a concise orchestrator (~30 lines) and make each piece
independently testable.
| } | ||
| } | ||
|
|
||
| func TestInitTraceProvider_SamplerEnvironmentVariables(t *testing.T) { |
There was a problem hiding this comment.
Each test calls InitTraceProvider which sets otel.SetTracerProvider(tp) and
otel.SetTextMapPropagator(...) globally. The global state is never reset between subtests,
which could cause subtle issues if tests are reordered or parallelized.
Consider adding cleanup in each subtest:
defer func() {
if err := Shutdown(ctx, tp); err != nil {
t.Errorf("Failed to shutdown: %v", err)
}
otel.SetTracerProvider(nil)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator())
}()Or wrap the reset in a test helper that all subtests share.
Motivation
TRACING_ENABLED(Tracing standard) for unified control across hyperfleet-api, hyperfleet-sentinel, and adaptersChanges
Variable Migration
HYPERFLEET_LOGGING_OTEL_SAMPLING_RATEOTEL_TRACES_SAMPLER_ARGOTEL_SERVICE_NAMEOTEL_EXPORTER_OTLP_ENDPOINTOTEL_EXPORTER_OTLP_PROTOCOLOTEL_TRACES_SAMPLEROTEL_RESOURCE_ATTRIBUTESBackward Compatibility:
HYPERFLEET_LOGGING_OTEL_ENABLEDstill works via Viperlogging.otel.enabled: truecontinue to workTRACING_ENABLEDtakes precedence when setVariable Precedence
TRACING_ENABLED (Tracing standard)
↓
HYPERFLEET_LOGGING_OTEL_ENABLED (Viper env var)
↓
config.yaml: logging.otel.enabled
↓
Default (false)
Documentation
Summary by CodeRabbit
New Features
Changes
Documentation
Tests