Conversation
🤖 GitHub comments
This pull request is now in conflicts. Could you fix it? 🙏
This pull request does not have a backport label. To fix up this pull request, you need to add the backport labels for the needed branches.
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
… GetExporterTypeFromEnv() for the case of metrics off, traces on". The change in "Tidy up GetExporterTypeFromEnv() for the case of metrics off, traces on" was to require OTEL_METRICS_EXPORTER to be set, rather than defaulting it to otlp because an OTEL_EXPORTER_OTLP_ENDPOINT was set. This is important because otherwise configuration to send traces to an OTLP endpoint would activate the metrics export, which the endpoint may not be able to handle.
Make the tracer provider injectable via an optional field on the input struct so tests can capture spans with a SpanRecorder. Add three tests covering the happy path (single execution), the want_more loop (two executions per period), and an evaluation error, asserting on span names, parent-child relationships, attributes, and status codes.
The trace span tests were waiting for a fixed 5s timeout even when the first periodic run had already completed. This updates the tests to cancel when the root periodic run span (cel.periodic.run) is observed as ended in the span recorder. The timeout remains as a safety guard for regressions, but no longer drives normal test completion.

Why this is safe:

- Assertions in these tests target spans and attributes produced inside that completed periodic run.
- Cancellation now happens after the root run trace is complete, avoiding the earlier cancellation race during publish.

Result: tests keep the same tracing assertions while running in ~0.01s per test instead of ~5s.
```go
if n < 0 {
	return 0
}
return time.Duration(n) * time.Millisecond
```
This looks a bit odd to me. Why are you converting `n` to `time.Duration` if `n` is the number of milliseconds? You can multiply a number directly by `time.Millisecond` and you'll have a `time.Duration`.
Suggested change:

```go
return n * time.Millisecond
```
Thanks. Yeah, there's no hidden reason.
Actually, it's required to make the types work, so I'm reverting to leave it as it was.
Something like `55 * time.Millisecond` would work because `55` is an untyped constant. Apparently `time.Duration(n) * time.Millisecond` is idiomatic and consistent with the library design.
This is language behaviour; `n` is not a `time.Duration` while `time.Millisecond` is, and Go does not have C-like implicit arithmetic type coercion.
x-pack/filebeat/otel/trace.go
```go
found = true
b, err := strconv.ParseBool(strings.TrimSpace(raw))
if err != nil {
	return false, true, fmt.Errorf("%s must be boolean: %w", k, err)
}
```
[Question] Does it make sense to return `found = true` on an error? The key is present, but the value is invalid... Could you elaborate more on the intended behaviour here?
I think not. I'll change that.
```go
func TestNewExporterCfgFromEnv_ExporterDefaultsToNone(t *testing.T) {
	// unset BEATS_OTEL_TRACES_DISABLE
```
I don't get this comment, should it be removed?
Suggested change: remove `// unset BEATS_OTEL_TRACES_DISABLE`.
The tests above and below set environment variables and make assertions about how the configuration is read from them.
For this test no setup is necessary. The comment is meant to indicate that the test makes assertions about the behavior when this variable is not set.
I don't mind removing it. What do you think?
```go
func TestNewExporterCfgFromEnv_NotInsecureByDefault(t *testing.T) {
	t.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "otlp-receiver.example.com:4317")
	// unset OTEL_EXPORTER_OTLP_INSECURE
```
Again, I'm confused by this comment.
Same again. Could make it look less magical, like this: `// With OTEL_EXPORTER_OTLP_INSECURE not set.`
Co-authored-by: Tiago Queiroz <github@queiroz.life>
This reverts commit 116a0fc.
Proposed commit message
There were a couple of things for which the initial approach changed:

- `ContextInjector` rewrites requests in the client used by CEL, which also solves the problem for OAuth2 requests.

There are some differences from the attribute and other names given in the planning document:

- `cel.periodic.program_count` → Changed to `cel.periodic.execution_count` to match `cel.program.execution`.
- `cel.program.batch_count` → Removed. It would only indicate whether an execution returned any events or not. Any other batching is internal to the CEL evaluation.
- `cel.{periodic,program}.success` → Removed, in favor of span status.
- `cel.program.error_message` → Not set. Uses `SetStatus` and `RecordError` instead.
- `BEATS_OTEL_TRACING_DISABLE` → Changed to `BEATS_OTEL_TRACES_DISABLE` to match `OTEL_TRACES_EXPORTER` and `OTEL_EXPORTER_OTLP_TRACES_*`.

Handling of span-specific context and loggers is somewhat cumbersome. Refactoring to extract separate functions from `run` for separate stages of processing will help to tidy this up and is planned as follow-up work: #48464.

Checklist
- Used the `stresstest.sh` script to run them under stress conditions and the race detector to verify their stability.
- Added an entry in `./changelog/fragments` using the changelog tool.

How to test this PR locally
You can use `otel-desktop-viewer` as a simple receiver and viewer of OTel traces.

In the terminal with those environment variables set, you can run the input with an example configuration that includes OAuth2 and multiple requests per period.
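Assuming the viewer is listening on the default OTLP gRPC port 4317, the exporter environment might look like this (the values are illustrative, not taken from the PR):

```shell
# Point the OTLP trace exporter at a local otel-desktop-viewer instance.
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317
export OTEL_EXPORTER_OTLP_INSECURE=true
```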
You can also use Elastic Observability to receive and view OTel traces, but it involves a bit more setup.
Bring up the Elastic Stack using `elastic-package`.
In Kibana, go to "Management > Integrations" and go to the "APM" integration page. Click "Manage APM integration in Fleet", then "Add Elastic APM". Under "Configure integration > Integration settings > General > Server configuration", change the Host and URL settings to use '0.0.0.0' instead of 'localhost'. Under "Where to add this integration?", choose "Existing hosts > Elastic Agent (elastic-package)". Then click "Save and continue".
Now, back in the terminal, find the IP address of the agent container and use that as the destination for OTel traces.
Then from the terminal with those settings you can run the input using example Filebeat configuration as above.
To view the exported traces in Kibana, go to "Observability > Applications > Traces".
Related
Use cases
This tracing is to be used for troubleshooting, particularly for Agentless.
Screenshots
OTel traces for the CEL Input in Elastic Observability:
