[Filebeat] http_endpoint input should not report DEGRADED on client validation failures #49577

@vatsaldesai93

Description

The http_endpoint input (health reporting added in PR #44310 per #44281) currently calls status.Degraded when a client request fails validation (e.g., wrong HTTP method, failed auth). The same status.Degraded mechanism is used for both these client validation failures and genuine component-level failures.

Because Fleet uses worst-case aggregation across components, a single rejected request can mark the entire agent as "Unhealthy" in the Fleet UI.
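To illustrate why one unit is enough to flip the whole agent, here is a minimal sketch of worst-case aggregation. The `State` type, its ordering, and `worstCase` are hypothetical names for illustration only, not Fleet's or Elastic Agent's actual types.

```go
package main

// State is a hypothetical stand-in for a unit health state. The names and
// the severity ordering are assumptions made for this sketch.
type State int

const (
	Healthy State = iota
	Degraded
	Failed
)

// worstCase returns the most severe state found among the given units.
// With no units it returns Healthy.
func worstCase(units []State) State {
	worst := Healthy
	for _, s := range units {
		if s > worst {
			worst = s
		}
	}
	return worst
}
```

Under this model, `worstCase([]State{Healthy, Degraded, Healthy})` evaluates to `Degraded`, so a single http_endpoint unit in DEGRADED is enough to change the state reported for the agent as a whole.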

The input stays DEGRADED until the next valid request arrives. There is no background health check or periodic reset. This is particularly troublesome for sporadic webhook sources that may go hours or days between events: a single stray GET from a health check probe or network scanner leaves the agent showing Unhealthy the entire time.

Could this behavior be changed so that client validation failures do not affect the input's health status? Rejecting an invalid request is expected, healthy behavior, not a sign of degradation. Logging the rejection rather than changing health status might be a better fit.

Background

The original spec in #44281 only considered infrastructure-level conditions, such as sustained back pressure, as potential DEGRADED triggers. Client validation failures were not discussed. The implementation in PR #44310 went beyond this by marking all validation failures as DEGRADED.

Current behavior

  1. Deploy an Elastic Agent with an http_endpoint integration (default config accepts POST only)
  2. Send one valid POST. Agent shows Healthy in Fleet
  3. Send a GET request: curl http://host:port/path
  4. The request is rejected with HTTP 405 and the agent transitions to DEGRADED
  5. Agent stays DEGRADED until the next valid POST arrives

From a real agent diagnostic bundle, this cycle is visible in the event logs:

07:25:54Z  Unit state changed (HEALTHY->DEGRADED): request did not validate: only POST requests are allowed
07:29:04Z  Unit state changed (DEGRADED->HEALTHY): Healthy    ← triggered by a valid POST ~4 mins later

Affected validation paths

All of these trigger status.Degraded in handler.go:

  • Wrong HTTP method (405)
  • Failed basic auth (401)
  • Invalid secret header (401)
  • Wrong content-type (415)
  • Missing, malformed, or mismatched HMAC signature (401)
  • Malformed JSON body
  • CRC validation failure
  • Invalid query parameters

Thank you for considering this. Happy to provide additional detail, diagnostics, or help test a fix.
