Skip to content

Fix journalctl process lifecycle and cleanup bugs#49528

Draft
belimawr wants to merge 42 commits intoelastic:mainfrom
belimawr:fix-journalctl-life-cycle
Draft

Fix journalctl process lifecycle and cleanup bugs#49528
belimawr wants to merge 42 commits intoelastic:mainfrom
belimawr:fix-journalctl-life-cycle

Conversation

@belimawr
Copy link
Contributor

@belimawr belimawr commented Mar 17, 2026

Proposed commit message

This PR hardens process management in the journalctl wrapper by fixing several lifecycle
edge cases: factory failures now return a true nil interface (avoiding panics on unusable
instances), Kill() now treats already-finished processes as non-errors and waits for
full shutdown/reaping before returning, and the stdout reader removes unreachable "file 
already closed" special-casing now that goroutine/wait ordering is correct. Together,
these changes make shutdown deterministic and prevent misleading errors and goroutine
leaks.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

## Disruptive User Impact
## Author's Checklist

How to test this PR locally

Run the Journald input tests:

cd filebeat
go test ./input/journald/... -v
mage BuildSystemTestBinary
go test -count=1 -v -tags=integration ./tests/integration -run=TestJournald

Related issues

## Use cases
## Screenshots
## Logs

philippkahr and others added 30 commits March 5, 2026 16:59
 - Set timezone to match host
 - Add systemd 242
 - use absolute path for journald.conf
Kill() previously only sent SIGKILL and returned immediately, leaving
the stdout/stderr reader goroutines and the process-wait goroutine
still running. This meant Close() could return before cleanup finished,
potentially leaking goroutines.
Kill() now calls waitDone.Wait() after sending SIGKILL, blocking until
all background goroutines have exited and the process has been reaped.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Claude 4.6 Opus (Thinking)
belimawr added 10 commits March 13, 2026 12:16
When journalctl exits on its own (crash or short-lived commands like
--version), Kill() returned "process already finished" which Close()
propagated as a misleading error. Swallow os.ErrProcessDone since the
process is already gone and cleanup via waitDone.Wait() still runs.
The readersWG fix ensures cmd.Wait() is only called after the reader
goroutines finish, so the pipe can no longer be closed while readers
are active. The fs.PathError branch handling "file already closed" is
now unreachable. Replace it with a simple non-EOF error log, matching
the stderr reader.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Claude 4.6 Opus (Thinking)
The factory error paths returned &journalctl{}, a non-nil pointer to
an unusable struct. Because the return type is the Jctl interface, this
value appears non-nil to callers, defeating nil checks and panicking
on any method call. Return nil so the interface value is a true nil.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: Claude 4.6 Opus (Thinking)
@belimawr belimawr self-assigned this Mar 17, 2026
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Mar 17, 2026
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 17, 2026
@github-actions
Copy link
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Contributor

mergify bot commented Mar 17, 2026

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

When Kill was called, if the stdout reader goroutine was trying to
send on the channel and nobody was listening, the goroutine would get
stuck and Kill would block forever.

This commit fixes it by using a channel to notify this goroutine to
return when Kill is called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants