NetRunner

Safe OS process execution for Elixir. Zero zombie processes, NIF-based backpressure, PTY support, and cgroup isolation.

NetRunner combines NIF-based async I/O (enif_select) with a persistent shepherd binary that watches over child processes. Three layers of cleanup guarantee zero zombies under every failure mode — BEAM crash, GenServer crash, or resource leak.

Installation

def deps do
  [
    {:net_runner, "~> 1.0"}
  ]
end

Requires a C compiler (gcc or clang) and make.

Quick Start

# Run a command, collect output
{output, 0} = NetRunner.run(~w(echo hello))
# => {"hello\n", 0}

# Pipe data through stdin
{output, 0} = NetRunner.run(~w(cat), input: "from elixir")
# => {"from elixir", 0}

# Nonzero exit status
{"", 1} = NetRunner.run(~w(false))

# Multi-word commands with arguments
{output, 0} = NetRunner.run(["sh", "-c", "echo $HOME"])

Streaming

Lazy, demand-driven streams with backpressure — data stays in the OS pipe buffer until you consume it, so you can stream gigabytes without OOM:

# Stream stdout chunks with backpressure
NetRunner.stream!(~w(cat /usr/share/dict/words))
|> Stream.filter(&String.starts_with?(&1, "elixir"))
|> Enum.to_list()

# Pipe input through a command
NetRunner.stream!(~w(tr a-z A-Z), input: "hello world")
|> Enum.join()
# => "HELLO WORLD"

# Sort lines
NetRunner.stream!(~w(sort), input: "cherry\napple\nbanana\n")
|> Enum.join()
# => "apple\nbanana\ncherry\n"

# Process large files without loading into memory
File.stream!("huge.csv")
|> NetRunner.stream!(~w(grep ERROR))
|> Stream.each(&IO.write/1)
|> Stream.run()

# Non-raising variant
{:ok, stream} = NetRunner.stream(~w(sort), input: "c\nb\na\n")
Enum.to_list(stream)
# => ["a\nb\nc\n"]

Timeouts and Limits

# Kill process after 500ms
{:error, :timeout} = NetRunner.run(~w(sleep 100), timeout: 500)

# Cap output size — kills process if exceeded
{:error, {:max_output_exceeded, _partial}} =
  NetRunner.run(["sh", "-c", "yes"], max_output_size: 1000)

# Custom kill escalation: SIGTERM → wait 2s → SIGKILL
NetRunner.run(~w(my_server), kill_timeout: 2000, timeout: 10_000)

Process API

For fine-grained control over the OS process lifecycle:

alias NetRunner.Process, as: Proc

# Start a process
{:ok, pid} = Proc.start("cat", [])

# Write to stdin
:ok = Proc.write(pid, "hello world")
:ok = Proc.close_stdin(pid)

# Read from stdout (blocks until data available)
{:ok, "hello world"} = Proc.read(pid)
:eof = Proc.read(pid)

# Wait for exit
{:ok, 0} = Proc.await_exit(pid)

Incremental reads and writes

{:ok, pid} = Proc.start("cat", [])

# Write in chunks — useful for feeding large data
:ok = Proc.write(pid, "chunk 1 ")
:ok = Proc.write(pid, "chunk 2 ")
:ok = Proc.write(pid, "chunk 3")
:ok = Proc.close_stdin(pid)

# Read comes back in whatever chunks the OS delivers
{:ok, data} = Proc.read(pid)
# data => "chunk 1 chunk 2 chunk 3" (may come in multiple reads)

Signals and process groups

{:ok, pid} = Proc.start("sleep", ["100"])
:ok = Proc.kill(pid, :sigterm)
{:ok, 143} = Proc.await_exit(pid)  # 128 + SIGTERM(15)

# Signals kill the entire process group (catches grandchildren)
{:ok, pid} = Proc.start("sh", ["-c", "sleep 100 & sleep 100 & wait"])
:ok = Proc.kill(pid, :sigkill)
# All three processes (sh + both sleeps) are killed

Supported signals: :sigterm, :sigkill, :sigint, :sighup, :sigusr1, :sigusr2, :sigstop, :sigcont, :sigquit, :sigpipe.

Checking process state

{:ok, pid} = Proc.start("sleep", ["10"])

Proc.alive?(pid)   # => true
Proc.os_pid(pid)   # => 12345 (the actual OS PID)

Proc.kill(pid, :sigkill)
Proc.await_exit(pid)

Proc.alive?(pid)   # => false

Stats

Every process tracks I/O statistics automatically:

{:ok, pid} = Proc.start("cat", [])
Proc.write(pid, "hello")
Proc.close_stdin(pid)
Proc.read(pid)
Proc.await_exit(pid)

stats = Proc.stats(pid)
stats.bytes_in     # => 5       (bytes written to stdin)
stats.bytes_out    # => 5       (bytes read from stdout)
stats.read_count   # => 1       (number of read calls)
stats.write_count  # => 1       (number of write calls)
stats.duration_ms  # => 3       (wall-clock time)
stats.exit_status  # => 0       (exit code)

PTY Mode

Run commands with a pseudo-terminal for programs that require a TTY. PTY mode is designed for interactive and long-running programs — shells, REPLs, curses apps.

# Programs see a real terminal
{:ok, pid} = Proc.start("python3", ["-c", "import sys; print(sys.stdout.isatty())"], pty: true)
{:ok, data} = Proc.read(pid)
# data =~ "True"

# Interactive REPL
{:ok, pid} = Proc.start("python3", ["-i"], pty: true)
Proc.write(pid, "print(1 + 2)\n")
{:ok, data} = Proc.read(pid)

# Resize the terminal window
Proc.set_window_size(pid, 40, 120)

# Clean up when done (PTY doesn't support independent stdin close)
Proc.kill(pid, :sigkill)
Proc.await_exit(pid)

PTY caveats

PTY mode differs from pipe mode in important ways:

No independent stdin close — the PTY is a single bidirectional FD. Use kill/2 to terminate.
Echo — the terminal echoes input back by default, so reads include what you wrote.
Fast-exiting commands — if a command exits before you call read/1 (e.g., in iex), the PTY buffer may be lost. PTY mode is meant for long-running programs. For simple commands, use pipe mode (the default).
Line buffering — the terminal line discipline buffers input until \n by default.

# DON'T: Use PTY for simple commands in iex (data may be lost)
{:ok, pid} = Proc.start("echo", ["hi"], pty: true)
# ... time passes while you type ...
Proc.read(pid)  # => :eof (too late, PTY torn down)

# DO: Use pipe mode (default) for simple commands
{:ok, pid} = Proc.start("echo", ["hi"])
{:ok, "hi\n"} = Proc.read(pid)

# DO: Use PTY for interactive programs that need a terminal
{:ok, pid} = Proc.start("bash", [], pty: true)
Proc.write(pid, "echo hello\n")
{:ok, data} = Proc.read(pid)  # works — bash stays alive

Daemon Mode

Run long-lived processes under a supervision tree. Automatically drains stdout/stderr to prevent pipe blocking:

# In your supervisor
children = [
  {NetRunner.Daemon,
   cmd: "redis-server",
   args: ["--port", "6380"],
   on_output: :log,
   name: MyApp.Redis}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Interact with the daemon
NetRunner.Daemon.os_pid(MyApp.Redis)   # => 12345
NetRunner.Daemon.alive?(MyApp.Redis)   # => true
NetRunner.Daemon.write(MyApp.Redis, "PING\r\n")

Output handling options:

:discard (default) — silently consume output to prevent pipe blocking
:log — log each chunk via Logger.info
fun/1 — custom callback for each chunk

# Custom output handler
{:ok, daemon} = NetRunner.Daemon.start_link(
  cmd: "tail",
  args: ["-f", "/var/log/system.log"],
  on_output: fn chunk -> MyApp.LogIngester.ingest(chunk) end
)

Graceful shutdown: on terminate/2, sends SIGTERM, waits 5 seconds, then SIGKILL.

cgroup Support (Linux)

Isolate child processes in a cgroup v2 hierarchy for resource control:

{:ok, pid} = Proc.start("my_worker", [],
  cgroup_path: "net_runner/job_123"
)

The shepherd creates the cgroup directory, moves the child into it, and cleans up on exit (kills all processes via cgroup.kill, then removes the directory). No-op on macOS.

Parallel Execution

Every NetRunner process is fully independent — no shared state, no singleton bottleneck:

# Process files in parallel
files
|> Task.async_stream(fn file ->
  {out, 0} = NetRunner.run(["ffprobe", "-hide_banner", file])
  {file, out}
end, max_concurrency: System.schedulers_online())
|> Enum.to_list()

# Fan-out pattern
urls
|> Task.async_stream(fn url ->
  {body, 0} = NetRunner.run(["curl", "-s", url], timeout: 30_000)
  body
end, max_concurrency: 20)
|> Enum.to_list()

Why NetRunner?

System.cmd uses Erlang ports which tie stdin/stdout lifecycle together, have no backpressure (mailbox flooding), and leave zombies when programs ignore stdin EOF. This was filed as ERL-128 and marked Won't Fix.

Feature	System.cmd	MuonTrap	Exile	NetRunner
No zombies (BEAM SIGKILL)	-	Yes	-	Yes
NIF async I/O + backpressure	-	-	Yes	Yes
Close stdin independently	-	-	Yes	Yes
Process group kills	-	Yes	-	Yes
PTY / terminal emulation	-	-	-	Yes
cgroup isolation (Linux)	-	Yes	-	Yes
Per-process stats	-	-	-	Yes
Daemon mode (supervision)	-	Yes	-	Yes

Options Reference

`NetRunner.run/2`

Option	Type	Default	Description
`:input`	binary \| list	`nil`	Data to write to stdin
`:timeout`	integer	`nil`	Wall-clock timeout in ms
`:max_output_size`	integer	`nil`	Max bytes to collect
`:stderr`	atom	`:consume`	`:consume`, `:redirect`, or `:disabled`
`:pty`	boolean	`false`	Use pseudo-terminal
`:kill_timeout`	integer	`5000`	SIGTERM→SIGKILL escalation timeout in ms
`:cgroup_path`	string	`nil`	cgroup v2 path (Linux only)

`NetRunner.Process.start/3`

Accepts all options above except :input, :timeout, and :max_output_size.

Architecture

BEAM Process (NetRunner.Process GenServer)
    |
    |-- Port.open("priv/shepherd", [:nouse_stdio, :exit_status])
    |     |
    |     v
    |   Shepherd Binary (stays alive for child's lifetime)
    |     |-- fork() → child process (execvp)
    |     |-- Passes pipe FDs to BEAM via UDS + SCM_RIGHTS
    |     |-- poll() loop: watches UDS + signal pipe (SIGCHLD)
    |     |-- BEAM dies (POLLHUP) → SIGTERM → SIGKILL child
    |     |-- Child dies (SIGCHLD) → notify BEAM, exit
    |
    |-- NIF (enif_select on raw FDs, dirty IO schedulers)
    |     |-- Demand-driven backpressure via OS pipe buffers
    |
    v
NetRunner.Watcher (belt-and-suspenders with shepherd)
    |-- Monitors GenServer, kills OS process if GenServer crashes

Three layers of zombie prevention:

Shepherd — detects BEAM death via POLLHUP on UDS, kills child process group
Watcher — detects GenServer crash via Process.monitor, kills child via NIF
NIF resource destructor — closes FDs on GC, child sees broken pipe

Performance

Spawn overhead is ~20-25ms per process (fork + execvp + UDS handshake + FD passing). This is a one-time cost — actual I/O is sub-millisecond. For comparison, System.cmd is ~10-15ms (simpler setup, same fork cost).

The tradeoff: ~10ms extra spawn time buys you backpressure, zero zombies, and process group kills. For long-running processes or large data streams, the spawn cost is negligible.

Documentation

API Reference (HexDocs)
Architecture — component diagrams, spawn sequence, zombie prevention
Protocol — shepherd ↔ BEAM byte protocol spec
Backpressure — how demand-driven I/O works
Decisions — architecture decision records
Comparison — feature matrix vs alternatives

Requirements

Elixir ~> 1.17
Erlang/OTP 27+
C compiler (gcc or clang)
make
Linux or macOS

License

Apache-2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.github/workflows		.github/workflows
c_src		c_src
docs		docs
lib		lib
priv		priv
test		test
.formatter.exs		.formatter.exs
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
mix.exs		mix.exs
mix.lock		mix.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NetRunner

Installation

Quick Start

Streaming

Timeouts and Limits

Process API

Incremental reads and writes

Signals and process groups

Checking process state

Stats

PTY Mode

PTY caveats

Daemon Mode

cgroup Support (Linux)

Parallel Execution

Why NetRunner?

Options Reference

`NetRunner.run/2`

`NetRunner.Process.start/3`

Architecture

Performance

Documentation

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NetRunner

Installation

Quick Start

Streaming

Timeouts and Limits

Process API

Incremental reads and writes

Signals and process groups

Checking process state

Stats

PTY Mode

PTY caveats

Daemon Mode

cgroup Support (Linux)

Parallel Execution

Why NetRunner?

Options Reference

NetRunner.run/2

NetRunner.Process.start/3

Architecture

Performance

Documentation

Requirements

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`NetRunner.run/2`

`NetRunner.Process.start/3`

Packages