NetRunner

Safe OS process execution for Elixir. Zero zombie processes, NIF-based backpressure, PTY support, and cgroup isolation.

NetRunner combines NIF-based async I/O (enif_select) with a persistent shepherd binary that watches over child processes. Three layers of cleanup guarantee zero zombies under every failure mode — BEAM crash, GenServer crash, or resource leak.

Installation

def deps do
  [
    {:net_runner, "~> 1.0"}
  ]
end

Requires a C compiler (gcc or clang) and make.

Quick Start

# Run a command, collect output
{output, 0} = NetRunner.run(~w(echo hello))
# => {"hello\n", 0}

# Pipe data through stdin
{output, 0} = NetRunner.run(~w(cat), input: "from elixir")
# => {"from elixir", 0}

# Nonzero exit status
{"", 1} = NetRunner.run(~w(false))

# Multi-word commands with arguments
{output, 0} = NetRunner.run(["sh", "-c", "echo $HOME"])

Streaming

Lazy, demand-driven streams with backpressure — data stays in the OS pipe buffer until you consume it, so you can stream gigabytes without OOM:

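# Stream stdout chunks with backpressure.
# Elements are raw chunks whose boundaries need not fall on newlines,
# so this filter tests the start of each chunk, not of each line.
NetRunner.stream!(~w(cat /usr/share/dict/words))
|> Stream.filter(&String.starts_with?(&1, "elixir"))
|> Enum.to_list()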

# Pipe input through a command
NetRunner.stream!(~w(tr a-z A-Z), input: "hello world")
|> Enum.join()
# => "HELLO WORLD"

# Sort lines
NetRunner.stream!(~w(sort), input: "cherry\napple\nbanana\n")
|> Enum.join()
# => "apple\nbanana\ncherry\n"

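# Process large files without loading into memory
# (assumes :input accepts an enumerable such as File.stream!/1;
# the command stays the first argument, as in the examples above)
NetRunner.stream!(~w(grep ERROR), input: File.stream!("huge.csv"))
|> Stream.each(&IO.write/1)
|> Stream.run()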

# Non-raising variant
{:ok, stream} = NetRunner.stream(~w(sort), input: "c\nb\na\n")
Enum.to_list(stream)
# => ["a\nb\nc\n"]
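Because stream elements are chunks rather than lines, per-line processing needs a re-chunking step. The following is a minimal sketch using only the Elixir standard library; ChunkLines is a hypothetical helper name and is not part of NetRunner.

```elixir
defmodule ChunkLines do
  # Hypothetical helper (not part of NetRunner): re-chunk a stream of
  # arbitrary binary chunks into complete lines without trailing "\n".
  # The accumulator carries the trailing partial line between chunks.
  def lines(chunks) do
    chunks
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, rest ->
        # Flush a final line that had no terminating newline.
        {[rest], ""}

      chunk, acc ->
        parts = String.split(acc <> chunk, "\n")
        {complete, [rest]} = Enum.split(parts, -1)
        {complete, rest}
    end)
  end
end

# Chunk boundaries deliberately cut through the middle of lines.
chunks = ["elix", "ir\nelixirs\nerl", "ang\n"]

matches =
  chunks
  |> ChunkLines.lines()
  |> Stream.filter(&String.starts_with?(&1, "elixir"))
  |> Enum.to_list()

IO.inspect(matches)
```

Placed between NetRunner.stream!/2 and Stream.filter/2, a step like this keeps the pipeline lazy while making the prefix test apply to whole lines.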

Timeouts and Limits

# Kill process after 500ms
{:error, :timeout} = NetRunner.run(~w(sleep 100), timeout: 500)

# Cap output size — kills process if exceeded
{:error, {:max_output_exceeded, _partial}} =
  NetRunner.run(["sh", "-c", "yes"], max_output_size: 1000)

# Custom kill escalation: SIGTERM → wait 2s → SIGKILL
NetRunner.run(~w(my_server), kill_timeout: 2000, timeout: 10_000)

Process API

For fine-grained control over the OS process lifecycle:

alias NetRunner.Process, as: Proc

# Start a process
{:ok, pid} = Proc.start("cat", [])

# Write to stdin
:ok = Proc.write(pid, "hello world")
:ok = Proc.close_stdin(pid)

# Read from stdout (blocks until data available)
{:ok, "hello world"} = Proc.read(pid)
:eof = Proc.read(pid)

# Wait for exit
{:ok, 0} = Proc.await_exit(pid)

Incremental reads and writes

{:ok, pid} = Proc.start("cat", [])

# Write in chunks — useful for feeding large data
:ok = Proc.write(pid, "chunk 1 ")
:ok = Proc.write(pid, "chunk 2 ")
:ok = Proc.write(pid, "chunk 3")
:ok = Proc.close_stdin(pid)

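# Reads return whatever chunks the OS delivers, so a single read
# may return only part of the data
{:ok, data} = Proc.read(pid)
# data => "chunk 1 chunk 2 chunk 3" if delivered as one chunk;
# otherwise keep calling Proc.read/1 until it returns :eof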
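Since one read may return only part of the output, callers usually loop until :eof. Here is a minimal sketch of such a loop. Drain is a hypothetical module, and the reader is a stand-in built on Agent so the example runs without NetRunner; with the library you would pass fn -> Proc.read(pid) end instead. It assumes the reader returns only {:ok, binary} or :eof, as in the examples above.

```elixir
defmodule Drain do
  # Hypothetical helper: call read_fun repeatedly until it returns :eof
  # and return everything read as a single binary.
  # read_fun is any zero-arity function returning {:ok, binary} or :eof.
  def read_all(read_fun, acc \\ []) do
    case read_fun.() do
      {:ok, data} -> read_all(read_fun, [acc, data])
      :eof -> IO.iodata_to_binary(acc)
    end
  end
end

# Stand-in reader for demonstration: hands out queued chunks, then :eof.
{:ok, agent} = Agent.start_link(fn -> ["chunk 1 ", "chunk 2 ", "chunk 3"] end)

fake_read = fn ->
  Agent.get_and_update(agent, fn
    [] -> {:eof, []}
    [head | tail] -> {{:ok, head}, tail}
  end)
end

result = Drain.read_all(fake_read)
IO.inspect(result)
```

Accumulating into iodata and converting once at the end avoids repeated binary concatenation when the output arrives in many small chunks.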

Signals and process groups

{:ok, pid} = Proc.start("sleep", ["100"])
:ok = Proc.kill(pid, :sigterm)
{:ok, 143} = Proc.await_exit(pid)  # 128 + SIGTERM(15)

# Signals kill the entire process group (catches grandchildren)
{:ok, pid} = Proc.start("sh", ["-c", "sleep 100 & sleep 100 & wait"])
:ok = Proc.kill(pid, :sigkill)
# All three processes (sh + both sleeps) are killed

Supported signals: :sigterm, :sigkill, :sigint, :sighup, :sigusr1, :sigusr2, :sigstop, :sigcont, :sigquit, :sigpipe.
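The 143 in the example above follows the shell convention of reporting 128 plus the signal number. A small sketch of decoding such a status is shown below. ExitStatus is a hypothetical helper, and it assumes the same 128 + N convention holds for every signal. Only signals whose numbers are identical on Linux and macOS are listed.

```elixir
defmodule ExitStatus do
  # Hypothetical helper, not part of NetRunner.
  # Signal numbers that are the same on Linux and macOS.
  @signals %{
    1 => :sighup,
    2 => :sigint,
    3 => :sigquit,
    9 => :sigkill,
    13 => :sigpipe,
    15 => :sigterm
  }

  # Statuses above 128 are read as "terminated by signal (status - 128)".
  def describe(status) when is_integer(status) and status > 128 do
    number = status - 128
    {:signaled, Map.get(@signals, number, number)}
  end

  # Everything else is an ordinary exit code.
  def describe(status) when is_integer(status) and status >= 0 do
    {:exited, status}
  end
end

IO.inspect(ExitStatus.describe(143))
IO.inspect(ExitStatus.describe(0))
```

A program can also exit on its own with a code above 128, so this decoding is a heuristic rather than proof that a signal was delivered.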

Checking process state

{:ok, pid} = Proc.start("sleep", ["10"])

Proc.alive?(pid)   # => true
Proc.os_pid(pid)   # => 12345 (the actual OS PID)

Proc.kill(pid, :sigkill)
Proc.await_exit(pid)

Proc.alive?(pid)   # => false

Stats

Every process tracks I/O statistics automatically:

{:ok, pid} = Proc.start("cat", [])
Proc.write(pid, "hello")
Proc.close_stdin(pid)
Proc.read(pid)
Proc.await_exit(pid)

stats = Proc.stats(pid)
stats.bytes_in     # => 5       (bytes written to stdin)
stats.bytes_out    # => 5       (bytes read from stdout)
stats.read_count   # => 1       (number of read calls)
stats.write_count  # => 1       (number of write calls)
stats.duration_ms  # => 3       (wall-clock time)
stats.exit_status  # => 0       (exit code)

PTY Mode

Run commands with a pseudo-terminal for programs that require a TTY. PTY mode is designed for interactive and long-running programs — shells, REPLs, curses apps.

# Programs see a real terminal
{:ok, pid} = Proc.start("python3", ["-c", "import sys; print(sys.stdout.isatty())"], pty: true)
{:ok, data} = Proc.read(pid)
# data =~ "True"

# Interactive REPL
{:ok, pid} = Proc.start("python3", ["-i"], pty: true)
Proc.write(pid, "print(1 + 2)\n")
{:ok, data} = Proc.read(pid)

# Resize the terminal window
Proc.set_window_size(pid, 40, 120)

# Clean up when done (PTY doesn't support independent stdin close)
Proc.kill(pid, :sigkill)
Proc.await_exit(pid)

PTY caveats

PTY mode differs from pipe mode in important ways:

  • No independent stdin close — the PTY is a single bidirectional FD. Use kill/2 to terminate.
  • Echo — the terminal echoes input back by default, so reads include what you wrote.
  • Fast-exiting commands — if a command exits before you call read/1 (e.g., in iex), the PTY buffer may be lost. PTY mode is meant for long-running programs. For simple commands, use pipe mode (the default).
  • Line buffering — the terminal line discipline buffers input until \n by default.

# DON'T: Use PTY for simple commands in iex (data may be lost)
{:ok, pid} = Proc.start("echo", ["hi"], pty: true)
# ... time passes while you type ...
Proc.read(pid)  # => :eof (too late, PTY torn down)

# DO: Use pipe mode (default) for simple commands
{:ok, pid} = Proc.start("echo", ["hi"])
{:ok, "hi\n"} = Proc.read(pid)

# DO: Use PTY for interactive programs that need a terminal
{:ok, pid} = Proc.start("bash", [], pty: true)
Proc.write(pid, "echo hello\n")
{:ok, data} = Proc.read(pid)  # works — bash stays alive

Daemon Mode

Run long-lived processes under a supervision tree. Automatically drains stdout/stderr to prevent pipe blocking:

# In your supervisor
children = [
  {NetRunner.Daemon,
   cmd: "redis-server",
   args: ["--port", "6380"],
   on_output: :log,
   name: MyApp.Redis}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Interact with the daemon
NetRunner.Daemon.os_pid(MyApp.Redis)   # => 12345
NetRunner.Daemon.alive?(MyApp.Redis)   # => true
NetRunner.Daemon.write(MyApp.Redis, "PING\r\n")

Output handling options:

  • :discard (default) — silently consume output to prevent pipe blocking
  • :log — log each chunk via Logger.info
  • fun/1 — custom callback for each chunk

# Custom output handler
{:ok, daemon} = NetRunner.Daemon.start_link(
  cmd: "tail",
  args: ["-f", "/var/log/system.log"],
  on_output: fn chunk -> MyApp.LogIngester.ingest(chunk) end
)

Graceful shutdown: in terminate/2, the daemon sends SIGTERM, waits 5 seconds, then sends SIGKILL if the process is still running.
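One way to think about the three on_output forms is as a single one-argument function chosen at startup. The sketch below illustrates that idea only. OutputHandler is a hypothetical module and is not how NetRunner.Daemon is necessarily implemented.

```elixir
defmodule OutputHandler do
  require Logger

  # Hypothetical sketch: normalize an on_output option into a fun/1
  # that is then called once per output chunk.
  def build(:discard), do: fn _chunk -> :ok end
  def build(:log), do: fn chunk -> Logger.info(chunk) end
  def build(fun) when is_function(fun, 1), do: fun
end

# A custom handler that forwards each chunk to the calling process.
parent = self()
handler = OutputHandler.build(fn chunk -> send(parent, {:chunk, chunk}) end)
handler.("PONG\r\n")

received =
  receive do
    {:chunk, chunk} -> chunk
  after
    1_000 -> :nothing
  end

IO.inspect(received)
```

Resolving the option once keeps the drain loop free of per-chunk branching, and an unsupported value fails immediately with a FunctionClauseError.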

cgroup Support (Linux)

Isolate child processes in a cgroup v2 hierarchy for resource control:

{:ok, pid} = Proc.start("my_worker", [],
  cgroup_path: "net_runner/job_123"
)

The shepherd creates the cgroup directory, moves the child into it, and cleans up on exit (kills all processes via cgroup.kill, then removes the directory). No-op on macOS.

Parallel Execution

Every NetRunner process is fully independent — no shared state, no singleton bottleneck:

# Process files in parallel
files
|> Task.async_stream(fn file ->
  {out, 0} = NetRunner.run(["ffprobe", "-hide_banner", file])
  {file, out}
end, max_concurrency: System.schedulers_online())
|> Enum.to_list()

# Fan-out pattern
urls
|> Task.async_stream(fn url ->
  {body, 0} = NetRunner.run(["curl", "-s", url], timeout: 30_000)
  body
end, max_concurrency: 20)
|> Enum.to_list()
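Task.async_stream/3 wraps each result in an {:ok, value} tuple, so the lists produced above contain tuples rather than bare values. The following sketch shows one way to unwrap them. It uses a pure function in place of an OS command so that it runs without NetRunner, and the 5-second timeout is an arbitrary illustrative value.

```elixir
results =
  ["a", "bb", "ccc"]
  |> Task.async_stream(
    fn item -> {item, String.length(item)} end,
    max_concurrency: 2,
    timeout: 5_000,
    # Turn a task timeout into an {:exit, :timeout} element
    # instead of exiting the caller.
    on_timeout: :kill_task
  )
  |> Enum.map(fn
    {:ok, value} -> value
    {:exit, reason} -> {:error, reason}
  end)

IO.inspect(results)
```

Results come back in input order by default. A failed pattern match inside the task function, such as {out, 0} = ... on a nonzero exit status, raises in the task and propagates to the caller through the link, so handle expected failures inside the function.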

Why NetRunner?

System.cmd uses Erlang ports, which tie the stdin and stdout lifecycles together, provide no backpressure (output can flood the mailbox), and leave zombies when a program ignores stdin EOF. This was filed as ERL-128 and marked Won't Fix.

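| Feature                      | System.cmd | MuonTrap | Exile | NetRunner |
|------------------------------|------------|----------|-------|-----------|
| No zombies (BEAM SIGKILL)    | -          | Yes      | -     | Yes       |
| NIF async I/O + backpressure | -          | -        | Yes   | Yes       |
| Close stdin independently    | -          | -        | Yes   | Yes       |
| Process group kills          | -          | Yes      | -     | Yes       |
| PTY / terminal emulation     | -          | -        | -     | Yes       |
| cgroup isolation (Linux)     | -          | Yes      | -     | Yes       |
| Per-process stats            | -          | -        | -     | Yes       |
| Daemon mode (supervision)    | -          | Yes      | -     | Yes       |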

Options Reference

NetRunner.run/2

| Option           | Type           | Default  | Description                              |
|------------------|----------------|----------|------------------------------------------|
| :input           | binary or list | nil      | Data to write to stdin                   |
| :timeout         | integer        | nil      | Wall-clock timeout in ms                 |
| :max_output_size | integer        | nil      | Max bytes to collect                     |
| :stderr          | atom           | :consume | :consume, :redirect, or :disabled        |
| :pty             | boolean        | false    | Use pseudo-terminal                      |
| :kill_timeout    | integer        | 5000     | SIGTERM→SIGKILL escalation timeout in ms |
| :cgroup_path     | string         | nil      | cgroup v2 path (Linux only)              |

NetRunner.Process.start/3

Accepts all options above except :input, :timeout, and :max_output_size.

Architecture

BEAM Process (NetRunner.Process GenServer)
    |
    |-- Port.open("priv/shepherd", [:nouse_stdio, :exit_status])
    |     |
    |     v
    |   Shepherd Binary (stays alive for child's lifetime)
    |     |-- fork() → child process (execvp)
    |     |-- Passes pipe FDs to BEAM via UDS + SCM_RIGHTS
    |     |-- poll() loop: watches UDS + signal pipe (SIGCHLD)
    |     |-- BEAM dies (POLLHUP) → SIGTERM → SIGKILL child
    |     |-- Child dies (SIGCHLD) → notify BEAM, exit
    |
    |-- NIF (enif_select on raw FDs, dirty IO schedulers)
    |     |-- Demand-driven backpressure via OS pipe buffers
    |
    v
NetRunner.Watcher (belt-and-suspenders with shepherd)
    |-- Monitors GenServer, kills OS process if GenServer crashes

Three layers of zombie prevention:

  1. Shepherd — detects BEAM death via POLLHUP on UDS, kills child process group
  2. Watcher — detects GenServer crash via Process.monitor, kills child via NIF
  3. NIF resource destructor — closes FDs on GC, child sees broken pipe
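The second layer relies on a general BEAM pattern: a separate process monitors the owner and runs cleanup when it receives the :DOWN message. Below is a minimal sketch of that pattern. WatcherSketch and its cleanup callback are hypothetical; NetRunner.Watcher's actual code may differ, and the real cleanup kills the OS process rather than sending a message.

```elixir
defmodule WatcherSketch do
  # Hypothetical sketch of the monitor-and-cleanup pattern.
  # Starts an unlinked process that monitors `owner` and calls
  # `cleanup` with the exit reason once the owner terminates.
  # Returns only after the monitor is in place.
  def watch(owner, cleanup) when is_pid(owner) and is_function(cleanup, 1) do
    caller = self()

    watcher =
      spawn(fn ->
        ref = Process.monitor(owner)
        send(caller, {:watching, self()})

        receive do
          {:DOWN, ^ref, :process, ^owner, reason} -> cleanup.(reason)
        end
      end)

    receive do
      {:watching, ^watcher} -> watcher
    end
  end
end

parent = self()

# Stand-in for the GenServer that owns an OS process.
owner =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

WatcherSketch.watch(owner, fn reason -> send(parent, {:cleaned_up, reason}) end)

# Simulate a crash of the owner.
Process.exit(owner, :kill)

reason =
  receive do
    {:cleaned_up, reason} -> reason
  after
    1_000 -> :no_cleanup
  end

IO.inspect(reason)
```

The watcher is deliberately unlinked from the owner, so a crash of the owner cannot take the watcher down before it has run the cleanup.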

Performance

Spawn overhead is ~20-25ms per process (fork + execvp + UDS handshake + FD passing). This is a one-time cost — actual I/O is sub-millisecond. For comparison, System.cmd is ~10-15ms (simpler setup, same fork cost).

The tradeoff: ~10ms extra spawn time buys you backpressure, zero zombies, and process group kills. For long-running processes or large data streams, the spawn cost is negligible.
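Spawn overhead varies by machine, so it is worth measuring locally. The sketch below times the System.cmd baseline with :timer.tc/1. SpawnTiming is a hypothetical helper, and the run count of 20 is arbitrary. It assumes a true executable is on the PATH, as on typical Linux and macOS systems.

```elixir
defmodule SpawnTiming do
  # Hypothetical helper: average wall-clock milliseconds per call
  # over `runs` sequential invocations of `fun`.
  def average_ms(fun, runs \\ 20) when is_function(fun, 0) and runs > 0 do
    {micros, _result} =
      :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.() end) end)

    micros / runs / 1000
  end
end

baseline = SpawnTiming.average_ms(fn -> System.cmd("true", []) end)
IO.puts("System.cmd spawn: #{Float.round(baseline, 2)} ms per process")
```

Passing fn -> NetRunner.run(~w(true)) end to the same helper gives a directly comparable figure on the same machine.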

Requirements

  • Elixir ~> 1.17
  • Erlang/OTP 27+
  • C compiler (gcc or clang)
  • make
  • Linux or macOS

License

Apache-2.0 — see LICENSE.
