A lightweight DAG (Directed Acyclic Graph) workflow runner for defining and executing multi-step workflows with dependency resolution. Define workflows in YAML, and dagrun handles parallel execution, dependency ordering, retries, and error propagation.
Inspired by dagu. Reimplemented from scratch with zero external dependencies using only Go's standard library.
dagu is a powerful DAG workflow engine with 3.2K+ stars. However, it has grown to include 60+ dependencies (Docker SDK, Tailscale, LLM agents, Slack/Telegram bots, etc.), a GPL-3.0 license, and scope that extends far beyond its core value proposition.
dagrun strips away the complexity and delivers the essential DAG execution engine:
| Feature | dagu | dagrun |
|---|---|---|
| Dependencies | 60+ | 0 (stdlib only) |
| License | GPL-3.0 | GPL-3.0 |
| Binary size | Large (Docker, Tailscale, etc.) | Minimal |
| Executor types | 19+ | 3 (command, HTTP, script) |
| Web UI | Yes | No (CLI-focused) |
| LLM/AI agent | Yes | No |
| Docker executor | Yes | No |
| DAG engine | Yes | Yes |
| Parallel execution | Yes | Yes |
| Retry with backoff | Yes | Yes |
| Output passing | Yes | Yes |
| Preconditions | Yes | Yes |
| Lifecycle handlers | Yes | Yes |
```sh
go install github.com/JSLEEKR/dagrun@latest
```

Or build from source:

```sh
git clone https://github.com/JSLEEKR/dagrun.git
cd dagrun
go build -o dagrun .
```

Create `hello.yaml`:
```yaml
name: hello-world
description: A simple workflow demo
steps:
  - name: greet
    command: echo "Hello from dagrun!"
  - name: timestamp
    command: date
  - name: done
    command: echo "All steps completed"
    depends:
      - greet
      - timestamp
```

Run it:

```sh
dagrun run hello.yaml
```

Output:

```
=== DAG "hello-world": succeeded (total=3 succeeded=3 failed=0 skipped=0 aborted=0 duration=15ms) ===
[OK] greet (succeeded, 5ms)
[OK] timestamp (succeeded, 6ms)
[OK] done (succeeded, 4ms)
```
Define workflows declaratively with steps, dependencies, environment variables, and more:
```yaml
name: data-pipeline
description: ETL pipeline with parallel extraction
max_active_steps: 4
timeout_sec: 300
env:
  DATA_DIR: /tmp/data
  LOG_LEVEL: info
params:
  - DATE=2024-01-01
steps:
  - name: extract-users
    command: ./extract.sh users $DATE
    output: USER_COUNT
  - name: extract-orders
    command: ./extract.sh orders $DATE
    output: ORDER_COUNT
  - name: transform
    command: ./transform.sh $USER_COUNT $ORDER_COUNT
    depends:
      - extract-users
      - extract-orders
  - name: load
    command: ./load.sh
    depends:
      - transform
    retry_policy:
      limit: 3
      interval_sec: 5
      backoff: 2.0
```

dagrun uses Kahn's algorithm for topological sorting with cycle detection:
- Steps with no dependencies run in parallel
- Steps wait for all upstream dependencies to complete
- Cycles are detected at build time with clear error messages
- Failed steps cascade: downstream steps are automatically skipped
```yaml
steps:
  - name: build
    command: make build
  - name: test-unit
    command: make test-unit
    depends: [build]
  - name: test-integration
    command: make test-integration
    depends: [build]
  - name: deploy
    command: make deploy
    depends: [test-unit, test-integration]
```

Execution order: build -> test-unit + test-integration (parallel) -> deploy
Runs shell commands:
```yaml
- name: list-files
  command: ls -la /tmp
  shell: bash              # optional, defaults to sh
  working_dir: /home/user
```

Makes HTTP requests:
```yaml
- name: health-check
  type: http
  http:
    method: GET
    url: http://localhost:8080/health
    headers:
      Authorization: "Bearer ${API_TOKEN}"
    timeout: 10
```

Runs multi-line scripts:
```yaml
- name: complex-task
  type: script
  script: |
    #!/bin/bash
    set -e
    echo "Starting complex operation..."
    for i in $(seq 1 5); do
      echo "Step $i"
      sleep 1
    done
    echo "Done!"
```

Capture step output and pass it to downstream steps:
```yaml
steps:
  - name: get-version
    command: cat VERSION
    output: APP_VERSION
  - name: build
    command: docker build -t myapp:${APP_VERSION} .
    depends: [get-version]
  - name: tag
    command: echo "Tagged version ${APP_VERSION}"
    depends: [build]
```

Configure retry policies for unreliable steps:
```yaml
- name: deploy
  command: ./deploy.sh
  retry_policy:
    limit: 5          # max retries
    interval_sec: 2   # initial wait between retries
    backoff: 2.0      # multiplier (2s, 4s, 8s, 16s, 32s)
```

Guard step execution with conditions:
```yaml
- name: deploy-prod
  command: ./deploy.sh production
  preconditions:
    - condition: echo $BRANCH
      expected: main
    - condition: test -f build/app.tar.gz
```

If any precondition fails, the step is skipped (not failed).
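The mechanics are easy to picture: run the condition command, and either compare its trimmed stdout against the expected value or fall back to the exit code alone. A minimal sketch of such a check (the `checkPrecondition` helper is hypothetical, not dagrun's actual internal API):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// checkPrecondition runs the condition via the shell. When an expected
// value is given, the trimmed stdout must match it; otherwise a zero
// exit code alone satisfies the condition.
// (Illustrative helper; dagrun's internals may differ.)
func checkPrecondition(condition, expected string) bool {
	out, err := exec.Command("sh", "-c", condition).Output()
	if err != nil {
		return false // non-zero exit code fails the condition
	}
	if expected == "" {
		return true
	}
	return strings.TrimSpace(string(out)) == expected
}

func main() {
	fmt.Println(checkPrecondition("echo main", "main"))        // true
	fmt.Println(checkPrecondition("echo develop", "main"))     // false
	fmt.Println(checkPrecondition("test -f /nonexistent", "")) // false
}
```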
Allow downstream steps to run even if this step fails:
```yaml
- name: optional-step
  command: ./optional-check.sh
  continue_on:
    failure: true
    skipped: true
- name: next-step
  command: echo "runs even if optional-step fails"
  depends: [optional-step]
```

Limit parallel execution:
```yaml
name: resource-heavy
max_active_steps: 2   # at most 2 steps run simultaneously
steps:
  - name: job-a
    command: heavy-process-a
  - name: job-b
    command: heavy-process-b
  - name: job-c
    command: heavy-process-c
  - name: job-d
    command: heavy-process-d
```

Run actions on workflow success, failure, or exit:
```yaml
name: monitored-workflow
handler_on:
  success:
    name: notify-success
    command: curl -X POST https://hooks.slack.com/... -d '{"text":"Workflow succeeded"}'
  failure:
    name: notify-failure
    command: curl -X POST https://hooks.slack.com/... -d '{"text":"Workflow FAILED"}'
  exit:
    name: cleanup
    command: rm -rf /tmp/work
steps:
  - name: process
    command: ./process.sh
```

Set timeouts at both DAG and step level:
```yaml
name: time-limited
timeout_sec: 600   # overall DAG timeout: 10 minutes
steps:
  - name: quick-check
    command: ./check.sh
    timeout_sec: 30   # step-level timeout
  - name: long-process
    command: ./process.sh
    timeout_sec: 300
    depends: [quick-check]
```

Execute a workflow:
```
dagrun run [options] <workflow.yaml>

Options:
  -v           Verbose output (shows execution details)
  -json        Output results as JSON
  -timeout N   Override DAG timeout (seconds)
  -params      Comma-separated key=value parameters
```

Examples:

```sh
# Basic run
dagrun run pipeline.yaml

# Verbose with parameters
dagrun run -v -params "ENV=staging,VERSION=2.1" deploy.yaml

# JSON output for scripting
dagrun run -json pipeline.yaml | jq '.nodes[] | select(.status == "failed")'

# Override timeout
dagrun run -timeout 120 long-pipeline.yaml
```

Validate a workflow file without executing:
```sh
dagrun validate <workflow.yaml>
```

Checks for:
- Valid YAML syntax
- Step name uniqueness
- Missing dependency references
- Circular dependencies
Show the execution plan (dry-run):
```sh
dagrun status <workflow.yaml>
```

Output:

```
Workflow: data-pipeline
Description: ETL pipeline

Execution Plan:
  1. extract-users [command]
  2. extract-orders [command]
  3. transform [command] (depends: extract-users, extract-orders)
  4. load [command] (depends: transform)
```
Generate DOT graph for visualization:
```sh
dagrun dot pipeline.yaml > pipeline.dot
dot -Tpng pipeline.dot -o pipeline.png   # requires graphviz
```

Show version information:

```sh
dagrun version
```

```
YAML File
  -> Parser (internal/parser)      — Minimal YAML parser, zero deps
  -> Model (internal/model)        — Domain types: DAG, Step, NodeResult
  -> DAG Builder (internal/dag)    — Kahn's algorithm, cycle detection
  -> Runner (internal/runner)      — Channel-driven parallel executor
  -> Executor (internal/executor)  — Command, HTTP, Script executors
  -> CLI (internal/cli)            — User interface
```
The runner uses Go channels for the event loop, inspired by dagu's architecture:
- `readyCh` — nodes whose dependencies are all satisfied
- `doneCh` — completed nodes signaling downstream dependents

The event loop:

- Seed root nodes (zero dependencies) into `readyCh`
- For each ready node: spawn a goroutine to execute
- On completion: check downstream dependents, send ready ones to `readyCh`
- Detect deadlocks: if no active nodes remain and the DAG is incomplete
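A condensed sketch of that loop, with actual step execution stubbed out (`runDAG` and `node` are illustrative names, not dagrun's real API):

```go
package main

import "fmt"

type node struct {
	name    string
	depends []string
}

// runDAG drives a simplified readyCh/doneCh event loop: seed the
// zero-dependency roots, execute each ready node in a goroutine, and
// release dependents as their unmet-dependency counts reach zero.
func runDAG(nodes []node) []string {
	remaining := map[string]int{}       // node -> unmet dependency count
	dependents := map[string][]string{} // upstream -> downstream names
	for _, n := range nodes {
		remaining[n.name] = len(n.depends)
		for _, d := range n.depends {
			dependents[d] = append(dependents[d], n.name)
		}
	}
	readyCh := make(chan string, len(nodes))
	doneCh := make(chan string, len(nodes))
	for _, n := range nodes {
		if remaining[n.name] == 0 {
			readyCh <- n.name // seed root nodes
		}
	}
	var order []string
	for finished := 0; finished < len(nodes); {
		select {
		case name := <-readyCh:
			go func(name string) {
				// a real runner would execute the step here
				doneCh <- name
			}(name)
		case name := <-doneCh:
			finished++
			order = append(order, name)
			for _, d := range dependents[name] {
				if remaining[d]--; remaining[d] == 0 {
					readyCh <- d // all upstream deps satisfied
				}
			}
		}
	}
	return order
}

func main() {
	order := runDAG([]node{
		{name: "build"},
		{name: "test", depends: []string{"build"}},
		{name: "deploy", depends: []string{"test"}},
	})
	fmt.Println(order) // [build test deploy]
}
```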
Uses Kahn's algorithm (BFS-based topological sort):
- Build in-degree map for all nodes
- Start with zero in-degree nodes
- Process each node, decrement dependents' in-degree
- If all nodes processed: valid DAG with topological order
- If not all processed: cycle exists (remaining nodes form the cycle)
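As a sketch (illustrative code, not dagrun's internal implementation):

```go
package main

import "fmt"

// topoSort orders step names so every step follows its dependencies,
// using Kahn's BFS: repeatedly take a zero in-degree node and decrement
// its dependents' in-degrees. Unprocessed leftovers indicate a cycle.
func topoSort(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for name, ds := range deps {
		if _, ok := indegree[name]; !ok {
			indegree[name] = 0
		}
		for _, d := range ds {
			indegree[name]++
			dependents[d] = append(dependents[d], name)
		}
	}
	var queue, order []string
	for name, deg := range indegree {
		if deg == 0 {
			queue = append(queue, name) // start with zero in-degree nodes
		}
	}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		order = append(order, n)
		for _, d := range dependents[n] {
			if indegree[d]--; indegree[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected among %d remaining steps", len(indegree)-len(order))
	}
	return order, nil
}

func main() {
	order, err := topoSort(map[string][]string{
		"build": nil, "test": {"build"}, "deploy": {"test"},
	})
	fmt.Println(order, err) // [build test deploy] <nil>
}
```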
When a step fails:
- All downstream dependents are skipped (cascade)
- Unless the failed step has `continue_on.failure: true`
- The overall DAG status reflects the worst node status
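One plausible way to reduce node statuses to an overall DAG status; the severity ranking here is assumed for illustration, not taken from dagrun's source:

```go
package main

import "fmt"

// severity ranks node statuses from best to worst (ordering assumed).
var severity = map[string]int{
	"succeeded": 0, "skipped": 1, "aborted": 2, "failed": 3,
}

// dagStatus returns the worst status among all nodes.
func dagStatus(nodeStatuses []string) string {
	worst := "succeeded"
	for _, s := range nodeStatuses {
		if severity[s] > severity[worst] {
			worst = s
		}
	}
	return worst
}

func main() {
	fmt.Println(dagStatus([]string{"succeeded", "skipped", "failed"})) // failed
	fmt.Println(dagStatus([]string{"succeeded", "succeeded"}))         // succeeded
}
```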
| Field | Type | Description |
|---|---|---|
| `name` | string | Workflow name |
| `description` | string | Workflow description |
| `steps` | array | List of step definitions (required) |
| `env` | map | Environment variables for all steps |
| `params` | array | Parameters (key=value format) |
| `shell` | string | Default shell for command steps |
| `working_dir` | string | Default working directory |
| `max_active_steps` | int | Max concurrent step execution |
| `timeout_sec` | int | Overall DAG timeout |
| `log_dir` | string | Log output directory |
| `handler_on` | object | Lifecycle handlers (success/failure/exit) |
| Field | Type | Description |
|---|---|---|
| `name` | string | Step name (unique within DAG) |
| `description` | string | Step description |
| `command` | string | Shell command to execute |
| `script` | string | Multi-line script content |
| `type` | string | Executor type: command, http, script |
| `shell` | string | Shell to use (default: sh) |
| `working_dir` | string | Working directory |
| `depends` | array | List of dependency step names |
| `output` | string | Variable name to capture stdout |
| `env` | map | Step-specific environment variables |
| `timeout_sec` | int | Step timeout in seconds |
| `continue_on` | object | Continue policy (failure/skipped) |
| `retry_policy` | object | Retry configuration |
| `preconditions` | array | Conditions to check before execution |
| `http` | object | HTTP executor configuration |
| Field | Type | Description |
|---|---|---|
| `method` | string | HTTP method (default: GET) |
| `url` | string | Request URL |
| `headers` | map | Request headers |
| `body` | string | Request body |
| `timeout` | int | Request timeout in seconds |
| Field | Type | Description |
|---|---|---|
| `limit` | int | Maximum retry attempts |
| `interval_sec` | int | Initial wait between retries |
| `backoff` | float | Backoff multiplier |
When using the `-json` flag, results are output as structured JSON:
```json
{
  "name": "my-workflow",
  "status": "succeeded",
  "duration_ms": 1523,
  "nodes": [
    {
      "name": "step-1",
      "status": "succeeded",
      "output": "hello world",
      "duration_ms": 45,
      "exit_code": 0
    },
    {
      "name": "step-2",
      "status": "succeeded",
      "duration_ms": 1478,
      "exit_code": 0,
      "retries": 1
    }
  ]
}
```

Kept from dagu:

- YAML workflow definition format
- Kahn's algorithm for topological sort
- Channel-driven event loop architecture
- Node lifecycle (preconditions -> execute -> handlers)
- Retry with backoff
- Output capture and variable passing
- Precondition evaluation
Left out:

- Web UI / dashboard
- Docker executor
- SSH/SFTP executor
- LLM/AI agent integration
- Distributed worker mode
- Tailscale tunneling
- Slack/Telegram notifications
- Git sync
- 19+ executor types (keeping only 3)
- 60+ external dependencies (keeping 0)
- Zero dependencies: Go stdlib only
- Focused scope: DAG execution, nothing else
- Better error tracing: Per-step timing, exit codes, retry counts
- Smaller attack surface: No external deps = no supply chain risk
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing`)
- Run tests (`go test ./...`)
- Commit your changes
- Push to the branch
- Open a Pull Request
```sh
# Build
go build -o dagrun .

# Test
go test ./... -v

# Vet
go vet ./...
```

Note (Windows): Many tests invoke shell commands via `sh -c` and will be skipped automatically on Windows (`t.Skip`). Run the full suite on Linux/macOS for complete coverage.

Note (go.sum): This project has zero external dependencies (stdlib only), so there is no `go.sum` file. This is expected.
GPL-3.0 License - see LICENSE for details.
- dagu - The original DAG workflow engine that inspired this project. dagrun reimplements dagu's core execution engine from scratch with a focus on simplicity and zero dependencies.