mbus

Rust browser + LLM agent for deterministic, single-step web automation.

Overview

mbus runs a tight loop of snapshot -> propose -> validate -> apply. Actions are strictly validated against the current observation before execution, and every step is logged as JSON for traceability.
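The loop above can be sketched roughly as follows. This is a minimal illustration, not the mbus implementation: the `Observation`, `Action`, `validate`, and `run_loop` names are hypothetical, and the only validation rule shown (a click must reference an element id present in the current snapshot) is an assumption based on the troubleshooting notes.

```rust
use std::collections::HashSet;

/// Hypothetical observation: the element ids visible in the latest snapshot.
struct Observation {
    element_ids: HashSet<String>,
}

/// Hypothetical action set, modeled on the scripted-action examples.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Click { id: String },
    Done { summary: String },
}

/// Validate a proposed action against the current observation only.
fn validate(action: &Action, obs: &Observation) -> Result<(), String> {
    match action {
        Action::Click { id } if !obs.element_ids.contains(id) => {
            Err(format!("unknown element id: {id}"))
        }
        _ => Ok(()),
    }
}

/// Repeat snapshot -> propose -> validate -> apply for up to max_steps steps,
/// recording one log entry per step.
fn run_loop(
    max_steps: usize,
    mut snapshot: impl FnMut() -> Observation,
    mut propose: impl FnMut(&Observation) -> Action,
    mut apply: impl FnMut(&Action),
) -> Vec<String> {
    let mut log = Vec::new();
    for step in 0..max_steps {
        let obs = snapshot();
        let action = propose(&obs);
        match validate(&action, &obs) {
            // Rejected actions are never applied; the next step re-snapshots.
            Err(reason) => log.push(format!("step {step}: rejected ({reason})")),
            Ok(()) => {
                if let Action::Done { summary } = &action {
                    log.push(format!("step {step}: done ({summary})"));
                    break;
                }
                apply(&action);
                log.push(format!("step {step}: applied {action:?}"));
            }
        }
    }
    log
}

fn main() {
    let mut proposals = vec![
        Action::Click { id: "el_9".into() }, // not in the snapshot, so rejected
        Action::Click { id: "el_1".into() },
        Action::Done { summary: "clicked".into() },
    ]
    .into_iter();
    let log = run_loop(
        10,
        || Observation { element_ids: HashSet::from(["el_1".to_string()]) },
        |_obs| proposals.next().unwrap(),
        |_action| {},
    );
    for line in &log {
        println!("{line}");
    }
}
```

Passing the snapshot, proposal, and apply stages as closures keeps the sketch free of browser and model dependencies; a real run would put the browser adapter and the model router behind them.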

Key traits:

Chromium CDP browser adapter (chromiumoxide)
Strict action schema + validation
Model router with fast -> mid -> strong escalation
Structured JSON logs plus tracing + metrics

Current Status

Primary success bar: the 12-task local obstacle suite run through mbus challenge.
Secondary regression bar: the 10-task local mbus bench harness.
Checked-in tests validate challenge/bench/package plumbing with a mock OpenAI-compatible server; real-model evidence is generated locally and packaged, not committed.
Current status, open gaps, and proof expectations live in docs/status.md. Live-site policy lives in docs/live-eval-policy.md.

Install

Prerequisites:

Rust toolchain (stable)
A Chromium/Chrome binary discoverable by chromiumoxide

Build:

cargo build

Quickstart

Default (stub LLM; returns done immediately after the snapshot):

cargo run -- run --task "open example.com"

OpenAI mode:

MBUS_LLM_MODE=openai MBUS_LLM_API_KEY=... \
  cargo run -- run --task "Find the shipping address" \
  --llm-model-fast gpt-5-mini \
  --llm-model-mid gpt-5.1 \
  --llm-model-strong gpt-5.2

Scripted mode (feed actions from a file):

cargo run -- run --task "Click the button" \
  --llm-mode scripted \
  --llm-actions-file ./actions.jsonl

For a concise install + quickstart path (prerequisites, install steps, and the first successful run with validated commands), see docs/quickstart.md.

CLI

mbus run flags (most common):

--task or --task-file
--plan or --plan-file
--config
--headless
--initial-url
--browser-executable, --browser-launch-timeout-ms
--browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
--max-steps
--llm-mode (stub, scripted, openai)
--llm-base-url, --llm-api-key
--llm-model-fast, --llm-model-mid, --llm-model-strong
--llm-timeout-ms, --llm-temperature, --llm-max-tokens
--llm-actions-file
--extract-output

mbus bench flags:

--tasks-dir (default: harness/tasks)
--report-path (default: target/bench/report.json)
--config
--headless
--browser-executable, --browser-launch-timeout-ms
--browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
--max-steps-per-task (default: 40)
--required-passes (default: total tasks minus two)
--llm-mode (scripted, openai)
--llm-base-url, --llm-api-key
--llm-model-fast, --llm-model-mid, --llm-model-strong
--llm-timeout-ms, --llm-temperature, --llm-max-tokens

mbus challenge flags:

--tasks-dir (default: harness/challenge)
--report-path (default: target/challenge/report.json)
--config
--headless
--browser-executable, --browser-launch-timeout-ms
--browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
--max-steps-per-task (default: 40)
--required-passes (default: 10)
--llm-base-url, --llm-api-key
--llm-model-fast, --llm-model-mid, --llm-model-strong
--llm-timeout-ms, --llm-temperature, --llm-max-tokens
--llm-input-cost-per-million, --llm-output-cost-per-million

mbus package flags:

--report-path (required; existing challenge report)
--output-dir (default: target/challenge/package/<report-stem>)
--zip-path (default: target/challenge/package/<report-stem>.zip)
--overwrite
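The `<report-stem>` defaults listed above can be illustrated with a short sketch. The `default_package_paths` helper is hypothetical; it only shows how a stem could be derived from `--report-path` with the standard library, and it is not taken from the mbus source.

```rust
use std::path::{Path, PathBuf};

/// Derive the default bundle directory and zip path from the report path.
/// Returns None if the path has no file stem or the stem is not valid UTF-8.
fn default_package_paths(report_path: &Path) -> Option<(PathBuf, PathBuf)> {
    let stem = report_path.file_stem()?.to_str()?;
    let base = Path::new("target/challenge/package");
    Some((base.join(stem), base.join(format!("{stem}.zip"))))
}

fn main() {
    let report = Path::new("target/challenge/report.json");
    let (dir, zip) = default_package_paths(report).expect("report path has a stem");
    println!("bundle dir: {}", dir.display());
    println!("zip path:   {}", zip.display());
}
```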

cdp_bootstrap:

Validates browser startup only, using the same browser config inputs as mbus.
Supports --config, --headless, --initial-url, --cdp-url.
Supports --browser-executable, --browser-launch-timeout-ms.
Supports --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir.

Challenge Proof Workflow

The canonical release-proof path is:

MBUS_LLM_API_KEY=... \
MBUS_LLM_INPUT_COST_PER_MILLION=... \
MBUS_LLM_OUTPUT_COST_PER_MILLION=... \
./scripts/run_challenge_proof.sh

The script:

Validates required environment variables.
Runs mbus challenge against the default 12-task local obstacle suite.
Packages the resulting report with mbus package.
Prints the exact report, bundle, and zip paths to inspect or share.

For a supplemental adversarial run, point challenge at the separate tasks dir:

MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge \
  --tasks-dir harness/challenge_adversarial \
  --required-passes 2

See docs/status.md for the current success bar and docs/live-eval-policy.md for what does and does not count as valid evidence.

Regression Harness

Run the local benchmark harness:

cargo run --bin mbus -- bench --llm-mode scripted

The command:

Starts a local HTTP harness server on 127.0.0.1 with deterministic pages.
Serves static harness pages from harness/pages.
Loads task fixtures from harness/tasks/*.json.
Executes each task with scripted actions in scripted mode.
Executes each task autonomously in openai mode (requires MBUS_LLM_API_KEY or --llm-api-key).
Writes the report to target/bench/report.json.
Enforces a gate (required_passes, default 8 of 10 tasks).
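As a rough illustration of the gate arithmetic, the sketch below computes the default threshold (total tasks minus two) and compares it with the pass count. The function names are hypothetical, and saturating at zero for very small suites is an assumption rather than documented behavior.

```rust
/// Default gate per the bench flags: total tasks minus two.
/// Saturating at zero is an assumption for suites with fewer than two tasks.
fn default_required_passes(total_tasks: usize) -> usize {
    total_tasks.saturating_sub(2)
}

/// The gate holds when the pass count reaches the explicit or default threshold.
fn gate_passes(passed: usize, total_tasks: usize, required_passes: Option<usize>) -> bool {
    let required = required_passes.unwrap_or_else(|| default_required_passes(total_tasks));
    passed >= required
}

fn main() {
    // With the 10-task bench suite, the default gate is 8 passes.
    println!("default gate: {}", default_required_passes(10));
    println!("8 of 10 passes the gate: {}", gate_passes(8, 10, None));
    println!("7 of 10 passes the gate: {}", gate_passes(7, 10, None));
}
```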

Use bench to catch regressions in the agent loop and report plumbing. It is no longer the primary release proof; that role belongs to the challenge suite.

Task fixture shape (example):

{
  "id": "bench-task-01",
  "task": "Navigate to benchmark task 01 and confirm marker text.",
  "start_path": "/bench/start",
  "max_steps": 40,
  "actions": [
    {"type": "navigate", "url": "{{base_url}}/bench/task-01"},
    {"type": "done", "summary": "Reached benchmark task 01"}
  ],
  "expect": {
    "status": "done",
    "final_url_contains": "/bench/task-01",
    "final_visible_text_contains": "BENCH TASK 01"
  }
}
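To make the fixture fields concrete, here is a minimal sketch of how the `{{base_url}}` placeholder and the `expect` block might be evaluated. The `Expect` and `RunResult` types and both helpers are hypothetical, and treating every `expect` field as a plain equality or substring check is an assumption inferred from the field names.

```rust
/// Hypothetical mirror of the fixture's "expect" object.
struct Expect<'a> {
    status: &'a str,
    final_url_contains: &'a str,
    final_visible_text_contains: &'a str,
}

/// Hypothetical summary of how a task run ended.
struct RunResult<'a> {
    status: &'a str,
    final_url: &'a str,
    final_visible_text: &'a str,
}

/// Replace the {{base_url}} placeholder with the harness server address.
fn expand_base_url(template: &str, base_url: &str) -> String {
    template.replace("{{base_url}}", base_url)
}

/// A run passes only if the status matches and both substrings are found.
fn meets_expectations(expect: &Expect<'_>, result: &RunResult<'_>) -> bool {
    result.status == expect.status
        && result.final_url.contains(expect.final_url_contains)
        && result.final_visible_text.contains(expect.final_visible_text_contains)
}

fn main() {
    // The port is illustrative; the harness only documents 127.0.0.1.
    let url = expand_base_url("{{base_url}}/bench/task-01", "http://127.0.0.1:8080");
    let expect = Expect {
        status: "done",
        final_url_contains: "/bench/task-01",
        final_visible_text_contains: "BENCH TASK 01",
    };
    let result = RunResult {
        status: "done",
        final_url: &url,
        final_visible_text: "BENCH TASK 01 marker",
    };
    println!("navigated to {url}");
    println!("expectations met: {}", meets_expectations(&expect, &result));
}
```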

Challenge Suite

Run the primary obstacle suite with the OpenAI-compatible path:

MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge

The command:

Loads autonomous challenge manifests from harness/challenge/*.json.
Starts the local harness server and serves obstacle pages from harness/pages/challenge.
Forces openai mode and persists screenshots to .ralph/runs/... for visual diff follow-up.
Writes an aggregate report to target/challenge/report.json.
Enforces the autonomous gate at 10 passed tasks out of 12 by default.
Uses observable-only success checks (final_url_contains, final_visible_text_contains, screenshots) rather than hidden app knowledge.

Checked-in integration tests exercise this path with a mock OpenAI-compatible server so the CLI/report/package flow stays stable. Real-model challenge quality still needs local proof runs and packaged artifacts.

Package an existing challenge run:

cargo run --bin mbus -- package --report-path target/challenge/report.json

The package command:

Validates that the report parses and matches the challenge report shape.
Verifies every referenced artifact exists and matches the recorded SHA-256 when present.
Copies report.json, README.md, and referenced artifacts into a portable bundle directory.
Writes manifest.json with relative file inventory plus gate, usage, and cost summaries.
Emits a zip archive next to the unpacked bundle for submission or sharing.

Challenge manifest shape (example):

{
  "id": "challenge-01-cookie-banner",
  "task": "Dismiss the cookie banner so the page clearly shows COOKIE BANNER DISMISSED.",
  "start_url": "{{base_url}}/challenge/cookie-banner.html",
  "allowed_domains": ["127.0.0.1"],
  "max_steps": 20,
  "expect": {
    "final_url_contains": "/challenge/cookie-banner.html",
    "final_visible_text_contains": "COOKIE BANNER DISMISSED",
    "screenshot_artifact_required": true
  }
}

Config

Config precedence is: defaults -> config file -> env (MBUS_*) -> CLI flags. Config file lookup order is: --config, MBUS_CONFIG, ./mbus.toml, ~/.mbus.toml.
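The precedence chain can be pictured as layers in which a later layer overrides only the values it actually sets. The sketch below is illustrative: the `Layer` type and its two fields are hypothetical stand-ins, and the default values shown are taken from the sample file rather than confirmed built-in defaults.

```rust
/// One configuration layer; None means "this layer does not set the value".
#[derive(Clone, Debug, Default, PartialEq)]
struct Layer {
    max_steps: Option<u32>,
    headless: Option<bool>,
}

impl Layer {
    /// Values from `higher` win; anything it leaves unset falls through to `self`.
    fn overlay(self, higher: Layer) -> Layer {
        Layer {
            max_steps: higher.max_steps.or(self.max_steps),
            headless: higher.headless.or(self.headless),
        }
    }
}

/// Apply the documented order: defaults -> config file -> env -> CLI flags.
fn resolve(defaults: Layer, file: Layer, env: Layer, cli: Layer) -> Layer {
    defaults.overlay(file).overlay(env).overlay(cli)
}

fn main() {
    let defaults = Layer { max_steps: Some(40), headless: Some(true) };
    let file = Layer { max_steps: Some(25), headless: None };
    let env = Layer::default(); // e.g. MBUS_MAX_STEPS and MBUS_HEADLESS unset
    let cli = Layer { max_steps: None, headless: Some(false) }; // --headless false
    let resolved = resolve(defaults, file, env, cli);
    // max_steps comes from the file layer, headless from the CLI layer.
    println!("{resolved:?}");
}
```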

Sample mbus.toml:

[agent]
max_steps = 40

[agent.memory]
max_observations = 8
max_history = 100

[browser]
headless = true
# headful = true
initial_url = "about:blank"
snapshot_timeout_ms = 5000
action_timeout_ms = 10000
max_elements = 50
max_text_len = 4000

[router]
failures_to_mid = 2
failures_to_strong = 4
no_progress_to_mid = 2
no_progress_to_strong = 4
ladder = ["gpt-5-mini:medium", "gpt-5.1:medium", "gpt-5.2:medium"]

[validator]
allow_insecure = false
max_text_len = 2000
max_wait_ms = 30000
max_scroll = 2000

[llm]
mode = "stub"
base_url = "https://api.openai.com/v1"
api_key = ""
model_fast = "gpt-5-mini"
model_mid = "gpt-5.1"
model_strong = "gpt-5.2"
timeout_ms = 30000
temperature = 1.0
max_tokens = 256
actions_file = "actions.jsonl"

[output]
extract_output = "mbus_extract.json"

To run with a visible browser window, set headful = true in the config or pass --headless false on the CLI.
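The [router] thresholds in the sample suggest an escalation rule along the following lines. This is a sketch under stated assumptions: the `Tier`, `RouterThresholds`, and `pick_tier` names are hypothetical, and both the `>=` comparison and the idea that either counter alone can trigger escalation are guesses from the key names, not documented mbus behavior.

```rust
/// The three model tiers named in the router description.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tier {
    Fast,
    Mid,
    Strong,
}

/// Hypothetical mirror of the [router] threshold keys.
struct RouterThresholds {
    failures_to_mid: u32,
    failures_to_strong: u32,
    no_progress_to_mid: u32,
    no_progress_to_strong: u32,
}

/// Assumed rule: escalate as soon as either counter reaches its threshold.
fn pick_tier(t: &RouterThresholds, failures: u32, no_progress: u32) -> Tier {
    if failures >= t.failures_to_strong || no_progress >= t.no_progress_to_strong {
        Tier::Strong
    } else if failures >= t.failures_to_mid || no_progress >= t.no_progress_to_mid {
        Tier::Mid
    } else {
        Tier::Fast
    }
}

fn main() {
    // Values copied from the sample mbus.toml.
    let t = RouterThresholds {
        failures_to_mid: 2,
        failures_to_strong: 4,
        no_progress_to_mid: 2,
        no_progress_to_strong: 4,
    };
    for (failures, no_progress) in [(0, 0), (2, 0), (1, 4)] {
        let tier = pick_tier(&t, failures, no_progress);
        println!("failures={failures} no_progress={no_progress} -> {tier:?}");
    }
}
```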

Environment variable overrides (full list):

MBUS_CONFIG
MBUS_MAX_STEPS
MBUS_MAX_NO_PROGRESS_STEPS
MBUS_MEMORY_MAX_OBSERVATIONS
MBUS_MEMORY_MAX_HISTORY
MBUS_HEADLESS
MBUS_INITIAL_URL
MBUS_CDP_URL
MBUS_SNAPSHOT_TIMEOUT_MS
MBUS_ACTION_TIMEOUT_MS
MBUS_MAX_ELEMENTS
MBUS_MAX_TEXT_LEN
MBUS_ROUTER_FAILURES_TO_MID
MBUS_ROUTER_FAILURES_TO_STRONG
MBUS_ROUTER_NO_PROGRESS_TO_MID
MBUS_ROUTER_NO_PROGRESS_TO_STRONG
MBUS_ROUTER_REASONING_EFFORT
MBUS_ROUTER_LADDER
MBUS_ALLOW_INSECURE
MBUS_VALIDATOR_MAX_TEXT_LEN
MBUS_VALIDATOR_MAX_WAIT_MS
MBUS_VALIDATOR_MAX_SCROLL
MBUS_LLM_MODE
MBUS_LLM_BASE_URL
MBUS_LLM_API_KEY
MBUS_LLM_MODEL_FAST
MBUS_LLM_MODEL_MID
MBUS_LLM_MODEL_STRONG
MBUS_LLM_TIMEOUT_MS
MBUS_LLM_TEMPERATURE
MBUS_LLM_MAX_TOKENS
MBUS_LLM_INPUT_COST_PER_MILLION
MBUS_LLM_OUTPUT_COST_PER_MILLION
MBUS_LLM_ACTIONS_FILE
MBUS_EXTRACT_OUTPUT
MBUS_SCREENSHOT_ENABLED
MBUS_SCREENSHOT_PERSIST

Scripted Actions Format

Scripted actions accept any of the following formats:

A JSON array of actions
A single JSON action object
JSON Lines (one action per line)

Example (actions.jsonl):

{"type":"navigate","url":"https://example.com"}
{"type":"click","id":"el_1"}
{"type":"done","summary":"clicked"}
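One way to accept all three input shapes with a single pass is to split the input into top-level JSON objects by tracking brace depth and string state. The sketch below uses only the standard library and does not validate or parse the JSON; `split_actions` is a hypothetical helper, not the loader mbus uses.

```rust
/// Split scripted-action input into the raw text of each top-level JSON object.
/// A JSON array, a single object, and JSON Lines all reduce to the same scan,
/// because array brackets, commas, and newlines outside objects are ignored.
fn split_actions(input: &str) -> Vec<&str> {
    let mut actions = Vec::new();
    let (mut depth, mut start) = (0usize, 0usize);
    let (mut in_string, mut escaped) = (false, false);
    for (i, ch) in input.char_indices() {
        if in_string {
            // Inside a string literal, braces are plain text.
            match ch {
                _ if escaped => escaped = false,
                '\\' => escaped = true,
                '"' => in_string = false,
                _ => {}
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => {
                if depth == 0 {
                    start = i;
                }
                depth += 1;
            }
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    actions.push(&input[start..=i]);
                }
            }
            _ => {}
        }
    }
    actions
}

fn main() {
    let jsonl = r#"{"type":"navigate","url":"https://example.com"}
{"type":"click","id":"el_1"}
{"type":"done","summary":"clicked"}"#;
    for action in split_actions(jsonl) {
        println!("{action}");
    }
}
```

Each returned slice would still need to be parsed and checked against the action schema before use.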

Logs and Telemetry

mbus run prints JSON log lines to stdout (type = config | step | summary).
Tracing logs are emitted as JSON to stderr; set RUST_LOG=info or similar to control verbosity.
Metrics are in-process counters and timers; see src/telemetry.rs for names.

Troubleshooting

Chromium fails to launch: install Chromium/Chrome and ensure it is discoverable by chromiumoxide.
OpenAI 401/403: ensure MBUS_LLM_API_KEY (or --llm-api-key) is set to a valid key for openai mode.
Invalid scripted actions: confirm the JSON matches the action schema and references real element ids.
Timeouts on slow pages: increase snapshot_timeout_ms or action_timeout_ms.
Navigation to non-http(s) URLs blocked: set allow_insecure = true only when needed and understand the security implications.
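The last item and the [validator] limits can be illustrated with a small sketch. Everything here is an assumption for illustration: the `ValidatorConfig` type, both check functions, and the existence of a wait action bounded by max_wait_ms are hypothetical, and the real validator may apply different or additional rules.

```rust
/// Hypothetical subset of the [validator] settings.
struct ValidatorConfig {
    allow_insecure: bool,
    max_wait_ms: u64,
}

/// Check a navigate URL: http(s) always passes, other schemes need allow_insecure.
fn check_navigate(url: &str, cfg: &ValidatorConfig) -> Result<(), String> {
    let lower = url.trim().to_ascii_lowercase();
    let is_http = lower.starts_with("http://") || lower.starts_with("https://");
    if is_http || cfg.allow_insecure {
        Ok(())
    } else {
        Err(format!("blocked non-http(s) URL: {url}"))
    }
}

/// Check a wait duration against max_wait_ms (the wait action itself is assumed).
fn check_wait(ms: u64, cfg: &ValidatorConfig) -> Result<(), String> {
    if ms <= cfg.max_wait_ms {
        Ok(())
    } else {
        Err(format!("wait of {ms} ms exceeds max_wait_ms = {}", cfg.max_wait_ms))
    }
}

fn main() {
    // Values copied from the sample mbus.toml.
    let cfg = ValidatorConfig { allow_insecure: false, max_wait_ms: 30_000 };
    println!("{:?}", check_navigate("https://example.com", &cfg));
    println!("{:?}", check_navigate("file:///etc/hosts", &cfg));
    println!("{:?}", check_wait(60_000, &cfg));
}
```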

For a structured operations runbook, recovery steps, and the log/metric fields you should monitor, see docs/operations-runbook.md. For the product-level current state and remaining proof work, see docs/status.md.

Runbook

Verification:

cargo test
Run a short task with mbus run and confirm that a summary JSON log line is emitted and, if the run uses extract actions, that mbus_extract.json is written.

Rollback:

Check out the previous release tag or commit and rebuild.
Revert any config changes (especially router thresholds and timeouts) to the last known-good values.

For the full verification checklist, rollback recipe, and structured logging guidance, see docs/operations-runbook.md.
