signalreason/mbus

mbus

Rust browser + LLM agent for deterministic, single-step web automation.

Overview

mbus runs a tight loop of snapshot -> propose -> validate -> apply. Actions are strictly validated against the current observation before execution, and every step is logged as JSON for traceability.
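The loop described above can be sketched roughly as follows. This is a minimal illustration, not mbus's actual code: the `Action`, `Observation`, `validate`, and `run_loop` names are hypothetical, the browser and model are stood in for by closures, and the sketch simply aborts on a validation failure, which the real agent may handle differently.

```rust
// Hypothetical types for illustration only; mbus's real types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Navigate { url: String },
    Click { id: String },
    Done { summary: String },
}

#[derive(Debug, Clone, Default)]
pub struct Observation {
    pub url: String,
    pub element_ids: Vec<String>,
}

/// Reject actions that reference elements missing from the current observation.
pub fn validate(action: &Action, obs: &Observation) -> Result<(), String> {
    match action {
        Action::Click { id } if !obs.element_ids.iter().any(|e| e == id) => {
            Err(format!("unknown element id: {id}"))
        }
        _ => Ok(()),
    }
}

/// One action per step: snapshot, propose, validate, then apply.
/// Returns the `done` summary, or an error if validation fails or steps run out.
pub fn run_loop(
    max_steps: usize,
    mut snapshot: impl FnMut() -> Observation,
    mut propose: impl FnMut(&Observation) -> Action,
    mut apply: impl FnMut(&Action),
) -> Result<String, String> {
    for _ in 0..max_steps {
        let obs = snapshot();
        let action = propose(&obs);
        // The action is checked against the observation it was proposed for.
        validate(&action, &obs)?;
        if let Action::Done { summary } = &action {
            return Ok(summary.clone());
        }
        apply(&action);
    }
    Err("max steps reached".to_string())
}
```

Passing the snapshot, proposal, and apply stages as closures keeps the sketch free of any browser or network dependency, so the control flow can be exercised in isolation.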

Key traits:

  • Chromium CDP browser adapter (chromiumoxide)
  • Strict action schema + validation
  • Model router with fast -> mid -> strong escalation
  • Structured JSON logs plus tracing + metrics

Current Status

  • Primary success bar: the 12-task local obstacle suite run through mbus challenge.
  • Secondary regression bar: the 10-task local mbus bench harness.
  • Checked-in tests validate challenge/bench/package plumbing with a mock OpenAI-compatible server; real-model evidence is generated locally and packaged, not committed.
  • Current status, open gaps, and proof expectations live in docs/status.md. Live-site policy lives in docs/live-eval-policy.md.

Install

Prerequisites:

  • Rust toolchain (stable)
  • A Chromium/Chrome binary discoverable by chromiumoxide

Build:

cargo build

Quickstart

Default (stub LLM, which returns done immediately after the snapshot):

cargo run -- run --task "open example.com"

OpenAI mode:

MBUS_LLM_MODE=openai MBUS_LLM_API_KEY=... \
  cargo run -- run --task "Find the shipping address" \
  --llm-model-fast gpt-5-mini \
  --llm-model-mid gpt-5.1 \
  --llm-model-strong gpt-5.2

Scripted mode (feed actions from a file):

cargo run -- run --task "Click the button" \
  --llm-mode scripted \
  --llm-actions-file ./actions.jsonl

For a concise install + quickstart path (prerequisites, install steps, and the first successful run with validated commands), see docs/quickstart.md.

CLI

mbus run flags (most common):

  • --task or --task-file
  • --plan or --plan-file
  • --config
  • --headless
  • --initial-url
  • --browser-executable, --browser-launch-timeout-ms
  • --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
  • --max-steps
  • --llm-mode (stub, scripted, openai)
  • --llm-base-url, --llm-api-key
  • --llm-model-fast, --llm-model-mid, --llm-model-strong
  • --llm-timeout-ms, --llm-temperature, --llm-max-tokens
  • --llm-actions-file
  • --extract-output

mbus bench flags:

  • --tasks-dir (default: harness/tasks)
  • --report-path (default: target/bench/report.json)
  • --config
  • --headless
  • --browser-executable, --browser-launch-timeout-ms
  • --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
  • --max-steps-per-task (default: 40)
  • --required-passes (default: total tasks minus two)
  • --llm-mode (scripted, openai)
  • --llm-base-url, --llm-api-key
  • --llm-model-fast, --llm-model-mid, --llm-model-strong
  • --llm-timeout-ms, --llm-temperature, --llm-max-tokens

mbus challenge flags:

  • --tasks-dir (default: harness/challenge)
  • --report-path (default: target/challenge/report.json)
  • --config
  • --headless
  • --browser-executable, --browser-launch-timeout-ms
  • --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
  • --max-steps-per-task (default: 40)
  • --required-passes (default: 10)
  • --llm-base-url, --llm-api-key
  • --llm-model-fast, --llm-model-mid, --llm-model-strong
  • --llm-timeout-ms, --llm-temperature, --llm-max-tokens
  • --llm-input-cost-per-million, --llm-output-cost-per-million

mbus package flags:

  • --report-path (required; existing challenge report)
  • --output-dir (default: target/challenge/package/<report-stem>)
  • --zip-path (default: target/challenge/package/<report-stem>.zip)
  • --overwrite

cdp_bootstrap:

  • validates browser startup only, using the same browser config inputs as mbus
  • supports --config, --headless, --initial-url, --cdp-url
  • supports --browser-executable, --browser-launch-timeout-ms
  • supports --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir

Challenge Proof Workflow

The canonical release-proof path is:

MBUS_LLM_API_KEY=... \
MBUS_LLM_INPUT_COST_PER_MILLION=... \
MBUS_LLM_OUTPUT_COST_PER_MILLION=... \
./scripts/run_challenge_proof.sh

The script:

  • Validates required environment variables.
  • Runs mbus challenge against the default 12-task local obstacle suite.
  • Packages the resulting report with mbus package.
  • Prints the exact report, bundle, and zip paths to inspect or share.

For a supplemental adversarial run, point challenge at the separate tasks dir:

MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge \
  --tasks-dir harness/challenge_adversarial \
  --required-passes 2

See docs/status.md for the current success bar and docs/live-eval-policy.md for what does and does not count as valid evidence.

Regression Harness

Run the local benchmark harness:

cargo run --bin mbus -- bench --llm-mode scripted

The command:

  • Starts a local HTTP harness server on 127.0.0.1 with deterministic pages.
  • Serves static harness pages from harness/pages.
  • Loads task fixtures from harness/tasks/*.json.
  • Executes each task with scripted actions in scripted mode.
  • Executes each task autonomously in openai mode (requires MBUS_LLM_API_KEY or --llm-api-key).
  • Writes the report to target/bench/report.json.
  • Enforces a gate (required_passes, default 8 of 10 tasks).
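The gate arithmetic can be illustrated with a short sketch. The names `default_required_passes`, `evaluate_gate`, and `GateOutcome` are hypothetical and the report structure is reduced to a list of pass/fail booleans; only the "total tasks minus two" default and the pass-count comparison come from this README.

```rust
/// Outcome of a gate check (hypothetical shape, not the real report schema).
#[derive(Debug, PartialEq, Eq)]
pub struct GateOutcome {
    pub passed: usize,
    pub required: usize,
    pub ok: bool,
}

/// Bench default described in this README: total tasks minus two.
pub fn default_required_passes(total_tasks: usize) -> usize {
    total_tasks.saturating_sub(2)
}

/// Count passing tasks and compare against the required number.
/// `required_override` stands in for an explicit --required-passes value.
pub fn evaluate_gate(results: &[bool], required_override: Option<usize>) -> GateOutcome {
    let passed = results.iter().filter(|&&ok| ok).count();
    let required = required_override.unwrap_or_else(|| default_required_passes(results.len()));
    GateOutcome { passed, required, ok: passed >= required }
}
```

With ten tasks and no override the required count is 8, matching the default stated above.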

Use bench to catch regressions in the agent loop and report plumbing. It is no longer the primary release proof.

Task fixture shape (example):

{
  "id": "bench-task-01",
  "task": "Navigate to benchmark task 01 and confirm marker text.",
  "start_path": "/bench/start",
  "max_steps": 40,
  "actions": [
    {"type": "navigate", "url": "{{base_url}}/bench/task-01"},
    {"type": "done", "summary": "Reached benchmark task 01"}
  ],
  "expect": {
    "status": "done",
    "final_url_contains": "/bench/task-01",
    "final_visible_text_contains": "BENCH TASK 01"
  }
}
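As a rough illustration of how a fixture like this might be evaluated, the sketch below expands the `{{base_url}}` placeholder and checks the two `expect` substrings. The function and struct names are hypothetical, the `status` field is not modeled, and case-sensitive substring matching is an assumption.

```rust
/// Subset of the fixture's `expect` block (hypothetical struct).
#[derive(Debug, Default)]
pub struct Expect {
    pub final_url_contains: Option<String>,
    pub final_visible_text_contains: Option<String>,
}

/// Replace the `{{base_url}}` placeholder with the harness server's address.
pub fn expand_base_url(template: &str, base_url: &str) -> String {
    template.replace("{{base_url}}", base_url)
}

/// Return a description of every expectation that does not hold.
/// An empty vector means the task met all modeled expectations.
pub fn check_expect(expect: &Expect, final_url: &str, visible_text: &str) -> Vec<String> {
    let mut failures = Vec::new();
    if let Some(needle) = &expect.final_url_contains {
        if !final_url.contains(needle.as_str()) {
            failures.push(format!("final URL does not contain {needle:?}"));
        }
    }
    if let Some(needle) = &expect.final_visible_text_contains {
        if !visible_text.contains(needle.as_str()) {
            failures.push(format!("visible text does not contain {needle:?}"));
        }
    }
    failures
}
```

Returning the list of failures rather than a single boolean makes it easier to report which expectation was missed.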

Challenge Suite

Run the primary obstacle suite with the OpenAI-compatible path:

MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge

The command:

  • Loads autonomous challenge manifests from harness/challenge/*.json.
  • Starts the local harness server and serves obstacle pages from harness/pages/challenge.
  • Forces openai mode and persists screenshots to .ralph/runs/... for visual diff follow-up.
  • Writes an aggregate report to target/challenge/report.json.
  • Enforces the autonomous gate at 10 passed tasks out of 12 by default.
  • Uses observable-only success checks (final_url_contains, final_visible_text_contains, screenshots) rather than hidden app knowledge.

Checked-in integration tests exercise this path with a mock OpenAI-compatible server so the CLI/report/package flow stays stable. Real-model challenge quality still needs local proof runs and packaged artifacts.

Package an existing challenge run:

cargo run --bin mbus -- package --report-path target/challenge/report.json

The package command:

  • Validates that the report parses and matches the challenge report shape.
  • Verifies every referenced artifact exists and matches the recorded SHA-256 when present.
  • Copies report.json, README.md, and referenced artifacts into a portable bundle directory.
  • Writes manifest.json with relative file inventory plus gate, usage, and cost summaries.
  • Emits a zip archive next to the unpacked bundle for submission or sharing.

Challenge manifest shape (example):

{
  "id": "challenge-01-cookie-banner",
  "task": "Dismiss the cookie banner so the page clearly shows COOKIE BANNER DISMISSED.",
  "start_url": "{{base_url}}/challenge/cookie-banner.html",
  "allowed_domains": ["127.0.0.1"],
  "max_steps": 20,
  "expect": {
    "final_url_contains": "/challenge/cookie-banner.html",
    "final_visible_text_contains": "COOKIE BANNER DISMISSED",
    "screenshot_artifact_required": true
  }
}
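The `allowed_domains` field suggests a host check on navigation targets. The sketch below is one possible reading, not the actual mbus logic: it assumes exact, case-insensitive host matching, handles only http(s) URLs, and ignores IPv6 literals and percent-encoding. `host_of` and `domain_allowed` are hypothetical helpers.

```rust
/// Extract the host from an http(s) URL without external crates.
/// Simplified: no IPv6 literals and no percent-decoding.
pub fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("http://")
        .or_else(|| url.strip_prefix("https://"))?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop optional userinfo ("user:pass@") and the optional port.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True when the URL's host exactly matches one of the allowed domains.
pub fn domain_allowed(url: &str, allowed: &[&str]) -> bool {
    host_of(url).map_or(false, |h| allowed.iter().any(|a| a.eq_ignore_ascii_case(h)))
}
```

A production check would more likely rely on a dedicated URL parser; the hand-rolled parsing here only keeps the example dependency-free.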

Config

Config precedence, from lowest to highest, is: defaults -> config file -> env (MBUS_*) -> CLI flags, so a CLI flag overrides everything else. The config file is looked up in this order: --config, MBUS_CONFIG, ./mbus.toml, ~/.mbus.toml.
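Layered precedence of this kind is often implemented by overlaying optional values. The sketch below assumes the arrows read from lowest to highest priority (CLI flags win) and uses a hypothetical `Layer` struct with only three fields; it is not the actual mbus config code.

```rust
/// One configuration layer; `None` means "not set at this layer".
/// Hypothetical struct with a small subset of fields for illustration.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Layer {
    pub max_steps: Option<u32>,
    pub headless: Option<bool>,
    pub llm_mode: Option<String>,
}

impl Layer {
    /// Overlay `higher` on top of `self`: values set in `higher` win.
    pub fn overlay(self, higher: Layer) -> Layer {
        Layer {
            max_steps: higher.max_steps.or(self.max_steps),
            headless: higher.headless.or(self.headless),
            llm_mode: higher.llm_mode.or(self.llm_mode),
        }
    }
}

/// Apply the layers in the documented order: defaults, file, env, CLI.
pub fn resolve(defaults: Layer, file: Layer, env: Layer, cli: Layer) -> Layer {
    defaults.overlay(file).overlay(env).overlay(cli)
}
```

For example, a `max_steps` set only in the config file survives unless the environment or the CLI also sets it.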

Sample mbus.toml:

[agent]
max_steps = 40

[agent.memory]
max_observations = 8
max_history = 100

[browser]
headless = true
# headful = true
initial_url = "about:blank"
snapshot_timeout_ms = 5000
action_timeout_ms = 10000
max_elements = 50
max_text_len = 4000

[router]
failures_to_mid = 2
failures_to_strong = 4
no_progress_to_mid = 2
no_progress_to_strong = 4
ladder = ["gpt-5-mini:medium", "gpt-5.1:medium", "gpt-5.2:medium"]

[validator]
allow_insecure = false
max_text_len = 2000
max_wait_ms = 30000
max_scroll = 2000

[llm]
mode = "stub"
base_url = "https://api.openai.com/v1"
api_key = ""
model_fast = "gpt-5-mini"
model_mid = "gpt-5.1"
model_strong = "gpt-5.2"
timeout_ms = 30000
temperature = 1.0
max_tokens = 256
actions_file = "actions.jsonl"

[output]
extract_output = "mbus_extract.json"

To run with a visible browser window, set headful = true in the config or pass --headless false on the CLI.
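The [router] thresholds in the sample above hint at how the fast -> mid -> strong escalation could work. The following sketch is an assumption about the semantics, not the real router: it treats either counter reaching its threshold as enough to escalate, compares with `>=`, and uses hypothetical names (`Tier`, `RouterThresholds`, `pick_tier`). Whether the real counters are cumulative or consecutive is not stated in this README.

```rust
/// Model tiers in escalation order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Fast,
    Mid,
    Strong,
}

/// Field names mirror the [router] keys in the sample mbus.toml.
#[derive(Debug, Clone)]
pub struct RouterThresholds {
    pub failures_to_mid: u32,
    pub failures_to_strong: u32,
    pub no_progress_to_mid: u32,
    pub no_progress_to_strong: u32,
}

impl Default for RouterThresholds {
    /// Values copied from the sample config.
    fn default() -> Self {
        RouterThresholds {
            failures_to_mid: 2,
            failures_to_strong: 4,
            no_progress_to_mid: 2,
            no_progress_to_strong: 4,
        }
    }
}

/// Pick the tier for the next step from the failure and no-progress counters.
/// Assumption: either counter reaching its threshold triggers escalation.
pub fn pick_tier(t: &RouterThresholds, failures: u32, no_progress: u32) -> Tier {
    if failures >= t.failures_to_strong || no_progress >= t.no_progress_to_strong {
        Tier::Strong
    } else if failures >= t.failures_to_mid || no_progress >= t.no_progress_to_mid {
        Tier::Mid
    } else {
        Tier::Fast
    }
}
```

Under these assumptions and the sample thresholds, two failures move a run from the fast model to the mid model, and four move it to the strong model.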

Environment variable overrides (full list):

  • MBUS_CONFIG
  • MBUS_MAX_STEPS
  • MBUS_MAX_NO_PROGRESS_STEPS
  • MBUS_MEMORY_MAX_OBSERVATIONS
  • MBUS_MEMORY_MAX_HISTORY
  • MBUS_HEADLESS
  • MBUS_INITIAL_URL
  • MBUS_CDP_URL
  • MBUS_SNAPSHOT_TIMEOUT_MS
  • MBUS_ACTION_TIMEOUT_MS
  • MBUS_MAX_ELEMENTS
  • MBUS_MAX_TEXT_LEN
  • MBUS_ROUTER_FAILURES_TO_MID
  • MBUS_ROUTER_FAILURES_TO_STRONG
  • MBUS_ROUTER_NO_PROGRESS_TO_MID
  • MBUS_ROUTER_NO_PROGRESS_TO_STRONG
  • MBUS_ROUTER_REASONING_EFFORT
  • MBUS_ROUTER_LADDER
  • MBUS_ALLOW_INSECURE
  • MBUS_VALIDATOR_MAX_TEXT_LEN
  • MBUS_VALIDATOR_MAX_WAIT_MS
  • MBUS_VALIDATOR_MAX_SCROLL
  • MBUS_LLM_MODE
  • MBUS_LLM_BASE_URL
  • MBUS_LLM_API_KEY
  • MBUS_LLM_MODEL_FAST
  • MBUS_LLM_MODEL_MID
  • MBUS_LLM_MODEL_STRONG
  • MBUS_LLM_TIMEOUT_MS
  • MBUS_LLM_TEMPERATURE
  • MBUS_LLM_MAX_TOKENS
  • MBUS_LLM_INPUT_COST_PER_MILLION
  • MBUS_LLM_OUTPUT_COST_PER_MILLION
  • MBUS_LLM_ACTIONS_FILE
  • MBUS_EXTRACT_OUTPUT
  • MBUS_SCREENSHOT_ENABLED
  • MBUS_SCREENSHOT_PERSIST

Scripted Actions Format

Scripted actions accept any of the following formats:

  • A JSON array of actions
  • A single JSON action object
  • JSON Lines (one action per line)

Example (actions.jsonl):

{"type":"navigate","url":"https://example.com"}
{"type":"click","id":"el_1"}
{"type":"done","summary":"clicked"}
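Telling the three accepted formats apart can be done before any real parsing. The sketch below is a dependency-free heuristic, not the loader mbus uses: it only classifies the input by counting top-level JSON objects while skipping braces inside strings, and it does not validate the JSON. `ScriptFormat` and `detect_format` are hypothetical names; an actual loader would presumably pass each piece to a JSON parser.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ScriptFormat {
    Array,
    SingleObject,
    JsonLines,
}

/// Count top-level JSON objects by tracking brace depth outside of strings.
fn count_top_level_objects(input: &str) -> usize {
    let (mut depth, mut count) = (0usize, 0usize);
    let (mut in_string, mut escaped) = (false, false);
    for c in input.chars() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    count += 1;
                }
            }
            _ => {}
        }
    }
    count
}

/// Classify a scripted-actions file; `None` means no action object was found.
pub fn detect_format(input: &str) -> Option<ScriptFormat> {
    let trimmed = input.trim_start();
    if trimmed.starts_with('[') {
        return Some(ScriptFormat::Array);
    }
    match count_top_level_objects(trimmed) {
        0 => None,
        1 => Some(ScriptFormat::SingleObject),
        _ => Some(ScriptFormat::JsonLines),
    }
}
```

Applied to the three-line actions.jsonl example above, the heuristic sees three top-level objects and reports `JsonLines`.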

Logs and Telemetry

  • mbus run prints JSON log lines to stdout (type = config | step | summary).
  • Tracing logs are emitted as JSON to stderr; set RUST_LOG=info or similar to control verbosity.
  • Metrics are in-process counters and timers; see src/telemetry.rs for names.

Troubleshooting

  • Chromium fails to launch: install Chromium/Chrome and ensure it is discoverable by chromiumoxide.
  • OpenAI 401/403: ensure MBUS_LLM_API_KEY is set for openai mode.
  • Invalid scripted actions: confirm the JSON matches the action schema and references real element ids.
  • Timeouts on slow pages: increase snapshot_timeout_ms or action_timeout_ms.
  • Navigation to non-http(s) URLs blocked: set allow_insecure = true only when needed and understand the security implications.
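To make the validator settings mentioned above more concrete, here is a hedged sketch of how limits like those in the [validator] section might be enforced. The `Type`, `Wait`, and `Scroll` action variants and their fields are assumptions inferred from the key names (this README only shows navigate, click, and done), and `check_limits` is a hypothetical function, not the real validator.

```rust
/// Field names mirror the [validator] keys in the sample mbus.toml.
#[derive(Debug, Clone)]
pub struct ValidatorLimits {
    pub allow_insecure: bool,
    pub max_text_len: usize,
    pub max_wait_ms: u64,
    pub max_scroll: u64,
}

impl Default for ValidatorLimits {
    /// Values copied from the sample config.
    fn default() -> Self {
        ValidatorLimits { allow_insecure: false, max_text_len: 2000, max_wait_ms: 30000, max_scroll: 2000 }
    }
}

/// Hypothetical action variants; only `Navigate` appears in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Navigate { url: String },
    Type { text: String },
    Wait { ms: u64 },
    Scroll { dy: i64 },
}

/// Check one action against the configured limits.
pub fn check_limits(action: &Action, limits: &ValidatorLimits) -> Result<(), String> {
    match action {
        Action::Navigate { url } => {
            let is_http = url.starts_with("http://") || url.starts_with("https://");
            if is_http || limits.allow_insecure {
                Ok(())
            } else {
                Err(format!("blocked non-http(s) URL: {url}"))
            }
        }
        Action::Type { text } if text.chars().count() > limits.max_text_len => {
            Err("text exceeds max_text_len".to_string())
        }
        Action::Wait { ms } if *ms > limits.max_wait_ms => {
            Err("wait exceeds max_wait_ms".to_string())
        }
        Action::Scroll { dy } if dy.unsigned_abs() > limits.max_scroll => {
            Err("scroll exceeds max_scroll".to_string())
        }
        _ => Ok(()),
    }
}
```

In this sketch, setting `allow_insecure` to true is the only way a non-http(s) navigation passes, which mirrors the troubleshooting advice above.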

For a structured operations runbook, recovery steps, and the log/metric fields you should monitor, see docs/operations-runbook.md. For the product-level current state and remaining proof work, see docs/status.md.

Runbook

Verification:

  • cargo test
  • Run a short task with mbus run and confirm a summary JSON log line is emitted and, if using extract actions, mbus_extract.json is written.

Rollback:

  • Check out the previous release tag or commit and rebuild.
  • Revert any config changes (especially router thresholds and timeouts) to the last known-good values.

For the full verification checklist, rollback recipe, and structured logging guidance, see docs/operations-runbook.md.
