Rust browser + LLM agent for deterministic, single-step web automation.
mbus runs a tight loop of snapshot -> propose -> validate -> apply. Actions are strictly validated against the current observation before execution, and every step is logged as JSON for traceability.
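A minimal sketch of one iteration of that loop, using only the standard library. The types Observation, Action, and StepOutcome, the two-variant action set, and the closure standing in for the LLM proposer are all hypothetical and are not taken from the mbus source; the snapshot is assumed to be taken by the caller.

```rust
use std::collections::HashSet;

/// Hypothetical observation: the element ids visible in the current snapshot.
#[derive(Debug, Clone)]
pub struct Observation {
    pub element_ids: HashSet<String>,
}

/// Hypothetical action set (a small subset, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Click { id: String },
    Done { summary: String },
}

#[derive(Debug, PartialEq)]
pub enum StepOutcome {
    /// The action passed validation; a real agent would execute it here.
    Applied(Action),
    /// The action was refused before touching the browser.
    Rejected(String),
    /// The proposer declared the task finished.
    Finished(String),
}

/// Validate a proposed action against the current observation only.
pub fn validate(action: &Action, obs: &Observation) -> Result<(), String> {
    match action {
        Action::Click { id } if !obs.element_ids.contains(id) => {
            Err(format!("unknown element id: {}", id))
        }
        _ => Ok(()),
    }
}

/// One iteration: propose -> validate -> apply.
pub fn step<P>(obs: &Observation, mut propose: P) -> StepOutcome
where
    P: FnMut(&Observation) -> Action,
{
    let action = propose(obs);
    if let Err(reason) = validate(&action, obs) {
        return StepOutcome::Rejected(reason);
    }
    match action {
        Action::Done { summary } => StepOutcome::Finished(summary),
        other => StepOutcome::Applied(other),
    }
}
```

Because validation sees only the current observation, a click on an element id that is absent from the snapshot is rejected before any browser call is made.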
Key traits:
- Chromium CDP browser adapter (chromiumoxide)
- Strict action schema + validation
- Model router with fast -> mid -> strong escalation
- Structured JSON logs plus tracing + metrics
- Primary success bar: the 12-task local obstacle suite run through mbus challenge.
- Secondary regression bar: the 10-task local mbus bench harness.
- Checked-in tests validate challenge/bench/package plumbing with a mock OpenAI-compatible server; real-model evidence is generated locally and packaged, not committed.
- Current status, open gaps, and proof expectations live in docs/status.md. Live-site policy lives in docs/live-eval-policy.md.
Prerequisites:
- Rust toolchain (stable)
- A Chromium/Chrome binary discoverable by chromiumoxide
Build:
cargo build
Default (stub LLM, immediately returns done after snapshot):
cargo run -- run --task "open example.com"
OpenAI mode:
MBUS_LLM_MODE=openai MBUS_LLM_API_KEY=... \
cargo run -- run --task "Find the shipping address" \
--llm-model-fast gpt-5-mini \
--llm-model-mid gpt-5.1 \
--llm-model-strong gpt-5.2
Scripted mode (feed actions from a file):
cargo run -- run --task "Click the button" \
--llm-mode scripted \
--llm-actions-file ./actions.jsonl
For a concise install + quickstart path (prerequisites, install steps, and the first successful run with validated commands), see docs/quickstart.md.
mbus run flags (most common):
- --task or --task-file
- --plan or --plan-file
- --config
- --headless
- --initial-url
- --browser-executable, --browser-launch-timeout-ms
- --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
- --max-steps
- --llm-mode (stub, scripted, openai)
- --llm-base-url, --llm-api-key
- --llm-model-fast, --llm-model-mid, --llm-model-strong
- --llm-timeout-ms, --llm-temperature, --llm-max-tokens
- --llm-actions-file
- --extract-output
mbus bench flags:
- --tasks-dir (default: harness/tasks)
- --report-path (default: target/bench/report.json)
- --config
- --headless
- --browser-executable, --browser-launch-timeout-ms
- --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
- --max-steps-per-task (default: 40)
- --required-passes (default: total tasks minus two)
- --llm-mode (scripted, openai)
- --llm-base-url, --llm-api-key
- --llm-model-fast, --llm-model-mid, --llm-model-strong
- --llm-timeout-ms, --llm-temperature, --llm-max-tokens
mbus challenge flags:
- --tasks-dir (default: harness/challenge)
- --report-path (default: target/challenge/report.json)
- --config
- --headless
- --browser-executable, --browser-launch-timeout-ms
- --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
- --max-steps-per-task (default: 40)
- --required-passes (default: 10)
- --llm-base-url, --llm-api-key
- --llm-model-fast, --llm-model-mid, --llm-model-strong
- --llm-timeout-ms, --llm-temperature, --llm-max-tokens
- --llm-input-cost-per-million, --llm-output-cost-per-million
mbus package flags:
- --report-path (required; existing challenge report)
- --output-dir (default: target/challenge/package/<report-stem>)
- --zip-path (default: target/challenge/package/<report-stem>.zip)
- --overwrite
cdp_bootstrap:
- validates browser startup only, using the same browser config inputs as mbus
- supports --config, --headless, --initial-url, --cdp-url
- supports --browser-executable, --browser-launch-timeout-ms
- supports --browser-no-sandbox, repeated --browser-arg, --browser-keep-user-data-dir
The canonical release-proof path is:
MBUS_LLM_API_KEY=... \
MBUS_LLM_INPUT_COST_PER_MILLION=... \
MBUS_LLM_OUTPUT_COST_PER_MILLION=... \
./scripts/run_challenge_proof.sh
The script:
- Validates required environment variables.
- Runs mbus challenge against the default 12-task local obstacle suite.
- Packages the resulting report with mbus package.
- Prints the exact report, bundle, and zip paths to inspect or share.
For a supplemental adversarial run, point challenge at the separate tasks dir:
MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge \
--tasks-dir harness/challenge_adversarial \
--required-passes 2
See docs/status.md for the current success bar and docs/live-eval-policy.md for what does and does not count as valid evidence.
Run the local benchmark harness:
cargo run --bin mbus -- bench --llm-mode scripted
The command:
- Starts a local HTTP harness server on 127.0.0.1 with deterministic pages.
- Serves static harness pages from harness/pages.
- Loads task fixtures from harness/tasks/*.json.
- Executes each task with scripted actions in scripted mode.
- Executes each task autonomously in openai mode (requires MBUS_LLM_API_KEY or --llm-api-key).
- Writes the report to target/bench/report.json.
- Enforces a gate (required_passes, default 8 of 10 tasks).
Use bench to catch regressions in the agent loop and report plumbing. It is not the primary release proof anymore.
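The bench gate described above (required_passes, defaulting to total tasks minus two) can be sketched as follows. The function names are hypothetical, and the saturating subtraction for suites with fewer than two tasks is an assumption; the README does not say how such suites are handled.

```rust
/// Default gate for bench as documented: total tasks minus two.
/// Assumption: the default bottoms out at 0 for suites smaller than two tasks.
pub fn default_required_passes(total_tasks: usize) -> usize {
    total_tasks.saturating_sub(2)
}

/// The gate holds when at least `required` tasks passed.
/// `required` is `None` when --required-passes was not supplied.
pub fn gate_passed(passed: usize, total_tasks: usize, required: Option<usize>) -> bool {
    let required = required.unwrap_or_else(|| default_required_passes(total_tasks));
    passed >= required
}
```

With the 10-task bench suite the default threshold is 8, so 8 passes clear the gate and 7 do not. The challenge command uses an explicit default of 10 instead, which corresponds to passing Some(10) here.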
Task fixture shape (example):
{
  "id": "bench-task-01",
  "task": "Navigate to benchmark task 01 and confirm marker text.",
  "start_path": "/bench/start",
  "max_steps": 40,
  "actions": [
    {"type": "navigate", "url": "{{base_url}}/bench/task-01"},
    {"type": "done", "summary": "Reached benchmark task 01"}
  ],
  "expect": {
    "status": "done",
    "final_url_contains": "/bench/task-01",
    "final_visible_text_contains": "BENCH TASK 01"
  }
}
Run the primary obstacle suite with the OpenAI-compatible path:
MBUS_LLM_API_KEY=... cargo run --bin mbus -- challenge
The command:
- Loads autonomous challenge manifests from harness/challenge/*.json.
- Starts the local harness server and serves obstacle pages from harness/pages/challenge.
- Forces openai mode and persists screenshots to .ralph/runs/... for visual diff follow-up.
- Writes an aggregate report to target/challenge/report.json.
- Enforces the autonomous gate at 10 passed tasks out of 12 by default.
- Uses observable-only success checks (final_url_contains, final_visible_text_contains, screenshots) rather than hidden app knowledge.
Checked-in integration tests exercise this path with a mock OpenAI-compatible server so the CLI/report/package flow stays stable. Real-model challenge quality still needs local proof runs and packaged artifacts.
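To illustrate what observable-only success checks could look like, here is a minimal sketch that evaluates an expect block against a final browser state. The Expect and FinalState structs and the check function are hypothetical; only the three field names (final_url_contains, final_visible_text_contains, screenshot_artifact_required) come from the manifests shown in this README.

```rust
/// Hypothetical mirror of the `expect` block in a task manifest.
#[derive(Debug, Default)]
pub struct Expect {
    pub final_url_contains: Option<String>,
    pub final_visible_text_contains: Option<String>,
    pub screenshot_artifact_required: bool,
}

/// Hypothetical final state, built only from what the browser exposes.
#[derive(Debug)]
pub struct FinalState {
    pub url: String,
    pub visible_text: String,
    pub screenshot_path: Option<String>,
}

/// Return one message per failed check; an empty list means the task passed.
pub fn check(expect: &Expect, state: &FinalState) -> Vec<String> {
    let mut failures = Vec::new();
    if let Some(needle) = &expect.final_url_contains {
        if !state.url.contains(needle.as_str()) {
            failures.push(format!("final URL does not contain {:?}", needle));
        }
    }
    if let Some(needle) = &expect.final_visible_text_contains {
        if !state.visible_text.contains(needle.as_str()) {
            failures.push(format!("visible text does not contain {:?}", needle));
        }
    }
    if expect.screenshot_artifact_required && state.screenshot_path.is_none() {
        failures.push("required screenshot artifact is missing".to_string());
    }
    failures
}
```

Every input to check is something a user could see in the browser, so a task cannot pass by relying on application internals.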
Package an existing challenge run:
cargo run --bin mbus -- package --report-path target/challenge/report.json
The package command:
- Validates that the report parses and matches the challenge report shape.
- Verifies every referenced artifact exists and matches the recorded SHA-256 when present.
- Copies report.json, README.md, and referenced artifacts into a portable bundle directory.
- Writes manifest.json with relative file inventory plus gate, usage, and cost summaries.
- Emits a zip archive next to the unpacked bundle for submission or sharing.
Challenge manifest shape (example):
{
  "id": "challenge-01-cookie-banner",
  "task": "Dismiss the cookie banner so the page clearly shows COOKIE BANNER DISMISSED.",
  "start_url": "{{base_url}}/challenge/cookie-banner.html",
  "allowed_domains": ["127.0.0.1"],
  "max_steps": 20,
  "expect": {
    "final_url_contains": "/challenge/cookie-banner.html",
    "final_visible_text_contains": "COOKIE BANNER DISMISSED",
    "screenshot_artifact_required": true
  }
}
Config precedence is: defaults -> config file -> env (MBUS_*) -> CLI flags (later sources override earlier ones).
Config file lookup order is: --config, MBUS_CONFIG, ./mbus.toml, ~/.mbus.toml.
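A minimal sketch of both orderings, assuming that later layers override earlier ones and that a missing layer is represented as None. The function names and the home parameter are hypothetical; how mbus actually handles a nonexistent candidate file is not stated here, so the sketch only lists candidates in order and leaves the existence check to the caller.

```rust
/// Resolve one setting from the four documented layers:
/// defaults -> config file -> env (MBUS_*) -> CLI flags.
/// The highest-priority layer that is present wins.
pub fn resolve<T>(default: T, file: Option<T>, env: Option<T>, cli: Option<T>) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

/// Candidate config paths in the documented lookup order:
/// --config, MBUS_CONFIG, ./mbus.toml, ~/.mbus.toml.
/// `home` is the caller-supplied home directory (hypothetical parameter).
pub fn config_candidates(cli: Option<&str>, env: Option<&str>, home: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(path) = cli {
        out.push(path.to_string());
    }
    if let Some(path) = env {
        out.push(path.to_string());
    }
    out.push("./mbus.toml".to_string());
    if let Some(dir) = home {
        out.push(format!("{}/.mbus.toml", dir));
    }
    out
}
```

For example, with max_steps = 30 in the config file and --max-steps 10 on the command line, resolve(40, Some(30), None, Some(10)) yields 10.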
Sample mbus.toml:
[agent]
max_steps = 40
[agent.memory]
max_observations = 8
max_history = 100
[browser]
headless = true
# headful = true
initial_url = "about:blank"
snapshot_timeout_ms = 5000
action_timeout_ms = 10000
max_elements = 50
max_text_len = 4000
[router]
failures_to_mid = 2
failures_to_strong = 4
no_progress_to_mid = 2
no_progress_to_strong = 4
ladder = ["gpt-5-mini:medium", "gpt-5.1:medium", "gpt-5.2:medium"]
[validator]
allow_insecure = false
max_text_len = 2000
max_wait_ms = 30000
max_scroll = 2000
[llm]
mode = "stub"
base_url = "https://api.openai.com/v1"
api_key = ""
model_fast = "gpt-5-mini"
model_mid = "gpt-5.1"
model_strong = "gpt-5.2"
timeout_ms = 30000
temperature = 1.0
max_tokens = 256
actions_file = "actions.jsonl"
[output]
extract_output = "mbus_extract.json"
To run with a visible browser window, set headful = true in the config or pass --headless false on the CLI.
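The [router] thresholds in the sample config suggest how the fast -> mid -> strong escalation might be decided. The sketch below is one plausible reading, not the mbus implementation: it assumes the counters are running totals of failures and no-progress steps, that either counter can trigger escalation on its own, and that thresholds are inclusive. The Tier, RouterThresholds, and pick_tier names are hypothetical.

```rust
/// Model tiers, ordered from cheapest to most capable.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Fast,
    Mid,
    Strong,
}

/// Field names mirror the [router] keys in the sample mbus.toml.
#[derive(Debug, Clone)]
pub struct RouterThresholds {
    pub failures_to_mid: u32,
    pub failures_to_strong: u32,
    pub no_progress_to_mid: u32,
    pub no_progress_to_strong: u32,
}

impl Default for RouterThresholds {
    /// Values copied from the sample config.
    fn default() -> Self {
        RouterThresholds {
            failures_to_mid: 2,
            failures_to_strong: 4,
            no_progress_to_mid: 2,
            no_progress_to_strong: 4,
        }
    }
}

/// Pick the tier for the next proposal.
/// Assumption: either counter reaching its threshold is enough to escalate.
pub fn pick_tier(t: &RouterThresholds, failures: u32, no_progress: u32) -> Tier {
    if failures >= t.failures_to_strong || no_progress >= t.no_progress_to_strong {
        Tier::Strong
    } else if failures >= t.failures_to_mid || no_progress >= t.no_progress_to_mid {
        Tier::Mid
    } else {
        Tier::Fast
    }
}
```

Under these assumptions, raising failures_to_mid keeps the agent on the fast model longer, at the cost of more failed steps before a stronger model is tried.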
Environment variable overrides (full list):
- MBUS_CONFIG
- MBUS_MAX_STEPS
- MBUS_MAX_NO_PROGRESS_STEPS
- MBUS_MEMORY_MAX_OBSERVATIONS
- MBUS_MEMORY_MAX_HISTORY
- MBUS_HEADLESS
- MBUS_INITIAL_URL
- MBUS_CDP_URL
- MBUS_SNAPSHOT_TIMEOUT_MS
- MBUS_ACTION_TIMEOUT_MS
- MBUS_MAX_ELEMENTS
- MBUS_MAX_TEXT_LEN
- MBUS_ROUTER_FAILURES_TO_MID
- MBUS_ROUTER_FAILURES_TO_STRONG
- MBUS_ROUTER_NO_PROGRESS_TO_MID
- MBUS_ROUTER_NO_PROGRESS_TO_STRONG
- MBUS_ROUTER_REASONING_EFFORT
- MBUS_ROUTER_LADDER
- MBUS_ALLOW_INSECURE
- MBUS_VALIDATOR_MAX_TEXT_LEN
- MBUS_VALIDATOR_MAX_WAIT_MS
- MBUS_VALIDATOR_MAX_SCROLL
- MBUS_LLM_MODE
- MBUS_LLM_BASE_URL
- MBUS_LLM_API_KEY
- MBUS_LLM_MODEL_FAST
- MBUS_LLM_MODEL_MID
- MBUS_LLM_MODEL_STRONG
- MBUS_LLM_TIMEOUT_MS
- MBUS_LLM_TEMPERATURE
- MBUS_LLM_MAX_TOKENS
- MBUS_LLM_INPUT_COST_PER_MILLION
- MBUS_LLM_OUTPUT_COST_PER_MILLION
- MBUS_LLM_ACTIONS_FILE
- MBUS_EXTRACT_OUTPUT
- MBUS_SCREENSHOT_ENABLED
- MBUS_SCREENSHOT_PERSIST
Scripted actions accept any of the following formats:
- A JSON array of actions
- A single JSON action object
- JSON Lines (one action per line)
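As a rough illustration of telling the three formats apart, the sketch below classifies input by shape alone, without parsing JSON. It is a heuristic written for this README, not how mbus does it: a real loader would more likely attempt to deserialize each form in turn. The ScriptFormat and detect_format names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum ScriptFormat {
    Array,
    SingleObject,
    JsonLines,
}

/// Guess the scripted-actions format from the text shape.
/// Heuristic: several non-empty lines that each look like a complete
/// object are treated as JSON Lines; anything else starting with `{`
/// is treated as a single (possibly pretty-printed) object.
pub fn detect_format(input: &str) -> Option<ScriptFormat> {
    let trimmed = input.trim();
    if trimmed.starts_with('[') {
        return Some(ScriptFormat::Array);
    }
    if !trimmed.starts_with('{') {
        return None;
    }
    let lines: Vec<&str> = trimmed
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();
    let every_line_is_object = lines
        .iter()
        .all(|line| line.starts_with('{') && line.ends_with('}'));
    if lines.len() > 1 && every_line_is_object {
        Some(ScriptFormat::JsonLines)
    } else {
        Some(ScriptFormat::SingleObject)
    }
}
```

A file containing exactly one action on one line is indistinguishable between the single-object and JSON Lines forms, and both readings yield the same action list.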
Example (actions.jsonl):
{"type":"navigate","url":"https://example.com"}
{"type":"click","id":"el_1"}
{"type":"done","summary":"clicked"}
- mbus run prints JSON log lines to stdout (type = config | step | summary).
- Tracing logs are emitted as JSON to stderr; set RUST_LOG=info or similar to control verbosity.
- Metrics are in-process counters and timers; see src/telemetry.rs for names.
- Chromium fails to launch: install Chromium/Chrome and ensure it is discoverable by chromiumoxide.
- OpenAI 401/403: ensure MBUS_LLM_API_KEY is set for openai mode.
- Invalid scripted actions: confirm the JSON matches the action schema and references real element ids.
- Timeouts on slow pages: increase snapshot_timeout_ms or action_timeout_ms.
- Navigation to non-http(s) URLs blocked: set allow_insecure = true only when needed and understand the security implications.
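The last item can be pictured with a small validator sketch. It assumes that allow_insecure gates every scheme other than http and https, and that max_wait_ms is an inclusive upper bound on wait actions; the ValidatorConfig struct and both function names are hypothetical and cover only two of the [validator] keys.

```rust
/// Hypothetical subset of the [validator] config section.
pub struct ValidatorConfig {
    pub allow_insecure: bool,
    pub max_wait_ms: u64,
}

/// Accept http(s) URLs; accept other schemes only when allow_insecure is set.
pub fn validate_navigate(url: &str, cfg: &ValidatorConfig) -> Result<(), String> {
    // Take everything before the first ':' as the scheme, case-insensitively.
    let scheme = url.split_once(':').map(|(s, _)| s.to_ascii_lowercase());
    match scheme.as_deref() {
        Some("http") | Some("https") => Ok(()),
        Some(_) if cfg.allow_insecure => Ok(()),
        Some(other) => Err(format!("scheme {:?} blocked; allow_insecure is false", other)),
        None => Err("URL has no scheme".to_string()),
    }
}

/// Reject wait actions longer than the configured ceiling.
pub fn validate_wait(ms: u64, cfg: &ValidatorConfig) -> Result<(), String> {
    if ms <= cfg.max_wait_ms {
        Ok(())
    } else {
        Err(format!("wait of {} ms exceeds max_wait_ms = {}", ms, cfg.max_wait_ms))
    }
}
```

With the sample config (allow_insecure = false, max_wait_ms = 30000), a navigate to file:///etc/passwd is rejected while https://example.com passes.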
For a structured operations runbook, recovery steps, and the log/metric fields
you should monitor, see docs/operations-runbook.md. For the product-level
current state and remaining proof work, see docs/status.md.
Verification:
- cargo test
- Run a short task with mbus run and confirm a summary JSON log line is emitted and, if using extract actions, mbus_extract.json is written.
Rollback:
- Checkout the previous release tag or commit and rebuild.
- Revert any config changes (especially router thresholds and timeouts) to the last known-good values.
For the full verification checklist, rollback recipe, and structured logging
guidance, see docs/operations-runbook.md.