bayestest is an agent-friendly Python package and CLI for A/B/n decisions.
Early release (alpha): this project is a work in progress and is not yet production-hardened.
It supports:
- Bayesian conversion-rate decisions (Beta-Binomial)
- Bayesian ARPU probability-to-win from aggregate revenue stats
- Frequentist sequential decisions (O'Brien-Fleming alpha spending)
- Multi-variant (A/B/n) comparisons against one control
- Guardrail checks (latency, bounce rate, error rate, etc.)
- SRM detection (sample ratio mismatch) for data quality
- Structured JSON output for agents and automations
- Markdown report generation for human review
- CSV/XLSX ingestion with mapping files
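To make the Beta-Binomial item above concrete, here is a minimal sketch of a probability-to-win estimate using only the Python standard library. It is an illustration, not the package's implementation: the function name `prob_beats_control`, the uniform Beta(1, 1) prior, and the sample count are all assumptions.

```python
import random

def prob_beats_control(control, variant, samples=20000, seed=7):
    """Monte Carlo estimate of P(variant rate > control rate).

    Each argument is a (conversions, visitors) pair. The uniform Beta(1, 1)
    prior is an assumption of this sketch, not necessarily the package default.
    """
    rng = random.Random(seed)
    c_conv, c_vis = control
    v_conv, v_vis = variant
    wins = 0
    for _ in range(samples):
        # Posterior under Beta(1, 1): Beta(1 + successes, 1 + failures)
        c_rate = rng.betavariate(1 + c_conv, 1 + c_vis - c_conv)
        v_rate = rng.betavariate(1 + v_conv, 1 + v_vis - v_conv)
        wins += v_rate > c_rate
    return wins / samples

p = prob_beats_control((2000, 50000), (2140, 50000))
print(f"P(variant beats control) ~ {p:.3f}")
```

The estimate is the share of posterior draws in which the variant's rate exceeds the control's; a threshold such as `bayes_prob_beats_control` would then be compared against this number.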
Install (with uv):
source $HOME/.local/bin/env
uv venv .venv
uv pip install -e .
Requirements:
- Python >=3.9
- Dependencies: numpy>=1.22, openpyxl>=3.1.0
- CLI installed (bayestest --help works)
- Exactly one control variant in analysis inputs
- Required variant fields: name, visitors, conversions
- ARPU mode also requires: revenue_sum, revenue_sum_squares
- CSV/XLSX mode requires mapping JSON (example-mapping or example-duration-mapping)
Generate an input template:
bayestest example-input > input.json
Run analysis and write both JSON + report:
bayestest analyze \
--input input.json \
--output output.json \
--report report.md
Analyze directly from CSV/XLSX:
bayestest analyze-file \
--input experiment.xlsx \
--mapping mapping.json \
--sheet Sheet1 \
--output output.json \
--report report.md
Run all bundled demos:
make demo
Check environment readiness:
bayestest doctor
bayestest doctor --json
bayestest doctor --strict
Estimate duration from assumptions:
bayestest duration \
--method frequentist \
--baseline-rate 0.04 \
--relative-mde 0.05 \
--daily-traffic 50000 \
--n-variants 3 \
--max-looks 10
Estimate Bayesian duration (assurance simulation):
bayestest duration \
--method bayesian \
--baseline-rate 0.04 \
--relative-mde 0.05 \
--daily-traffic 50000 \
--n-variants 3 \
--max-days 60
Analyze pasted variant text:
bayestest analyze-text \
--text "Variant A: 100 conversions out of 2000 visitors\nVariant B: 125 conversions out of 2000 visitors" \
--experiment-name pasted_example
Estimate duration from CSV/XLSX:
bayestest example-duration-mapping > duration_mapping.json
bayestest duration \
--input duration_inputs.xlsx \
--mapping duration_mapping.json \
--sheet Sheet1
Agent workflow:
- Detect available columns in source data (CSV/XLSX).
- Build mapping.json to map client fields into bayestest fields.
- Run bayestest analyze-file ....
- Read recommendation.action, decision_confidence, and risk_flags.
- If action=continue_collecting_data, schedule the next look.
- If action=investigate_data_quality, resolve SRM/tracking issues before any ship decision.
- For planning questions ("how long should we run?"), run bayestest duration.
- For pasted stats messages, run bayestest analyze-text to convert free text into an analysis.
- For spreadsheet planning assumptions, run bayestest duration --input ... --mapping ....
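The branching in the steps above can be sketched as a small dispatcher over the parsed output. This is a hypothetical helper: the function name `next_step` is invented here, and the sketch assumes the recommendation object sits under a top-level `recommendation` key in `output.json`. Only the two actions the steps describe have grounded handling; every other action is routed to a human as a placeholder.

```python
def next_step(result):
    """Map a parsed analysis result to the next agent step.

    Assumes (not confirmed by the README) that `result` is the parsed
    output.json with a top-level "recommendation" object.
    """
    rec = result.get("recommendation", {})
    action = rec.get("action", "")
    if action == "investigate_data_quality":
        return "resolve SRM/tracking before any ship decision"
    if action == "continue_collecting_data":
        return "schedule next look"
    # Placeholder policy for ship_*, do_not_ship, stop_and_rollback, etc.
    confidence = rec.get("decision_confidence")
    flags = rec.get("risk_flags", [])
    return f"hand off to reviewer: action={action or 'missing'}, confidence={confidence}, flags={flags}"

example = {
    "recommendation": {
        "action": "continue_collecting_data",
        "decision_confidence": 0.62,
        "risk_flags": [],
    }
}
print(next_step(example))
```

In practice the dictionary would come from `json.load` on the file written by `--output`.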
Top-level fields:
- experiment_name (str)
- method ("bayesian" or "frequentist_sequential")
- primary_metric ("conversion_rate" or "arpu")
- alpha (float, default 0.05)
- look_index and max_looks (ints, sequential mode)
- information_fraction (optional float in (0, 1], overrides look/max_looks)
- variants (list): exactly one row must include "is_control": true
- guardrails (optional list)
- decision_thresholds (optional):
  - bayes_prob_beats_control (default 0.95)
  - max_expected_loss (default 0.001)
- samples (default 50000, Bayesian mode)
- random_seed (default 7)
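To illustrate how `look_index`, `max_looks`, and `alpha` interact in sequential mode, the sketch below computes the cumulative alpha spent under the Lan-DeMets O'Brien-Fleming-type spending function. The README does not show the exact formula bayestest uses, so treat this as the textbook approximation; the function names are hypothetical.

```python
from statistics import NormalDist

def information_fraction(look_index, max_looks):
    """Fraction of planned information observed, assuming equally spaced looks."""
    return look_index / max_looks

def obf_alpha_spent(t, alpha=0.05):
    """Two-sided O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / t ** 0.5))

t = information_fraction(3, 10)
print(f"information fraction {t:.2f}, cumulative alpha spent {obf_alpha_spent(t):.6f}")
```

Very little alpha is spent at early looks and the full `alpha` is reached only at the final look, which is why early stopping under this scheme requires strong evidence.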
Variant row:
{
"name": "control",
"visitors": 100000,
"conversions": 4000,
"is_control": true
}
For primary_metric: "arpu", each variant also needs:
- revenue_sum
- revenue_sum_squares
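The two ARPU aggregates are enough to recover a per-variant mean and sample variance. The sketch below shows how they could be computed from raw revenue values and turned back into those moments. It assumes one revenue value per visitor, with non-purchasers counted as 0; whether bayestest expects exactly that convention is not stated here. Both function names are hypothetical.

```python
def revenue_aggregates(revenues):
    """Return (revenue_sum, revenue_sum_squares) for a list of per-visitor revenues."""
    revenue_sum = sum(revenues)
    revenue_sum_squares = sum(r * r for r in revenues)
    return revenue_sum, revenue_sum_squares

def mean_and_variance(visitors, revenue_sum, revenue_sum_squares):
    """Recover the mean and unbiased sample variance from the aggregates."""
    mean = revenue_sum / visitors
    variance = (revenue_sum_squares - visitors * mean ** 2) / (visitors - 1)
    return mean, variance

per_visitor = [0, 0, 10, 0, 20]  # illustrative data: three non-purchasers, two purchasers
s, ss = revenue_aggregates(per_visitor)
print(s, ss, mean_and_variance(len(per_visitor), s, ss))
```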
Generate mapping template:
bayestest example-mapping > mapping.json
Mapping keys:
- columns.variant, columns.visitors, columns.conversions
- optional columns.is_control
- optional columns.revenue_sum, columns.revenue_sum_squares
- control detection fallback: control.column + control.value
This lets agents reshape arbitrary business exports into a consistent contract.
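As a rough illustration of what such a mapping does, the sketch below applies a mapping dictionary to CSV rows and produces variant rows in the input contract. The nested layout of the mapping (`columns` and `control` objects) is inferred from the dotted key names and may not match the real template, so generate the real one with `bayestest example-mapping`. The column names `arm`, `users`, `orders` and the helper `rows_to_variants` are hypothetical.

```python
import csv
import io

def rows_to_variants(csv_text, mapping):
    """Reshape CSV rows into variant rows using a mapping dictionary."""
    cols = mapping["columns"]
    control = mapping.get("control", {})
    variants = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if "is_control" in cols:
            # Explicit control column, parsed leniently
            is_control = row[cols["is_control"]].strip().lower() in ("1", "true", "yes")
        else:
            # Fallback: control.column + control.value
            is_control = row[control["column"]] == control["value"]
        variants.append({
            "name": row[cols["variant"]],
            "visitors": int(row[cols["visitors"]]),
            "conversions": int(row[cols["conversions"]]),
            "is_control": is_control,
        })
    return variants

csv_text = "arm,users,orders\ncontrol,50000,2000\nv1,50000,2080\n"
mapping = {
    "columns": {"variant": "arm", "visitors": "users", "conversions": "orders"},
    "control": {"column": "arm", "value": "control"},
}
variants = rows_to_variants(csv_text, mapping)
print(variants)
```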
- Bayesian conversion-rate A/B/n:
{
"experiment_name": "homepage_cta",
"method": "bayesian",
"primary_metric": "conversion_rate",
"variants": [
{"name": "control", "visitors": 50000, "conversions": 2000, "is_control": true},
{"name": "v1", "visitors": 50000, "conversions": 2080, "is_control": false},
{"name": "v2", "visitors": 50000, "conversions": 2140, "is_control": false}
]
}
- Bayesian ARPU probability-to-win:
{
"experiment_name": "pricing_page",
"method": "bayesian",
"primary_metric": "arpu",
"variants": [
{"name": "control", "visitors": 10000, "conversions": 550, "revenue_sum": 22000, "revenue_sum_squares": 150000, "is_control": true},
{"name": "v1", "visitors": 10000, "conversions": 570, "revenue_sum": 23500, "revenue_sum_squares": 170000, "is_control": false}
]
}
- Sequential ARPU (early look):
{
"experiment_name": "checkout_flow",
"method": "frequentist_sequential",
"primary_metric": "arpu",
"alpha": 0.05,
"look_index": 3,
"max_looks": 10,
"variants": [
{"name": "control", "visitors": 12000, "conversions": 610, "revenue_sum": 21000, "revenue_sum_squares": 150000, "is_control": true},
{"name": "v1", "visitors": 12000, "conversions": 640, "revenue_sum": 22400, "revenue_sum_squares": 167000, "is_control": false}
]
}
Common errors and fixes:
- Exactly one variant must have is_control=true: mark one and only one control row.
- conversions cannot exceed visitors: fix aggregation query or mapped columns.
- ARPU requires revenue_sum and revenue_sum_squares: include both revenue aggregate columns.
- primary_metric must be 'conversion_rate' or 'arpu': fix mapping or input payload values.
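An agent can catch these errors before invoking the CLI. The pre-flight check below is a hypothetical sketch that mirrors the four messages above. It is not the tool's own validation, which may be stricter or may apply defaults this sketch does not know about (for example, it requires `primary_metric` to be set explicitly).

```python
def validate_payload(payload):
    """Return a list of problems found in an analysis input payload."""
    errors = []
    variants = payload.get("variants", [])
    metric = payload.get("primary_metric")
    if metric not in ("conversion_rate", "arpu"):
        errors.append("primary_metric must be 'conversion_rate' or 'arpu'")
    n_controls = sum(1 for v in variants if v.get("is_control") is True)
    if n_controls != 1:
        errors.append("Exactly one variant must have is_control=true")
    for v in variants:
        name = v.get("name", "<unnamed>")
        if v.get("conversions", 0) > v.get("visitors", 0):
            errors.append(f"{name}: conversions cannot exceed visitors")
        needs_revenue = metric == "arpu"
        if needs_revenue and not all(k in v for k in ("revenue_sum", "revenue_sum_squares")):
            errors.append(f"{name}: ARPU requires revenue_sum and revenue_sum_squares")
    return errors

bad = {
    "primary_metric": "arpu",
    "variants": [
        {"name": "control", "visitors": 100, "conversions": 120, "is_control": True},
        {"name": "v1", "visitors": 100, "conversions": 10, "is_control": True},
    ],
}
for problem in validate_payload(bad):
    print(problem)
```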
recommendation contains:
- action (ship_*, continue_collecting_data, do_not_ship, investigate_data_quality, stop_and_rollback)
- rationale
- decision_confidence (0 to 1)
- next_best_action
- risk_flags (e.g. srm_detected, guardrail_failure)
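For context on the `srm_detected` flag: sample ratio mismatch is commonly checked with a chi-square goodness-of-fit test on visitor counts. The sketch below is a generic standard-library version of that test, not bayestest's implementation. The function names, the default of equal expected allocation, and any alert threshold you compare the p-value against are assumptions.

```python
import math

def chi2_sf(x, df):
    """Chi-square survival function for integer df >= 1 (closed-form series)."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0  # i = 0 term of the series
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def srm_p_value(observed, expected_ratios=None):
    """P-value for observed visitor counts against the expected allocation."""
    k = len(observed)
    total = sum(observed)
    ratios = expected_ratios or [1.0 / k] * k  # assumption: equal split by default
    stat = sum((o - total * r) ** 2 / (total * r) for o, r in zip(observed, ratios))
    return chi2_sf(stat, k - 1)

print(srm_p_value([50000, 51000]))
```

A very small p-value suggests the traffic split deviates from the intended allocation, which is the situation where results should not be trusted until assignment and tracking are checked.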
- This tool currently uses aggregate statistics for ARPU.
- For production, add metric-specific robust models, stronger QA checks, and regression tests.