Add optional Playwright Chrome correctness driver #112

Draft
chenglou wants to merge 1 commit into main from codex/chrome-playwright-correctness

@chenglou commented Apr 7, 2026

Summary

  • add an optional Chrome-only Playwright correctness driver behind CHROME_AUTOMATION_DRIVER=playwright
  • pin headless Chrome to the validated screen environment instead of silently trusting default headless layout
  • keep benchmark runs on the existing foreground AppleScript path and fail loudly if someone tries to use the Playwright path there

Rationale

We now have a checked-in, machine-readable Chrome correctness baseline in corpora/chrome-step10.json, so we can evaluate a new browser driver by generating the same sweep and diffing it mechanically against that file.

That made Chrome a good candidate for trying Playwright:

  • Chrome correctness currently goes through AppleScript, which causes focus blips and is less pleasant for day-to-day work
  • Firefox was already on a protocol path, so it did not have the same upside
  • Safari still needs to stay real Safari, so Playwright WebKit is not a drop-in replacement there

The important constraint is that this is only for correctness, not benchmarks.

During the investigation:

  • plain headless Chrome diverged from the checked-in corpus status
  • headed Playwright Chrome matched corpus status, but benchmark numbers came back systematically slower than the current benchmark snapshot
  • pinned headless Chrome with a validated screen environment matched corpus status exactly

So this PR takes the conservative split:

  • Chrome correctness: optional Playwright path
  • Chrome benchmarks: unchanged AppleScript foreground path
  • no silent fallback between the two
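
As a rough illustration of the split above, the selection logic might look like the sketch below. Only the `CHROME_AUTOMATION_DRIVER` variable, its `playwright` value, and the `foreground` option come from this PR; `selectChromeDriver`, the `ChromeDriver` type, and the error wording are hypothetical names chosen for the example, not the code in this branch.

```typescript
// Hypothetical sketch: names and error text are illustrative assumptions.
type ChromeDriver = "applescript" | "playwright";

function selectChromeDriver(
  env: Record<string, string | undefined>,
  options: { foreground: boolean },
): ChromeDriver {
  // The Playwright path is opt-in: anything other than the exact value
  // "playwright" keeps the existing AppleScript path.
  if (env["CHROME_AUTOMATION_DRIVER"] !== "playwright") {
    return "applescript";
  }

  // Benchmark callers pass foreground: true. Rather than quietly switching
  // back to AppleScript, refuse the combination outright.
  if (options.foreground) {
    throw new Error(
      "CHROME_AUTOMATION_DRIVER=playwright is correctness-only and cannot " +
        "run foreground (benchmark) sessions; unset it to use AppleScript.",
    );
  }

  return "playwright";
}
```

Throwing instead of falling back is what keeps benchmark numbers from ever being produced by a driver they were not validated on.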

Implementation notes

  • the Playwright path is only selected when CHROME_AUTOMATION_DRIVER=playwright
  • it is Chrome-only and non-foreground only
  • it uses headless Chrome with --screen-info={3024x1964 devicePixelRatio=2}
  • it asserts the pinned environment from inside the page before trusting the run
  • benchmark callers still pass foreground: true, so the Playwright path errors immediately there
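
The in-page environment assertion can be pictured as a comparison between the values the driver expects and the values the page actually reports. The sketch below is an assumption about the shape of that check, not the code in this branch: `ScreenEnvironment`, `findEnvironmentMismatches`, and `assertPinnedEnvironment` are hypothetical names, and the PR does not state which exact values the page reports under the `--screen-info` flag, so the expected values are left as a parameter.

```typescript
// Hypothetical sketch: the type and function names are illustrative.
interface ScreenEnvironment {
  screenWidth: number;
  screenHeight: number;
  devicePixelRatio: number;
}

// Returns one human-readable message per field that differs.
function findEnvironmentMismatches(
  expected: ScreenEnvironment,
  observed: ScreenEnvironment,
): string[] {
  const fields: (keyof ScreenEnvironment)[] = [
    "screenWidth",
    "screenHeight",
    "devicePixelRatio",
  ];
  const mismatches: string[] = [];
  for (const field of fields) {
    if (expected[field] !== observed[field]) {
      mismatches.push(
        `${field}: expected ${expected[field]}, observed ${observed[field]}`,
      );
    }
  }
  return mismatches;
}

// Throws before any measurement is trusted if the environment drifted.
function assertPinnedEnvironment(
  expected: ScreenEnvironment,
  observed: ScreenEnvironment,
): void {
  const mismatches = findEnvironmentMismatches(expected, observed);
  if (mismatches.length > 0) {
    throw new Error(
      `Pinned Chrome environment mismatch: ${mismatches.join("; ")}`,
    );
  }
}
```

In a real run the `observed` values would presumably be read inside the page, for example from `screen.width`, `screen.height`, and `window.devicePixelRatio` via Playwright's `page.evaluate`, before any sweep results are recorded.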

Verification

  • bun run check
  • CHROME_AUTOMATION_DRIVER=playwright bun run accuracy-check: 7680/7680
  • CHROME_AUTOMATION_DRIVER=playwright bun run corpus-sweep --all --start=300 --end=900 --step=10 --output=/tmp/pretext-playwright-step10-pr.json
  • mechanical diff of /tmp/pretext-playwright-step10-pr.json against corpora/chrome-step10.json: 0 diffs
  • CHROME_AUTOMATION_DRIVER=playwright bun run benchmark-check fails loudly with the expected guardrail message
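
For readers who want to reproduce the mechanical diff step, a structural comparison of two parsed JSON documents is enough. The sketch below is a generic approach and not necessarily the tool used for the verification above; `diffJson` and the `Json` type are hypothetical, and nothing here assumes a particular schema for the sweep files.

```typescript
// Hypothetical sketch of a structural JSON diff; not the repo's own tooling.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns one message per differing path; an empty array means "0 diffs".
function diffJson(expected: Json, actual: Json, path = "$"): string[] {
  if (Array.isArray(expected) && Array.isArray(actual)) {
    const diffs: string[] = [];
    const length = Math.max(expected.length, actual.length);
    for (let i = 0; i < length; i++) {
      const e = expected[i];
      const a = actual[i];
      const childPath = `${path}[${i}]`;
      if (e === undefined) diffs.push(`${childPath}: unexpected entry`);
      else if (a === undefined) diffs.push(`${childPath}: missing entry`);
      else diffs.push(...diffJson(e, a, childPath));
    }
    return diffs;
  }

  if (isObject(expected) && isObject(actual)) {
    const diffs: string[] = [];
    // Union of keys, sorted so the report order is stable across runs.
    const keys = Array.from(
      new Set([...Object.keys(expected), ...Object.keys(actual)]),
    ).sort();
    for (const key of keys) {
      const e = expected[key];
      const a = actual[key];
      const childPath = `${path}.${key}`;
      if (e === undefined) diffs.push(`${childPath}: unexpected key`);
      else if (a === undefined) diffs.push(`${childPath}: missing key`);
      else diffs.push(...diffJson(e, a, childPath));
    }
    return diffs;
  }

  // Primitives, or values of different kinds (e.g. array vs. object).
  return expected === actual
    ? []
    : [`${path}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`];
}
```

Feeding it the parsed contents of corpora/chrome-step10.json as `expected` and the freshly generated sweep as `actual` would report every path where the two disagree. Object key order is ignored, while array order is treated as significant.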

Follow-up questions

  • whether to add convenience scripts for the Playwright correctness path
  • whether this should stay opt-in indefinitely or become the default Chrome correctness driver later
