Pin competitor smoke provenance before live runs by shaun0927 · Pull Request #1339 · shaun0927/openchrome

shaun0927 · 2026-05-17T16:02:38Z

Summary

Strengthens bench:competitor-smoke to capture concrete competitor versions and where each version came from
Records dependency availability and an explicit skip category (not_requested, dependency_missing, etc.) for each row
Replaces the placeholder pins in benchmark/COMPETITORS.md with the current smoke/runtime registry pins
Refreshes competitor-smoke.json and the readiness artifacts; the stale-artifact count drops because the smoke artifact now pins OpenChrome 1.12.4
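As an illustration of the per-row provenance described above, a smoke row might be modeled as follows. This is a sketch under assumptions: library, version, versionPinned, and skipCategory appear in the PR's test diff, while versionSource, dependencyAvailable, the SmokeRow type, its source labels, and the describeRow helper are hypothetical names chosen for this example. Only the two skip categories the PR names explicitly are modeled.

```typescript
// Hypothetical shape of one competitor-smoke row. Only `library`, `version`,
// `versionPinned`, and `skipCategory` are taken from the PR's test diff.
type SkipCategory = "not_requested" | "dependency_missing";

interface SmokeRow {
  library: string;
  version: string; // concrete version string, e.g. "1.12.4"
  // Assumed labels for where the version came from.
  versionSource: "package.json" | "installed-package" | "registry-fallback";
  versionPinned: boolean;
  dependencyAvailable: boolean;
  skipCategory: SkipCategory | null; // null means the row was actually measured
}

// Render a row for a diagnostic log, keeping skips visible instead of hiding them.
function describeRow(row: SmokeRow): string {
  const status =
    row.skipCategory === null ? "measured" : `skipped (${row.skipCategory})`;
  return `${row.library}@${row.version} [${row.versionSource}] ${status}`;
}

const example: SmokeRow = {
  library: "OpenChrome",
  version: "1.12.4",
  versionSource: "package.json",
  versionPinned: true,
  dependencyAvailable: true,
  skipCategory: null,
};

console.log(describeRow(example));
// prints "OpenChrome@1.12.4 [package.json] measured"
```

Keeping the skip reason as a nullable field (rather than a separate boolean) means a row cannot be both measured and skipped at once.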

Stack

Base: Protect benchmark claims before expanding measurements #1338 (benchmark/pr1-contract-hardening)
This is PR2 of the validated benchmark plan.

Scope boundaries

In scope:

competitor smoke/version provenance
dependency detection and visible diagnostic skips
smoke result artifact refresh

Out of scope:

full --include-live Chrome/CDP execution
native browser-use/playwright-mcp LLM loops
headline comparisons
OpenChrome product/core changes

Verification

npm run bench:competitor-smoke
npm test -- --runTestsByPath tests/benchmark/run-competitor-smoke.test.ts tests/benchmark/adapters/browser-use-adapter.test.ts tests/benchmark/adapters/playwright-mcp-adapter.test.ts tests/benchmark/benchmark-readiness.test.ts --runInBand
npm run bench:readiness
npm run build
git diff --check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ad1b79c2a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-17T16:07:54Z

+    expect(playwright?.skipCategory).toBe('not_requested');
+    expect(playwright?.version).toMatch(/^[0-9]+\.[0-9]+\.[0-9]+/);
+    expect(playwright?.versionPinned).toBe(true);
+    expect(rows.find((row) => row.library === 'OpenChrome')?.version).toBe('1.12.4');


Read OpenChrome version dynamically in smoke test

This assertion hard-codes 1.12.4, so the test will fail on every package version bump even when the smoke matrix behavior is correct. That creates avoidable CI breakage and recurring maintenance churn unrelated to regressions; the expected value should come from package.json (or readRepoVersion()) instead of a fixed literal.
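One way to follow this suggestion is to derive the expected value from package.json instead of a literal. The sketch below assumes a hypothetical readPackageVersion(repoRoot) helper; the runner's real readRepoVersion() is mentioned in the PR but its implementation is not shown, so this is not its actual code.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: extract the "version" field from package.json text,
// failing loudly if it is missing so the test cannot silently compare to undefined.
function parsePackageVersion(packageJsonText: string): string {
  const parsed: unknown = JSON.parse(packageJsonText);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { version?: unknown }).version !== "string"
  ) {
    throw new Error('package.json has no string "version" field');
  }
  return (parsed as { version: string }).version;
}

// Read the repo version from disk; `repoRoot` is an assumed parameter.
function readPackageVersion(repoRoot: string): string {
  return parsePackageVersion(readFileSync(join(repoRoot, "package.json"), "utf8"));
}

// In the smoke test, the hard-coded literal could then become something like:
//   expect(rows.find((row) => row.library === 'OpenChrome')?.version)
//     .toBe(readPackageVersion(repoRoot));
```

Using the same version source as the runner keeps the assertion meaningful (it still catches a wrong or missing version) without breaking on routine version bumps.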

chatgpt-codex-connector · 2026-05-17T16:07:54Z


+const REGISTRY_PINNED_VERSIONS: Readonly<Record<string, string>> = {
+  OpenChrome: readRepoVersion(),
+  Playwright: '1.49.0',


Update Playwright fallback pin to current registry version

The fallback registry map still pins Playwright to 1.49.0, but this change set updates the benchmark registry/docs to 1.60.0. When Playwright is not resolvable in the runtime, the smoke output will emit the stale version while versionPinned remains true, producing incorrect provenance in diagnostic artifacts.
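A sketch of how the fallback path could keep provenance honest: prefer the version actually installed in the benchmark runtime, and when falling back to a registry pin, record that the value came from the registry rather than from an observed install. The resolver is injected as an assumption, since how the real runner resolves installed packages is not shown; the names VersionInfo, REGISTRY_FALLBACK_PINS, and resolveCompetitorVersion are hypothetical.

```typescript
interface VersionInfo {
  version: string;
  versionSource: "installed-package" | "registry-fallback";
  dependencyAvailable: boolean;
}

// Fallback pins, assumed to mirror benchmark/COMPETITORS.md (Playwright 1.60.0 per this review).
const REGISTRY_FALLBACK_PINS: Readonly<Record<string, string>> = {
  Playwright: "1.60.0",
};

// Prefer the installed version; fall back to the registry pin only when resolution fails.
function resolveCompetitorVersion(
  library: string,
  resolveInstalledVersion: (library: string) => string | undefined,
): VersionInfo {
  const installed = resolveInstalledVersion(library);
  if (installed !== undefined) {
    return { version: installed, versionSource: "installed-package", dependencyAvailable: true };
  }
  const pinned = REGISTRY_FALLBACK_PINS[library];
  if (pinned === undefined) {
    throw new Error(`no fallback pin registered for ${library}`);
  }
  // The fallback is a registry pin, not an observed install, so label it explicitly.
  return { version: pinned, versionSource: "registry-fallback", dependencyAvailable: false };
}
```

Separating "which version" from "where it came from" lets artifacts keep a pinned value in the dependency-missing path without implying the package was actually resolved.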

Make the competitor smoke matrix report concrete runtime versions, dependency availability, and explicit skip categories so later benchmark axes can distinguish unavailable competitors from measured failures.

Constraint: Competitor support must remain benchmark-harness only and must invoke real external packages or surface skips.
Rejected: Keeping operator-pinned-runtime placeholders in smoke results | it hides whether the local benchmark runtime can actually resolve each competitor.
Confidence: high
Scope-risk: narrow
Directive: Treat not-requested and dependency-missing competitor rows as visible diagnostic skips, never as zero-score measured rows.
Tested: npm run bench:competitor-smoke
Tested: npm test -- --runTestsByPath tests/benchmark/run-competitor-smoke.test.ts tests/benchmark/adapters/browser-use-adapter.test.ts tests/benchmark/adapters/playwright-mcp-adapter.test.ts tests/benchmark/benchmark-readiness.test.ts --runInBand
Tested: npm run bench:readiness
Tested: npm run build
Tested: git diff --check
Not-tested: --include-live browser/CDP execution; browser-use installed Python environment; paid LLM paths
Co-authored-by: OmX <omx@oh-my-codex.dev>
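The Directive above (skips stay visible but never count as zero-score rows) can be illustrated with a small aggregation sketch. The ScoredRow and Summary shapes, the summarize function, and the numeric scores are hypothetical; the PR does not show how scores are aggregated.

```typescript
type SkipCategory = "not_requested" | "dependency_missing";

interface ScoredRow {
  library: string;
  skipCategory: SkipCategory | null;
  score: number | null; // only meaningful when skipCategory is null
}

interface Summary {
  meanScore: number | null; // null when nothing was measured
  measured: number;
  skipped: Record<string, number>; // skip counts per category, reported alongside the mean
}

function summarize(rows: readonly ScoredRow[]): Summary {
  // Only measured rows contribute to the mean; skipped rows are never treated as 0.
  const measured = rows.filter((r) => r.skipCategory === null && r.score !== null);
  const skipped: Record<string, number> = {};
  for (const r of rows) {
    if (r.skipCategory !== null) {
      skipped[r.skipCategory] = (skipped[r.skipCategory] ?? 0) + 1;
    }
  }
  const total = measured.reduce((sum, r) => sum + (r.score ?? 0), 0);
  return {
    meanScore: measured.length > 0 ? total / measured.length : null,
    measured: measured.length,
    skipped,
  };
}
```

With two measured rows scoring 1 and 0.5 plus two skipped rows, this reports a mean of 0.75 over two measurements and lists both skips; folding the skips in as zeros would instead halve the mean to 0.375 and hide why.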

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 343ca15d39


chatgpt-codex-connector · 2026-05-18T03:24:09Z


+const REGISTRY_PINNED_VERSIONS: Readonly<Record<string, string>> = {
+  OpenChrome: readRepoVersion(),
+  Playwright: '1.49.0',


Align Playwright fallback pin with benchmark registry

When playwright cannot be resolved, the smoke runner falls back to this hard-coded version, but this commit updates the benchmark registry/docs to 1.60.0. In that dependency-missing path, generated artifacts will report an outdated Playwright version while still marking versionPinned: true, which makes provenance data inaccurate and can mislead readiness/debugging decisions.
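A lightweight guard against this kind of drift is a check that compares the runner's fallback pins with the registry pins and reports mismatches. This is a sketch: findPinDrift is a hypothetical name, and where the registry pins would be loaded from (for example, parsed out of benchmark/COMPETITORS.md) is an assumption.

```typescript
// Compare fallback pins against registry pins and describe every mismatch.
// Libraries that appear only in the fallback map are ignored here.
function findPinDrift(
  fallbackPins: Readonly<Record<string, string>>,
  registryPins: Readonly<Record<string, string>>,
): string[] {
  const drift: string[] = [];
  for (const [library, pinned] of Object.entries(fallbackPins)) {
    const expected = registryPins[library];
    if (expected !== undefined && expected !== pinned) {
      drift.push(`${library}: fallback ${pinned} != registry ${expected}`);
    }
  }
  return drift;
}
```

Run as part of the smoke or readiness tests, a non-empty result would flag the stale 1.49.0 pin before it reached a generated artifact.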

chatgpt-codex-connector · 2026-05-18T03:24:09Z

+    expect(playwright?.skipCategory).toBe('not_requested');
+    expect(playwright?.version).toMatch(/^[0-9]+\.[0-9]+\.[0-9]+/);
+    expect(playwright?.versionPinned).toBe(true);
+    expect(rows.find((row) => row.library === 'OpenChrome')?.version).toBe('1.12.4');


Derive OpenChrome smoke assertion from package version

This assertion hard-codes 1.12.4, so a normal package version bump will fail the smoke test even when the matrix behavior is correct. That creates avoidable CI churn unrelated to functional regressions; the expected value should come from the repo version source used by the runner.

chatgpt-codex-connector Bot reviewed May 17, 2026

shaun0927 mentioned this pull request May 17, 2026

Measure no-secret benchmark axes before live promotion #1340 (merged)

shaun0927 force-pushed the benchmark/pr2-competitor-smoke-pins branch from 1ad1b79 to 343ca15 on May 18, 2026 03:20

chatgpt-codex-connector Bot reviewed May 18, 2026

shaun0927 changed the base branch from benchmark/pr1-contract-hardening to main on May 18, 2026 03:28

shaun0927 merged commit 6963ed4 into main May 18, 2026
