Route native competitors without OpenChrome fallback by shaun0927 · Pull Request #1343 · shaun0927/openchrome

shaun0927 · 2026-05-17T16:27:11Z

Summary

marks playwright-mcp and browser-use as explicit external native execution targets with setup requirements
updates dry-run cost projection to include selected native competitors without making API calls
records no-fallback routing metadata so external competitors cannot silently reuse OpenChrome behavior
regenerates competitor smoke and readiness artifacts with current version-pinned rows
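
The no-fallback metadata described above can be pictured with a minimal sketch. Only `nativeExecution` and `forbidsOpenChromeFallback` are names taken from this PR; the `LibraryRoute` type, the `setupRequirements` field, the `'openchrome'` identifier, and `assertNoFallback` are hypothetical and do not claim to match the repository's code.

```typescript
// Hypothetical sketch: only nativeExecution and forbidsOpenChromeFallback
// are field names that appear in this PR; everything else is illustrative.
type LibraryId = 'openchrome' | 'playwright-mcp' | 'browser-use';

interface LibraryRoute {
  library: LibraryId;
  nativeExecution: boolean;           // runs through the competitor's own surface
  forbidsOpenChromeFallback: boolean; // rows must never be produced by OpenChrome
  setupRequirements: string[];        // placeholder descriptions, not real checks
}

const routes: Record<LibraryId, LibraryRoute> = {
  'openchrome': {
    library: 'openchrome',
    nativeExecution: true,
    forbidsOpenChromeFallback: false,
    setupRequirements: [],
  },
  'playwright-mcp': {
    library: 'playwright-mcp',
    nativeExecution: true,
    forbidsOpenChromeFallback: true,
    setupRequirements: ['playwright-mcp MCP server available'],
  },
  'browser-use': {
    library: 'browser-use',
    nativeExecution: true,
    forbidsOpenChromeFallback: true,
    setupRequirements: ['browser-use Python bridge available'],
  },
};

// Throws when a row labeled for one library was actually produced by another.
function assertNoFallback(route: LibraryRoute, executedBy: LibraryId): void {
  if (route.forbidsOpenChromeFallback && executedBy !== route.library) {
    throw new Error(
      `${route.library} row was executed by ${executedBy}; fallback is forbidden`,
    );
  }
}
```

Under these assumptions, a row labeled `browser-use` but produced by OpenChrome fails loudly instead of being recorded.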

PR scope validation

In scope for PR6:

#1302 (Benchmark #B follow-up: native/passive competitor adapter matrix): native competitor routing for playwright-mcp and browser-use
#1257 (Benchmark #B: Agent Task Success — WebVoyager at equal LLM and equal budget): native-vs-passive separation metadata and dry-run projections
dependency-only setup/skip visibility and exact version capture

Out of scope:

reimplementing external competitor behavior
OpenChrome product/core changes
fault injection
full orchestration or paid live provider calls

Verification

npm run bench:competitor-smoke
npm run bench:webvoyager:real -- --library playwright-mcp --mode native --dry-run
npm run bench:webvoyager:real -- --library browser-use --mode native --dry-run
npm test -- --runTestsByPath tests/benchmark/adapters/playwright-mcp-adapter.test.ts tests/benchmark/adapters/browser-use-adapter.test.ts tests/benchmark/webvoyager/llm/library-routing.test.ts tests/benchmark/benchmark-readiness.test.ts --runInBand
npm run bench:readiness
npm run build
git diff --check
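
The two `--dry-run` commands above are meant to project cost without calling a provider. A minimal sketch of that kind of projection follows; it is pure arithmetic, and every field name and number in it is a placeholder rather than a value from this PR or from the runner.

```typescript
// Hypothetical sketch of a dry-run cost projection: no network or API calls.
// All names and figures are illustrative assumptions.
interface ProjectionInput {
  tasks: number;
  repetitions: number;
  estInputTokensPerRun: number;
  estOutputTokensPerRun: number;
  usdPerMillionInputTokens: number;
  usdPerMillionOutputTokens: number;
}

interface Projection {
  runs: number;
  inputTokens: number;
  outputTokens: number;
  projectedUsd: number;
}

function projectCost(p: ProjectionInput): Projection {
  // One run = one task executed once; repetitions multiply the run count.
  const runs = p.tasks * p.repetitions;
  const inputTokens = runs * p.estInputTokensPerRun;
  const outputTokens = runs * p.estOutputTokensPerRun;
  const projectedUsd =
    (inputTokens / 1e6) * p.usdPerMillionInputTokens +
    (outputTokens / 1e6) * p.usdPerMillionOutputTokens;
  return { runs, inputTokens, outputTokens, projectedUsd };
}
```

Because the function takes estimates as input and returns a plain object, it can be run for any selected library without credentials.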

Notes

Stacked on #1342 / benchmark/pr5-real-llm-repetitions.

Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ffbe627fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-17T16:29:45Z

    competitorPin: 'playwright-mcp (native MCP)',
-    nativeLoopWired: false,
-    note: 'playwright-mcp MCP server driven by the same Claude tool-calling loop. Loop wiring lands next session.',
+    nativeLoopWired: true,


Keep competitor nativeLoopWired false until dispatch exists

Marking playwright-mcp as nativeLoopWired: true removes the live-run guard in main() (tests/benchmark/webvoyager/runner.ts) even though execution is still not routed by library: runTask() branches only on adapter and never consumes opts.library/nativeExecution. In this state, --library playwright-mcp (and similarly browser-use) can proceed through the OpenChrome/LLM adapter path, producing mislabeled competitor rows instead of true external-native execution and violating the no-fallback contract.
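
The concern above is that dispatch keys on `adapter` rather than on the requested library. A minimal sketch of library-keyed dispatch that refuses to fall back is shown below. `dispatchByLibrary`, `Executor`, `TaskRow`, and the `'openchrome'` identifier are hypothetical; the real `runTask()` in tests/benchmark/webvoyager/runner.ts is not reproduced here.

```typescript
// Hypothetical sketch: dispatch on the requested library and never fall back.
type LibraryId = 'openchrome' | 'playwright-mcp' | 'browser-use';

interface TaskRow {
  taskId: string;
  executedBy: LibraryId;
}

type Executor = (taskId: string) => TaskRow;

function dispatchByLibrary(
  taskId: string,
  opts: { library: LibraryId },
  executors: Partial<Record<LibraryId, Executor>>,
): TaskRow {
  const executor = executors[opts.library];
  if (!executor) {
    // No executor registered for this library: fail instead of reusing another path.
    throw new Error(`no native executor for ${opts.library}; refusing to fall back`);
  }
  const row = executor(taskId);
  if (row.executedBy !== opts.library) {
    // Guard against an executor that silently delegated elsewhere.
    throw new Error(`row for ${opts.library} was executed by ${row.executedBy}`);
  }
  return row;
}
```

With this shape, requesting `--library playwright-mcp` before an executor is registered produces an error rather than a mislabeled row.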

Mark playwright-mcp and browser-use as explicit external native execution targets with setup requirements, dry-run cost projections, version-pinned smoke rows, and no-fallback routing metadata.

Constraint: External competitors must use their own MCP/Python bridge surfaces and must not silently reuse OpenChrome behavior.
Rejected: Keep native competitors marked unwired | That blocks dry-run planning and hides dependency-only setup errors from the benchmark matrix.
Confidence: high
Scope-risk: narrow
Directive: Passive rows remain secondary; future live rows must keep nativeExecution and forbidsOpenChromeFallback metadata intact.
Tested: npm run bench:competitor-smoke; npm run bench:webvoyager:real -- --library playwright-mcp --mode native --dry-run; npm run bench:webvoyager:real -- --library browser-use --mode native --dry-run; npm test -- --runTestsByPath tests/benchmark/adapters/playwright-mcp-adapter.test.ts tests/benchmark/adapters/browser-use-adapter.test.ts tests/benchmark/webvoyager/llm/library-routing.test.ts tests/benchmark/benchmark-readiness.test.ts --runInBand; npm run bench:readiness; npm run build; git diff --check
Not-tested: Operator-run live native rows with provider credentials.

Co-authored-by: OmX <omx@oh-my-codex.dev>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf9a0ee960


chatgpt-codex-connector · 2026-05-18T03:23:52Z

    competitorPin: 'playwright-mcp (native MCP)',
-    nativeLoopWired: false,
-    note: 'playwright-mcp MCP server driven by the same Claude tool-calling loop. Loop wiring lands next session.',
+    nativeLoopWired: true,


Keep nativeLoopWired false until library dispatch exists

Setting nativeLoopWired to true for playwright-mcp lets non-dry runs pass the live library gate, but execution is still chosen only by adapter in runTask and never by opts.library/nativeExecution (tests/benchmark/webvoyager/runner.ts), so --library playwright-mcp can produce mislabeled rows without actually running the external competitor path. This violates the no-fallback guarantee the commit claims to enforce and can corrupt benchmark comparisons.
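
One way to address the gate described above is to make a live run depend on more than the `nativeLoopWired` flag. The sketch below is an assumption-laden illustration: `competitorPin` and `nativeLoopWired` come from the diff, while `dispatchImplemented`, `canRunLive`, and the `GateResult` shape are hypothetical and not part of the runner.

```typescript
// Hypothetical sketch of a live-run gate. Only competitorPin and
// nativeLoopWired are field names that appear in the diff.
interface LibraryEntry {
  competitorPin: string;
  nativeLoopWired: boolean;
  dispatchImplemented: boolean; // assumed flag: runner actually routes by library
}

interface GateResult {
  allowed: boolean;
  reason: string;
}

function canRunLive(entry: LibraryEntry, dryRun: boolean): GateResult {
  if (dryRun) {
    // Dry runs make no provider calls, so planning stays available.
    return { allowed: true, reason: 'dry-run: projection only' };
  }
  if (!entry.nativeLoopWired) {
    return { allowed: false, reason: `${entry.competitorPin}: native loop not wired` };
  }
  if (!entry.dispatchImplemented) {
    return { allowed: false, reason: `${entry.competitorPin}: runner does not dispatch by library` };
  }
  return { allowed: true, reason: 'live native execution available' };
}
```

Separating the two conditions keeps dry-run planning unblocked, which the commit message cites as the reason for not leaving competitors marked unwired, while still blocking live rows until dispatch exists.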

chatgpt-codex-connector · 2026-05-18T03:23:52Z

+    status: 'partial',
+    measurementReadiness: 'diagnostic_or_smoke_only',


Keep issue #1302 not_ready until native execution is real

This change marks issue #1302 as partial/diagnostic_or_smoke_only, but the runner still does not route by library (it dispatches only by adapter in tests/benchmark/webvoyager/runner.ts), so the underlying capability is not actually implemented yet. Updating readiness here inflates the summary counts and can mislead release/readiness decisions that rely on this report.
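
The point about inflated summary counts can be illustrated by deriving the status from evidence instead of setting it by hand. This is only a sketch: `status`, `'partial'`, `'not_ready'`, and `measurementReadiness` appear in this PR, while the `'ready'` value, the `Evidence` fields, `deriveStatus`, and `summarize` are hypothetical.

```typescript
// Hypothetical sketch: readiness status derived from evidence flags.
// 'ready' and the Evidence fields are assumptions, not names from the report.
type Status = 'not_ready' | 'partial' | 'ready';

interface Evidence {
  nativeDispatchImplemented: boolean; // runner routes by library
  liveRowsRecorded: boolean;          // at least one real native row exists
}

interface ReadinessRow {
  issue: number;
  status: Status;
  measurementReadiness: string;
}

function deriveStatus(e: Evidence): Status {
  if (!e.nativeDispatchImplemented) return 'not_ready';
  if (!e.liveRowsRecorded) return 'partial';
  return 'ready';
}

// Counts rows per status, as a readiness summary might.
function summarize(rows: ReadinessRow[]): Record<Status, number> {
  const counts: Record<Status, number> = { not_ready: 0, partial: 0, ready: 0 };
  for (const row of rows) {
    counts[row.status] += 1;
  }
  return counts;
}
```

Under these assumptions, issue #1302 would stay `not_ready` until the runner dispatches by library, so the summary counts would not move on metadata changes alone.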

chatgpt-codex-connector Bot reviewed May 17, 2026

shaun0927 mentioned this pull request May 17, 2026

Stress real-world tasks through final postconditions #1344

Merged

shaun0927 force-pushed the benchmark/pr5-real-llm-repetitions branch from 542ab43 to d79103e Compare May 18, 2026 03:20

shaun0927 force-pushed the benchmark/pr6-native-competitor-execution branch from 5ffbe62 to cf9a0ee Compare May 18, 2026 03:20

chatgpt-codex-connector Bot reviewed May 18, 2026

shaun0927 changed the base branch from benchmark/pr5-real-llm-repetitions to main May 18, 2026 03:28

shaun0927 merged commit a9b733c into main May 18, 2026
1 check passed

shaun0927 merged 1 commit into main from benchmark/pr6-native-competitor-execution
