
Fix: Use correct viewport dimensions in hybrid mode coordinate normalization #1540

Closed
kernel-systems-bot wants to merge 1 commit into browserbase:main from kernel-systems-bot:kernel-systems-bot/viewport-coordinate-normalization

@kernel-systems-bot kernel-systems-bot commented Jan 14, 2026

Fix: Hardcoded Viewport in Hybrid Mode Coordinate Normalization

(fix implemented by core team in #1560)

Summary

Fixes coordinate normalization in hybrid mode to use the actual browser viewport dimensions instead of a hardcoded 1288x711 default. Without this fix, clicks in hybrid mode with the Google provider land at the wrong coordinates whenever the viewport differs from the hardcoded default.

Problem

In coordinateNormalization.ts, normalization of Google provider coordinates (which arrive on a 0–1000 grid) uses a hardcoded viewport:

// BEFORE (bug):
const DEFAULT_VIEWPORT = { width: 1288, height: 711 };

export function normalizeGoogleCoordinates(x: number, y: number) {
  return {
    x: Math.floor((x / 1000) * DEFAULT_VIEWPORT.width),  // Ignores actual viewport
    y: Math.floor((y / 1000) * DEFAULT_VIEWPORT.height),
  };
}

For example, when Google returns (500, 500), meaning the center of the screen:

  • With 1920x1080 viewport:
    • BUG: Code calculates (644, 355) using hardcoded 1288x711
    • CORRECT: Should be (960, 540)
    • Result: Click misses the target by 316px horizontally and 185px vertically
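
The miss distance follows directly from the scaling formula. A minimal sketch that reproduces the numbers above, assuming Google's 0–1000 coordinate space; `scaleFrom1000` is a hypothetical helper for illustration, not a function in the codebase:

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

// Hypothetical helper: maps a 0-1000 normalized coordinate pair to pixels.
function scaleFrom1000(x: number, y: number, viewport: ViewportSize) {
  return {
    x: Math.floor((x / 1000) * viewport.width),
    y: Math.floor((y / 1000) * viewport.height),
  };
}

const hardcoded: ViewportSize = { width: 1288, height: 711 };
const actual: ViewportSize = { width: 1920, height: 1080 };

// Google reports (500, 500) for the center of the screen.
const buggy = scaleFrom1000(500, 500, hardcoded); // { x: 644, y: 355 }
const fixed = scaleFrom1000(500, 500, actual); // { x: 960, y: 540 }

// Offset between where the click landed and where it should have landed.
const miss = { dx: fixed.x - buggy.x, dy: fixed.y - buggy.y }; // { dx: 316, dy: 185 }
console.log(buggy, fixed, miss);
```

Because the error grows with the coordinate value, a click near the bottom-right corner misses by more than a center click.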

Solution

  1. Modified normalizeGoogleCoordinates() and processCoordinates() to accept an optional viewport parameter
  2. Updated all hybrid mode tools to read the actual viewport via page.evaluate() and pass it to coordinate processing

// AFTER (fix):
export function normalizeGoogleCoordinates(
  x: number,
  y: number,
  viewport?: ViewportSize,  // NEW: accept viewport parameter
) {
  const targetViewport = viewport ?? DEFAULT_VIEWPORT;  // Fall back to default for backward compatibility
  return {
    x: Math.floor((x / 1000) * targetViewport.width),
    y: Math.floor((y / 1000) * targetViewport.height),
  };
}
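
How the optional parameter threads through processCoordinates() can be sketched as follows. This is an illustration under assumptions, not the actual source: the provider is modeled as a plain string and the `provider === "google"` check and parameter order are hypothetical. Only the Google branch rescales; other providers' absolute coordinates pass through unchanged.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

const DEFAULT_VIEWPORT: ViewportSize = { width: 1288, height: 711 };

function normalizeGoogleCoordinates(x: number, y: number, viewport?: ViewportSize) {
  // Fall back to the old default when no viewport is supplied.
  const target = viewport ?? DEFAULT_VIEWPORT;
  return {
    x: Math.floor((x / 1000) * target.width),
    y: Math.floor((y / 1000) * target.height),
  };
}

// Sketch only: the real signature may differ (provider assumed to be a plain string here).
function processCoordinates(
  x: number,
  y: number,
  provider: string,
  viewport?: ViewportSize,
): { x: number; y: number } {
  if (provider === "google") {
    // Google returns coordinates on a 0-1000 grid; rescale to the viewport.
    return normalizeGoogleCoordinates(x, y, viewport);
  }
  // Other providers already return absolute pixel coordinates.
  return { x, y };
}

console.log(processCoordinates(500, 500, "google", { width: 1920, height: 1080 })); // { x: 960, y: 540 }
console.log(processCoordinates(500, 500, "google")); // { x: 644, y: 355 }
console.log(processCoordinates(500, 500, "other-provider")); // { x: 500, y: 500 }
```

Keeping `viewport` optional is what preserves backward compatibility: existing callers that pass nothing get the old 1288x711 behavior.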

Impact

| Scenario | Before Fix | After Fix |
|---|---|---|
| Custom 1920x1080 viewport, click center | Clicks at (644, 355): WRONG | Clicks at (960, 540): CORRECT |
| Default 1288x711 viewport | Works correctly | Works correctly (no change) |
| Any custom viewport with Google provider | Wrong coordinates | Correct coordinates |

Affected modes:

  • Hybrid mode with Google provider: FIXED
  • CUA mode: Not affected (has its own correct normalization)
  • DOM mode: Not affected (uses element selectors)
  • Non-Google providers: Not affected (return absolute coordinates)

Test Plan

  • Added regression test viewport-coordinate-normalization.test.ts (14 test cases)
  • Verified test FAILS on main (7/14 tests fail - wrong coordinates)
  • Verified test PASSES with fix (14/14 tests pass)
  • Backward compatibility: default viewport behavior unchanged
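
The kinds of checks listed above can be expressed as a small table-driven sketch. This is written in the spirit of the regression test, not copied from viewport-coordinate-normalization.test.ts; the case list and the `findMismatches` helper are hypothetical.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

const DEFAULT_VIEWPORT: ViewportSize = { width: 1288, height: 711 };

function normalizeGoogleCoordinates(x: number, y: number, viewport?: ViewportSize) {
  const target = viewport ?? DEFAULT_VIEWPORT;
  return {
    x: Math.floor((x / 1000) * target.width),
    y: Math.floor((y / 1000) * target.height),
  };
}

interface Case {
  name: string;
  input: [number, number];
  viewport?: ViewportSize;
  expected: { x: number; y: number };
}

// Hypothetical cases: custom viewports plus the backward-compatible default.
const cases: Case[] = [
  { name: "center, 1920x1080", input: [500, 500], viewport: { width: 1920, height: 1080 }, expected: { x: 960, y: 540 } },
  { name: "origin, 1920x1080", input: [0, 0], viewport: { width: 1920, height: 1080 }, expected: { x: 0, y: 0 } },
  { name: "off-center, 1280x720", input: [250, 750], viewport: { width: 1280, height: 720 }, expected: { x: 320, y: 540 } },
  { name: "no viewport falls back to 1288x711", input: [500, 500], expected: { x: 644, y: 355 } },
];

// Returns the names of cases whose output differs from the expectation.
function findMismatches(): string[] {
  return cases
    .filter(({ input, viewport, expected }) => {
      const got = normalizeGoogleCoordinates(input[0], input[1], viewport);
      return got.x !== expected.x || got.y !== expected.y;
    })
    .map((c) => c.name);
}

console.log(findMismatches()); // []
```

Against the pre-fix function (which ignores `viewport`), every custom-viewport case except the origin would be reported as a mismatch.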

Files Changed

  • packages/core/lib/v3/agent/utils/coordinateNormalization.ts - Added viewport parameter
  • packages/core/lib/v3/agent/tools/click.ts - Get viewport and pass to processCoordinates
  • packages/core/lib/v3/agent/tools/type.ts - Get viewport and pass to processCoordinates
  • packages/core/lib/v3/agent/tools/clickAndHold.ts - Get viewport and pass to processCoordinates
  • packages/core/lib/v3/agent/tools/dragAndDrop.ts - Get viewport and pass to processCoordinates
  • packages/core/lib/v3/agent/tools/fillFormVision.ts - Get viewport and pass to processCoordinates
  • packages/core/lib/v3/agent/tools/scroll.ts - Get viewport and pass to processCoordinates
  • packages/core/tests/viewport-coordinate-normalization.test.ts - New regression test
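
Each tool change follows the same pattern: read the viewport from the page, then scale the model's 0–1000 point before acting. A minimal sketch, assuming a page object whose `evaluate` runs a function in the browser and resolves with its return value (as Playwright's `page.evaluate` does); `PageLike`, `getViewport`, and `clickAt` are hypothetical names, not the actual tool code, and the inline scaling stands in for what processCoordinates does for the Google provider.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

// Hypothetical minimal page interface: evaluate() runs fn in the page and resolves with its result.
interface PageLike {
  evaluate<T>(fn: () => T): Promise<T>;
  click(x: number, y: number): Promise<void>;
}

async function getViewport(page: PageLike): Promise<ViewportSize> {
  return page.evaluate(() => {
    // In the browser, globalThis is window; the cast avoids depending on DOM typings.
    const w = globalThis as unknown as { innerWidth: number; innerHeight: number };
    return { width: w.innerWidth, height: w.innerHeight };
  });
}

// Hypothetical tool body for a Google-provider click.
async function clickAt(page: PageLike, x: number, y: number): Promise<{ x: number; y: number }> {
  const viewport = await getViewport(page);
  const point = {
    x: Math.floor((x / 1000) * viewport.width),
    y: Math.floor((y / 1000) * viewport.height),
  };
  await page.click(point.x, point.y);
  return point;
}

// Usage with a stub page that reports a 1920x1080 viewport instead of running in a browser.
const clicks: Array<[number, number]> = [];
const stubPage: PageLike = {
  evaluate: async <T>(_fn: () => T) => ({ width: 1920, height: 1080 }) as unknown as T,
  click: async (x, y) => {
    clicks.push([x, y]);
  },
};

clickAt(stubPage, 500, 500).then((p) => console.log(p)); // { x: 960, y: 540 }
```

Reading the viewport at action time, rather than caching it, keeps clicks correct even if the viewport is resized between steps.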

Feedback? Email p0@kernel.dev


Summary by cubic

Fixes hybrid-mode coordinate normalization to use the actual browser viewport instead of a hardcoded 1288x711. Actions now hit the correct screen location on any viewport.

  • Bug Fixes
    • Google provider: normalize 0–1000 coords using the provided viewport; fallback to default if missing.
    • Added viewport param to normalizeGoogleCoordinates() and processCoordinates().
    • Tools now pass viewport from window.innerWidth/Height: click, clickAndHold, dragAndDrop, type, fillFormVision, scroll.
    • Added regression test for multiple viewport scenarios.
    • Non-Google providers and other modes are unaffected.

Written for commit 4ae36cc. Summary will update on new commits.

Fixes coordinate normalization in hybrid mode to use the actual browser
viewport dimensions instead of a hardcoded 1288x711 default. Without
this fix, clicks in hybrid mode land at wrong coordinates when the
viewport differs from the default.

- Modified normalizeGoogleCoordinates() and processCoordinates() to
  accept an optional viewport parameter
- Updated all hybrid mode tools to read the actual viewport via page.evaluate()
- Backward compatible: default viewport still used when not provided

Example: With 1920x1080 viewport, Google returns (500, 500) for center.
Before: Calculated (644, 355) - miss by 316px horizontal, 185px vertical
After: Correctly calculates (960, 540)

changeset-bot Bot commented Jan 14, 2026

⚠️ No Changeset found

Latest commit: 4ae36cc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



greptile-apps Bot commented Jan 14, 2026

Greptile Summary

Fixed a critical coordinate normalization bug in hybrid mode where clicks landed at incorrect positions when using custom viewport sizes with Google providers. The core issue was hardcoded viewport dimensions (1288x711) used for coordinate transformation, causing significant offsets on other viewport sizes (for example, 316px horizontally and 185px vertically for a center click on a 1920x1080 viewport, and more toward the far edges).

Changes:

  • Modified normalizeGoogleCoordinates() and processCoordinates() to accept an optional viewport parameter, with fallback to the default for backward compatibility
  • Updated all 6 hybrid mode tools (click, type, clickAndHold, dragAndDrop, fillFormVision, scroll) to fetch the actual viewport via page.evaluate() and pass it to coordinate processing
  • Added a comprehensive regression test suite with 14 test cases covering custom viewports, edge cases, and backward compatibility

Impact:

  • Hybrid mode with Google provider now uses correct coordinates for any viewport size
  • Backward compatibility preserved - default viewport behavior unchanged
  • Non-Google providers unaffected (already return absolute coordinates)
  • DOM mode unaffected (doesn't use coordinate normalization)

Confidence Score: 5/5

  • This PR is safe to merge with no identified risks
  • The fix is surgically precise, well-tested with comprehensive regression tests, maintains backward compatibility, and follows a consistent pattern across all affected files. All coordinate processing call sites have been systematically updated.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| packages/core/lib/v3/agent/utils/coordinateNormalization.ts | Added optional viewport parameter to coordinate normalization functions with proper fallback to default viewport for backward compatibility |
| packages/core/lib/v3/agent/tools/click.ts | Fetches actual viewport dimensions via page.evaluate() and passes them to processCoordinates() for accurate coordinate normalization |
| packages/core/tests/viewport-coordinate-normalization.test.ts | Comprehensive regression test suite (14 test cases) verifying coordinate normalization with custom viewports and backward compatibility |

Sequence Diagram

sequenceDiagram
    participant Agent as AI Agent
    participant Tool as click/type/dragAndDrop Tool
    participant Page as Browser Page
    participant Utils as coordinateNormalization

    Note over Agent,Utils: After Fix: Uses actual viewport (previously hardcoded 1288x711)
    
    Agent->>Tool: execute({ coordinates: [500, 500] })
    Tool->>Page: evaluate(window.innerWidth, innerHeight)
    Page-->>Tool: { width: 1920, height: 1080 }
    Tool->>Utils: processCoordinates(500, 500, provider, viewport)
    
    alt Google Provider
        Utils->>Utils: normalizeGoogleCoordinates(500, 500, viewport)
        Note over Utils: Uses actual viewport {1920, 1080}<br/>instead of hardcoded {1288, 711}
        Utils->>Utils: x = floor((500/1000) * 1920) = 960
        Utils->>Utils: y = floor((500/1000) * 1080) = 540
        Utils-->>Tool: { x: 960, y: 540 } ✓ CORRECT
    else Non-Google Provider
        Utils-->>Tool: { x: 500, y: 500 } (no transform)
    end
    
    Tool->>Page: click(960, 540)
    Note over Page: Clicks at center of 1920x1080 viewport<br/>Previously: clicked at (644, 355) - WRONG

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 8 files

@kernel-systems-bot kernel-systems-bot changed the title Fix: Use actual viewport dimensions in hybrid mode coordinate normalization Fix: Use correct viewport dimensions in hybrid mode coordinate normalization Jan 26, 2026

@kernel-systems-bot (author) commented:

Implemented by the core team in #1560.