Fix: Use correct viewport dimensions in hybrid mode coordinate normalization #1540
Fixes coordinate normalization in hybrid mode to use the actual browser viewport dimensions instead of a hardcoded 1288x711 default. Without this fix, clicks in hybrid mode land at wrong coordinates when the viewport differs from the default.
- Modified normalizeGoogleCoordinates() and processCoordinates() to accept an optional viewport parameter
- Updated all hybrid mode tools to get the actual viewport via page.evaluate()
- Backward compatible: the default viewport is still used when none is provided
Example: With a 1920x1080 viewport, Google returns (500, 500) for center.
Before: Calculated (644, 355), a miss of 316px horizontal and 185px vertical
After: Correctly calculates (960, 540)
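The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names and the 0-1000 coordinate grid come from the description and diagram, while the `Viewport` type, the `DEFAULT_VIEWPORT` constant, and the string-based provider check are assumptions made for the example.

```typescript
// Sketch only: names follow the PR description, bodies are assumed.
interface Viewport {
  width: number;
  height: number;
}

// The hardcoded default mentioned in the PR (1288x711).
const DEFAULT_VIEWPORT: Viewport = { width: 1288, height: 711 };

// Google providers return coordinates on a 0-1000 grid; scale them to pixels.
// The viewport is optional, so existing callers keep the old behavior.
function normalizeGoogleCoordinates(
  x: number,
  y: number,
  viewport: Viewport = DEFAULT_VIEWPORT,
): { x: number; y: number } {
  return {
    x: Math.floor((x / 1000) * viewport.width),
    y: Math.floor((y / 1000) * viewport.height),
  };
}

// Assumption: a plain string comparison stands in for the real provider check.
function processCoordinates(
  x: number,
  y: number,
  provider: string,
  viewport?: Viewport,
): { x: number; y: number } {
  if (provider === "google") {
    return normalizeGoogleCoordinates(x, y, viewport);
  }
  return { x, y }; // non-Google providers: no transform
}
```

Keeping the viewport as an optional trailing parameter means call sites that do not pass it still resolve to the 1288x711 default, which is the backward compatibility the description mentions.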
Greptile Summary
Fixed a critical coordinate normalization bug in hybrid mode where clicks landed at incorrect positions when using custom viewport sizes with Google providers. The core issue was the hardcoded viewport dimensions (1288x711) used for coordinate transformation, which caused significant offsets on other viewport sizes (for example, 316px horizontal and 185px vertical for a center click on a 1920x1080 viewport).
Changes:
Impact:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Agent as AI Agent
participant Tool as click/type/dragAndDrop Tool
participant Page as Browser Page
participant Utils as coordinateNormalization
Note over Agent,Utils: Before Fix: Used Hardcoded 1288x711
Agent->>Tool: execute({ coordinates: [500, 500] })
Tool->>Page: evaluate(window.innerWidth, innerHeight)
Page-->>Tool: { width: 1920, height: 1080 }
Tool->>Utils: processCoordinates(500, 500, provider, viewport)
alt Google Provider
Utils->>Utils: normalizeGoogleCoordinates(500, 500, viewport)
Note over Utils: Uses actual viewport {1920, 1080}<br/>instead of hardcoded {1288, 711}
Utils->>Utils: x = floor((500/1000) * 1920) = 960
Utils->>Utils: y = floor((500/1000) * 1080) = 540
Utils-->>Tool: { x: 960, y: 540 } ✓ CORRECT
else Non-Google Provider
Utils-->>Tool: { x: 500, y: 500 } (no transform)
end
Tool->>Page: click(960, 540)
Note over Page: Clicks at center of 1920x1080 viewport<br/>Previously: clicked at (644, 355) - WRONG
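The flow in the diagram could look roughly like this inside a tool. It is a sketch under stated assumptions: `PageLike`, `readViewport`, and `clickAt` are hypothetical names for illustration, the real page object and tool signatures are not shown in this PR, and the Google 0-1000 scaling is inlined rather than delegated to processCoordinates().

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical minimal page surface; the real page object exposes much more.
interface PageLike {
  evaluate(fn: () => Viewport): Promise<Viewport>;
  click(x: number, y: number): Promise<void>;
}

// Intended to run in the browser context: reads the live viewport size.
function readViewport(): Viewport {
  const w = globalThis as unknown as { innerWidth: number; innerHeight: number };
  return { width: w.innerWidth, height: w.innerHeight };
}

// Assumed shape of a click tool body for a Google provider (0-1000 grid).
async function clickAt(
  page: PageLike,
  x: number,
  y: number,
): Promise<{ x: number; y: number }> {
  // 1. Ask the page for its actual viewport instead of assuming 1288x711.
  const viewport = await page.evaluate(readViewport);
  // 2. Scale the provider coordinates to pixels in that viewport.
  const px = Math.floor((x / 1000) * viewport.width);
  const py = Math.floor((y / 1000) * viewport.height);
  // 3. Click at the scaled position.
  await page.click(px, py);
  return { x: px, y: py };
}
```

Reading the viewport on every call costs one extra round trip to the page, but it keeps the result correct if the window is resized between actions.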
implementation in #1560
Fix: Hardcoded Viewport in Hybrid Mode Coordinate Normalization
(fix implemented by core team in #1560)
Summary
Fixes coordinate normalization in hybrid mode to use the actual browser viewport dimensions instead of a hardcoded 1288x711 default. Without this fix, clicks in hybrid mode land at wrong coordinates when the viewport differs from the hardcoded default.
Problem
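In coordinateNormalization.ts, Google provider coordinate normalization uses a hardcoded viewport of 1288x711. When Google returns coordinates (500, 500) meaning "center of screen" on a 1920x1080 viewport, the hardcoded default produces (644, 355) instead of (960, 540).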
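To make the size of the miss concrete, the offset can be computed for any viewport. The helper names here (`toPixels`, `missFor`, `HARDCODED`) are hypothetical and exist only for this illustration; the 0-1000 grid and the 1288x711 default are taken from the PR.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// The default the old code always assumed.
const HARDCODED: Viewport = { width: 1288, height: 711 };

// Scale a 0-1000 grid coordinate pair to pixels for a given viewport.
function toPixels(x: number, y: number, v: Viewport): { x: number; y: number } {
  return {
    x: Math.floor((x / 1000) * v.width),
    y: Math.floor((y / 1000) * v.height),
  };
}

// Pixel offset between where the click should land and where the old code put it.
function missFor(x: number, y: number, actual: Viewport): { dx: number; dy: number } {
  const wanted = toPixels(x, y, actual);
  const got = toPixels(x, y, HARDCODED);
  return { dx: wanted.x - got.x, dy: wanted.y - got.y };
}

// Center click on a 1920x1080 viewport: { dx: 316, dy: 185 }
const centerMiss = missFor(500, 500, { width: 1920, height: 1080 });
```

When the actual viewport equals the default, the offset is zero, which is why the bug only shows up on non-default viewport sizes.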
Solution
- Modified normalizeGoogleCoordinates() and processCoordinates() to accept optional viewport parameter
- Updated all hybrid mode tools to get the actual viewport via page.evaluate() and pass it to coordinate processing
Impact
Affected modes:
Test Plan
- viewport-coordinate-normalization.test.ts (14 test cases)
Files Changed
- packages/core/lib/v3/agent/utils/coordinateNormalization.ts - Added viewport parameter
- packages/core/lib/v3/agent/tools/click.ts - Get viewport and pass to processCoordinates
- packages/core/lib/v3/agent/tools/type.ts - Get viewport and pass to processCoordinates
- packages/core/lib/v3/agent/tools/clickAndHold.ts - Get viewport and pass to processCoordinates
- packages/core/lib/v3/agent/tools/dragAndDrop.ts - Get viewport and pass to processCoordinates
- packages/core/lib/v3/agent/tools/fillFormVision.ts - Get viewport and pass to processCoordinates
- packages/core/lib/v3/agent/tools/scroll.ts - Get viewport and pass to processCoordinates
- packages/core/tests/viewport-coordinate-normalization.test.ts - New regression test
Summary by cubic
Fixes hybrid-mode coordinate normalization to use the actual browser viewport instead of a hardcoded 1288x711. Actions now hit the correct screen location on any viewport.
Written for commit 4ae36cc. Summary will update on new commits.