feat(perf): per-window frame callback throttling#92
Merged
Classify every mapped window into one of five visibility states each frame (Focused / Secondary / Occluded / Minimized / HiddenWorkspace), then pass a per-window throttle `Duration` to `Window::send_frame` in `post_repaint` so `wl_surface.frame` callbacks are paced appropriately:

- Focused → 0ms (full output refresh rate)
- Secondary → 33ms (~30Hz)
- Occluded/Minimized/Hidden → 500ms (~2Hz)

The 2Hz floor for hidden states (rather than zero) is deliberate: it keeps Chromium 115+'s buffer-eviction heuristic happy while still pausing its internal render loop, avoiding the blank-canvas-on-restore bug that plagues aggressive frame-callback withholding.

## Motivation

This is the single biggest compositor-side lever for reducing GPU work when a foreground window occludes a background one. Without it, Chrome (and every frame-callback-paced client) keeps rendering internally at full output rate even when 100% hidden, because its `requestAnimationFrame` loop is driven by `wl_surface.frame` events which were always firing at full rate. With it, the client sees fewer frame callbacks, slows its own loop, stops decoding video, stops running rAF: the whole pipeline from JS rAF → Chromium compositor → dmabuf commit → Otto import goes quiet.

This is the mechanism Mutter has had for years and why Chromium feels lighter on GNOME than on KWin/Sway/wlroots. We now match that behavior.

## Classifier

Pure-function decision tree keyed on:

- `window.is_minimised()` → Minimized
- expose active? → force Focused (previews need to animate)
- fullscreen window on current workspace? → self=Focused, others=Occluded
- top of current workspace? → Focused
- in occlusion set? → Occluded
- otherwise → Secondary

The occlusion set is empty for v1; we rely on the fullscreen detection for the main "background Spotify behind maximized Chrome" case. Partial occlusion from non-fullscreen windows degrades gracefully to Secondary (30Hz), still a large improvement over the previous full-rate-everywhere behaviour.

Plumbing lay-rs NodeRef → wl_surface ObjectId mapping for a proper occlusion set is a follow-up.

## HiddenWorkspace

Windows whose workspace isn't currently active on any output would be HiddenWorkspace, but the current classifier folds them into Secondary when they don't match the top-of-current lookup. The state exists in the enum for future refinement; for now they still land in the 30Hz bucket, which is harmless because such windows aren't being rendered anyway (Otto's render path skips off-workspace windows).

## Expose override

When the expose overview is animating or open (`is_expose_transitioning` or `get_show_all`), every non-minimized window is forced to Focused so the live-preview tiles update at full rate. Without this the previews would stutter as windows get throttled back.

## Tests

5 unit tests pin the core decision tree (Minimized short-circuit, expose override, fullscreen-is-focused, throttle-duration-by-state, only-focused-is-activated). The remaining branches are mechanically equivalent and cannot be tested with smithay's `ObjectId::null()` singleton alone; they're exercised end-to-end via the integration test scenario documented in `project_frame_callback_throttle.md`.

## Integration verification

End-to-end test with Spotify + Chromium `--start-maximized` HN on top:

- Otto CPU stabilises at ~1.8% (from 2.6% with occlusion-aware damage alone, from ~13% with no optimization).
- Scene perf counters show the loop going idle quickly once Chrome is stable on top, with very few `update()` calls per second.
- GPU stays in RC6 sleep for extended periods.

Combines with the occlusion-aware damage work in lay-rs v1.11.0: that optimization drops occluded-window *compositor* work, this one drops occluded-window *client* work as well. The two compose: after both land, an occluded browser tab is near-zero cost on both sides of the Wayland boundary.
Summary
Classify every mapped window into one of five visibility states each frame (Focused / Secondary / Occluded / Minimized / HiddenWorkspace) and pace `wl_surface.frame` callbacks per state. Matches Mutter's frame-callback withholding behavior so Chrome/Chromium apps pause their internal render loops when hidden — the single biggest lever for reducing both compositor-side and client-side GPU work when a foreground window occludes a background one.
Stacks with the occlusion-aware damage that landed in lay-rs v1.11.0 (picked up by the parent branch bump).
Throttle policy
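Per-state throttle passed to `Window::send_frame`:

- Focused → 0 ms (full output refresh rate)
- Secondary → 33 ms (~30 Hz)
- Occluded / Minimized / HiddenWorkspace → 500 ms (~2 Hz)

The 2 Hz floor is deliberate: Chromium 115+ has an `EvictionThrottlesDraw` heuristic that discards content buffers when frame callbacks stop arriving entirely, causing a blank-canvas-on-restore bug. A 2 Hz trickle satisfies the heuristic while still letting the internal render loop, video decoder, and rAF callbacks go quiet.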
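As a rough illustration of the policy, the state-to-throttle mapping could look like the sketch below. The enum name `Visibility` and the method name `throttle` are assumptions made for this example, not names taken from the PR; only the five states and the 0 / 33 / 500 ms values come from the description.

```rust
use std::time::Duration;

/// Hypothetical name for the five visibility states listed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Focused,
    Secondary,
    Occluded,
    Minimized,
    HiddenWorkspace,
}

impl Visibility {
    /// Minimum interval between `wl_surface.frame` callbacks for this state.
    /// A zero duration means "no throttling": callbacks follow the output refresh.
    pub fn throttle(self) -> Duration {
        match self {
            Visibility::Focused => Duration::ZERO,
            Visibility::Secondary => Duration::from_millis(33), // ~30 Hz
            // Hidden states keep a 2 Hz trickle rather than stopping entirely.
            Visibility::Occluded | Visibility::Minimized | Visibility::HiddenWorkspace => {
                Duration::from_millis(500)
            }
        }
    }
}
```

Keeping the mapping in one `match` means the compiler flags any state added later that lacks an explicit throttle.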
Classifier
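Pure-function decision tree keyed on `is_minimised()`, expose state, fullscreen lookup, top-of-stack, and an occluder set. Rules are evaluated in order and the first match wins:

- `window.is_minimised()` → Minimized
- expose active → Focused (previews need to animate)
- fullscreen window on the current workspace → that window is Focused, all others are Occluded
- top of the current workspace → Focused
- in the occluder set → Occluded
- otherwise → Secondary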
Refactored into `classify_one(ctx)` + `classify_windows(workspaces, ...)` so the core rule is unit-testable without constructing a full compositor state.
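A minimal sketch of what such a pure per-window rule might look like follows. The PR names `classify_one(ctx)`, but the `ClassifyCtx` struct, its fields, and the return type here are assumptions for illustration; the real context presumably carries smithay object IDs rather than pre-computed booleans.

```rust
/// Hypothetical visibility enum; HiddenWorkspace exists but is not yet
/// produced by the classifier, as described in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Focused,
    Secondary,
    Occluded,
    Minimized,
    HiddenWorkspace,
}

/// Hypothetical per-window inputs, pre-computed by the caller.
#[derive(Debug, Clone, Copy, Default)]
pub struct ClassifyCtx {
    pub minimized: bool,
    pub expose_active: bool,
    /// Some window on the current workspace is fullscreen.
    pub fullscreen_on_workspace: bool,
    /// This window is that fullscreen window.
    pub is_fullscreen: bool,
    pub is_top_of_workspace: bool,
    pub in_occlusion_set: bool,
}

/// First matching rule wins, in the order given in the description.
pub fn classify_one(ctx: &ClassifyCtx) -> Visibility {
    if ctx.minimized {
        return Visibility::Minimized;
    }
    if ctx.expose_active {
        return Visibility::Focused;
    }
    if ctx.fullscreen_on_workspace {
        return if ctx.is_fullscreen {
            Visibility::Focused
        } else {
            Visibility::Occluded
        };
    }
    if ctx.is_top_of_workspace {
        return Visibility::Focused;
    }
    if ctx.in_occlusion_set {
        return Visibility::Occluded;
    }
    Visibility::Secondary
}
```

Because the function only reads plain values, each branch can be exercised by constructing a `ClassifyCtx` directly, which is the property the refactor is after.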
End-to-end validation
Scenario: one Chromium instance plays YouTube Shorts in the background while a second Chromium, launched with `--start-maximized` on news.ycombinator.com, sits on top and fully occludes it.
The Chrome-YT-renderer at 0.8 % is the smoking gun. Without throttling, a "playing" YouTube tab hidden behind another window continues decoding video and compositing internally at 30 fps regardless of visibility — 10–20 % CPU sustained. With throttling, Chrome sees callbacks every 500 ms so its internal loop slows to ~2 fps and the decoder pauses.
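The pacing arithmetic behind "callbacks every 500 ms" can be sketched as a simple elapsed-time check. This is an illustrative stand-alone function, not smithay's implementation; the name `frame_callback_due` and the use of `Duration` values as timestamps on a shared monotonic clock are assumptions.

```rust
use std::time::Duration;

/// Decide whether a surface should receive a frame callback on this repaint.
///
/// `now` and `last_sent` are timestamps on the same monotonic clock.
/// A surface that has never received a callback is always due, and a zero
/// throttle makes every repaint due (full output refresh rate).
pub fn frame_callback_due(now: Duration, last_sent: Option<Duration>, throttle: Duration) -> bool {
    match last_sent {
        None => true,
        // saturating_sub guards against a timestamp that appears to go backwards.
        Some(last) => now.saturating_sub(last) >= throttle,
    }
}
```

With a 500 ms throttle on a 60 Hz output, roughly one repaint in thirty passes the check, which is where the ~2 callbacks per second figure comes from; with 33 ms, about every second repaint passes, giving ~30 Hz.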
Scene-perf counters during steady state also show the stacking with lay-rs's occlusion-aware damage:
```
u/s=98 changes=42 dmg=0 idle=56 ← 42 scene changes, 0 damage (occlusion-damage eats them)
u/s=6 changes=2 dmg=1 idle=4 ← then mostly quiet (frame-callback throttle)
u/s=5 changes=1 dmg=1 idle=4
```
The occlusion-aware damage work drops occluded-window compositor work; this PR drops occluded-window client work as well. Together an occluded browser tab is near-zero cost on both sides of the Wayland boundary.
Tests
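5 unit tests pin the core decision tree:

- Minimized short-circuit
- expose override
- fullscreen-is-focused
- throttle-duration-by-state
- only-focused-is-activated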
The remaining branches (fullscreen-occludes-background, top-of-stack, occluded-but-not-top) are mechanically equivalent and would require two distinct `ObjectId`s, which smithay's `ObjectId::null()` API doesn't support in pure unit tests. They're covered end-to-end by the integration scenario above.
Not yet in this PR (follow-ups)
Notes for merge
This branch targets `main` for convenience. It was built on top of `fix/popup-crash-subsurface-rendering` which contains the lay-rs v1.11.0 bump required for the occlusion-aware damage stack. Once that parent PR merges to main, this PR's diff will collapse to only the frame-callback throttling changes.
Test plan