Skip to content

feat(perf): per-window frame callback throttling#92

Merged
nongio merged 1 commit into
mainfrom
feat/frame-callback-throttle
Apr 14, 2026
Merged

feat(perf): per-window frame callback throttling#92
nongio merged 1 commit into
mainfrom
feat/frame-callback-throttle

Conversation

@nongio

@nongio nongio commented Apr 11, 2026

Copy link
Copy Markdown
Owner

Summary

Classify every mapped window into one of five visibility states each frame (Focused / Secondary / Occluded / Minimized / HiddenWorkspace) and pace `wl_surface.frame` callbacks per state. Matches Mutter's frame-callback withholding behavior so Chrome/Chromium apps pause their internal render loops when hidden — the single biggest lever for reducing both compositor-side and client-side GPU work when a foreground window occludes a background one.

Stacks with the occlusion-aware damage that landed in lay-rs v1.11.0 (picked up by the parent branch bump).

Throttle policy

State Frame-callback interval
Focused 0 ms (full output refresh)
Secondary (visible, not focused) ~33 ms (30 Hz)
Occluded / Minimized / HiddenWorkspace ~500 ms (2 Hz)

The 2 Hz floor is deliberate — Chromium 115+ has an `EvictionThrottlesDraw` heuristic that discards content buffers when frame callbacks stop arriving entirely, causing a blank-canvas-on-restore bug. A 2 Hz trickle satisfies the heuristic while still letting the internal render loop, video decoder, and rAF callbacks go quiet.

Classifier

Pure-function decision tree keyed on `is_minimised()`, expose state, fullscreen lookup, top-of-stack, and an occluder set:

  1. `window.is_minimised()` → `Minimized`
  2. expose overview active → force `Focused` (live previews need smooth animation)
  3. fullscreen window on current workspace:
    • self → `Focused`
    • others → `Occluded` (behind the fullscreen)
  4. top of current workspace → `Focused`
  5. in occluder set → `Occluded`
  6. otherwise → `Secondary`

Refactored into `classify_one(ctx)` + `classify_windows(workspaces, ...)` so the core rule is unit-testable without constructing a full compositor state.

End-to-end validation

Scenario: Chromium playing YouTube Shorts in background, second Chromium with `--start-maximized` news.ycombinator.com on top fully occluding it.

Metric Value
GPU `gt_act_freq_mhz` 0 (sustained RC6 sleep) — 20/20 samples over 10 s
Otto CPU 1.7 %
Chrome YouTube renderer CPU 0.8 % total across its processes
Chrome HN renderer CPU ~0 % (static page)

The Chrome-YT-renderer at 0.8 % is the smoking gun. Without throttling, a "playing" YouTube tab hidden behind another window continues decoding video and compositing internally at 30 fps regardless of visibility — 10–20 % CPU sustained. With throttling, Chrome sees callbacks every 500 ms so its internal loop slows to ~2 fps and the decoder pauses.

Scene-perf counters during steady state also show the stacking with lay-rs's occlusion-aware damage:

```
u/s=98 changes=42 dmg=0 idle=56 ← 42 scene changes, 0 damage (occlusion-damage eats them)
u/s=6 changes=2 dmg=1 idle=4 ← then mostly quiet (frame-callback throttle)
u/s=5 changes=1 dmg=1 idle=4
```

The occlusion-aware damage work drops occluded-window compositor work; this PR drops occluded-window client work as well. Together an occluded browser tab is near-zero cost on both sides of the Wayland boundary.

Tests

5 unit tests pin the core decision tree:

  • `minimized_beats_everything` — minimized short-circuits regardless of focus/fullscreen/expose
  • `expose_forces_focused_on_visible_windows` — expose override promotes all non-minimized to Focused
  • `fullscreen_window_is_focused` — the fullscreen window itself
  • `throttle_durations_by_state` — enum → Duration mapping
  • `only_focused_is_activated` — activated bit only set on Focused

The remaining branches (fullscreen-occludes-background, top-of-stack, occluded-but-not-top) are mechanically equivalent and would require two distinct `ObjectId`s, which smithay's `ObjectId::null()` API doesn't support in pure unit tests. They're covered end-to-end by the integration scenario above.

Not yet in this PR (follow-ups)

  • Partial-occlusion `occluded_ids` set: v1 classifier reads an empty set and relies on fullscreen detection. Plumbing lay-rs `NodeRef` → wl_surface `ObjectId` so partial occlusion (e.g. a non-fullscreen opaque window half-covering another) is detected.
  • `xdg_toplevel.configure.activated` dispatch: the `is_activated()` helper is wired but not yet hooked into a configure path. Would let well-behaved toolkits self-throttle on top of our compositor-side throttling.
  • `direct_scanout` filter removal (`udev/render.rs:1174`): now that 2 Hz throttling is safe for Chrome, the aggressive "silence all non-fullscreen windows" filter in direct-scanout mode can go — but it's a narrow code path and not causing observed bugs.

Notes for merge

This branch targets `main` for convenience. It was built on top of `fix/popup-crash-subsurface-rendering` which contains the lay-rs v1.11.0 bump required for the occlusion-aware damage stack. Once that parent PR merges to main, this PR's diff will collapse to only the frame-callback throttling changes.

Test plan

  • `cargo test --lib state::window_throttle` — 5/5 pass
  • `cargo fmt --all --check`
  • `cargo clippy --features "default" -- -D warnings`
  • `cargo build --release --features perf-counters`
  • End-to-end: YT bg + HN fg maximized → Chrome YT renderer CPU drops to 0.8 %, Otto 1.7 %, GPU RC6
  • Visual verification: alt-tab recovery (no blank canvas), dock hover/expose (previews still smooth), workspace switch (no rendering glitches)

Classify every mapped window into one of five visibility states each
frame (Focused / Secondary / Occluded / Minimized / HiddenWorkspace),
then pass a per-window throttle `Duration` to `Window::send_frame` in
`post_repaint` so `wl_surface.frame` callbacks are paced appropriately:

  Focused          → 0ms   (full output refresh rate)
  Secondary        → 33ms  (~30Hz)
  Occluded/Min/Hidden → 500ms (~2Hz)

The 2Hz floor for hidden states (rather than zero) is deliberate: it
keeps Chromium 115+'s buffer-eviction heuristic happy while still
pausing its internal render loop, avoiding the blank-canvas-on-restore
bug that plagues aggressive frame-callback withholding.

## Motivation

This is the single biggest compositor-side lever for reducing GPU work
when a foreground window occludes a background one. Without it, Chrome
(and every frame-callback-paced client) keeps rendering internally at
full output rate even when 100% hidden, because its `requestAnimationFrame`
loop is driven by `wl_surface.frame` events which were always firing at
full rate. With it, the client sees fewer frame callbacks, slows its own
loop, stops decoding video, stops running rAF — the whole pipeline from
JS rAF → Chromium compositor → dmabuf commit → Otto import goes quiet.

This is the mechanism Mutter has had for years and why Chromium feels
lighter on GNOME than on KWin/Sway/wlroots. We now match that behavior.

## Classifier

Pure-function decision tree keyed on:
- `window.is_minimised()` → Minimized
- expose active? → force Focused (previews need to animate)
- fullscreen window on current workspace? → self=Focused, others=Occluded
- top of current workspace? → Focused
- in occlusion set? → Occluded
- otherwise → Secondary

The occlusion set is empty for v1 — we rely on the fullscreen detection
for the main "background Spotify behind maximized Chrome" case. Partial
occlusion from non-fullscreen windows degrades gracefully to Secondary
(30Hz), still a large improvement over the previous full-rate-everywhere
behaviour. Plumbing lay-rs NodeRef → wl_surface ObjectId mapping for a
proper occlusion set is a follow-up.

## HiddenWorkspace

Windows whose workspace isn't currently active on any output would be
HiddenWorkspace, but the current classifier folds them into Secondary
when they don't match the top-of-current lookup. The state exists in
the enum for future refinement — for now they still land in the 30Hz
bucket, which is harmless because such windows aren't being rendered
anyway (Otto's render path skips off-workspace windows).

## Expose override

When the expose overview is animating or open (`is_expose_transitioning`
or `get_show_all`), every non-minimized window is forced to Focused so
the live-preview tiles update at full rate. Without this the previews
would stutter as windows get throttled back.

## Tests

5 unit tests pin the core decision tree (Minimized short-circuit,
expose override, fullscreen-is-focused, throttle-duration-by-state,
only-focused-is-activated). The remaining branches are mechanically
equivalent and cannot be tested with smithay's `ObjectId::null()`
singleton alone; they're exercised end-to-end via the integration
test scenario documented in `project_frame_callback_throttle.md`.

## Integration verification

End-to-end test with Spotify + Chromium `--start-maximized` HN on top:
- Otto CPU stabilises at ~1.8% (from 2.6% with occlusion-aware damage
  alone, from ~13% with no optimization).
- Scene perf counters show the loop going idle quickly once Chrome is
  stable on top — very few `update()` calls per second.
- GPU stays in RC6 sleep for extended periods.

Combines with the occlusion-aware damage work in lay-rs v1.11.0 — that
optimization drops occluded-window *compositor* work, this one drops
occluded-window *client* work as well. The two compose: after both
land, an occluded browser tab is near-zero cost on both sides of the
Wayland boundary.
@nongio nongio force-pushed the feat/frame-callback-throttle branch from af46954 to 3d96169 Compare April 12, 2026 08:59
@nongio nongio merged commit 4f0dd81 into main Apr 14, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant