
fix(android): Invalid window token from SCVH teardown race (round 2)#3

Merged
amillez merged 1 commit into main from worktree-fix-android-invalid-window-token
Jun 1, 2026
@amillez amillez commented Jun 1, 2026

Owner

Summary

Layered defences against IllegalArgumentException: Invalid window token (never added or removed already) thrown from WindowlessWindowManager.relayout via ViewRootImpl.performTraversals, called from Choreographer.doFrame. v0.1.5 (e17bfa2) added a reflective traversal cancel before host.release(); this PR closes the remaining gaps that show up as production crashes (NEAR Mobile, Play Console).

Failure modes still in play after v0.1.5

  1. Reflection unavailable on device. The SCVH-field probe fails or unscheduleTraversals() throws, latches scvhReflectionDisabled = true, and every later detach proceeds with no cancellation at all.
  2. Race window between cancel and WWM token removal. host.release() removes the WWM token inside an asynchronously-dispatched doDie() (via ViewRootImpl.die(false) → MSG_DIE). Between our cancel and doDie() running, framework code (dispatchDetachedFromWindow, animation cleanup, focus changes) can call requestLayout() on the SCVH content tree and schedule a fresh Choreographer callback. If doDie() then wins, the queued callback fires on a dead window and crashes from Looper.loop, outside any try/catch.
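
The second failure mode is easier to see in a toy model. The sketch below is not framework code: ModelRoot, Msg and drain() are hypothetical stand-ins that treat the main thread as a single FIFO queue, which simplifies how Looper and Choreographer really interleave. It only shows that a layout request arriving after release() has posted its die message ends up running against a dead token.

```kotlin
// Hypothetical model of the teardown race; not the real ViewRootImpl.
enum class Msg { TRAVERSAL, DIE }

class ModelRoot {
    private val queue = ArrayDeque<Msg>()
    private var traversalScheduled = false   // mirrors the one-runnable cap
    var tokenValid = true                    // stands in for the WWM token
        private set

    fun requestLayout() {
        if (!traversalScheduled) {
            traversalScheduled = true
            queue.addLast(Msg.TRAVERSAL)
        }
    }

    // Models the reflective cancel: drops a queued traversal, if any.
    fun unscheduleTraversals() {
        if (traversalScheduled) {
            traversalScheduled = false
            queue.remove(Msg.TRAVERSAL)
        }
    }

    // Models host.release(): the token is removed later, when DIE is handled.
    fun release() {
        queue.addLast(Msg.DIE)
    }

    // Drains the queue; returns an error string if a traversal ran on a dead token.
    fun drain(): String? {
        while (queue.isNotEmpty()) {
            when (queue.removeFirst()) {
                Msg.DIE -> tokenValid = false
                Msg.TRAVERSAL -> {
                    traversalScheduled = false
                    if (!tokenValid) return "Invalid window token"
                }
            }
        }
        return null
    }
}
```

Replaying the sequence from item 2 (cancel, release(), then a late requestLayout()) makes drain() report the failure; the same sequence without the late request drains cleanly.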

Defences added in HybridCover.kt

  • FreezableFrameLayout — content root whose requestLayout() / invalidate() no-op while frozen is set. Set first thing in detachCoverView; from that point no layout request reaches ViewRootImpl.scheduleTraversals and no new Choreographer callback gets queued during teardown.
  • coverDetaching re-entrance guard + snapshot-then-null-out of shared state. Any synchronous re-entrant call (animation cancel listener, detach dispatch) sees cleared fields and bails before double-removing or double-releasing.
  • unscheduleScvhTraversals called twice — once in detachCoverView before removeView, again inside safeReleaseScvh before host.release(). Defence in depth against traversals queued by the detach dispatch.
  • Prefer removeViewImmediate so dispatchDetachedFromWindow runs inline while the freeze is active; fall back to async removeView on OEM impls (some Samsung / MIUI builds) that reject immediate removal in transitional states.
  • Validity-check view.windowToken before updateViewLayout in setCoverVisibility — same WMS/WWM path that throws once the token is gone.
  • Latch scvhDisabled = true when unscheduleScvhTraversals permanently fails. Without a working unschedule there's no safe SCVH teardown path on this device; future attaches use the legacy WindowManager window (no WindowlessWindowManager, no race). We lose the SurfaceFlinger-direct alpha toggle on that device but eliminate the crash surface entirely.
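
The freeze described in the first bullet can be sketched without Android on the classpath. ModelFrameLayout below is a hypothetical stand-in for android.widget.FrameLayout that only counts calls, and FreezableLayoutModel is an illustration of the gating idea, not the class from HybridCover.kt. On a device the same two overrides would sit on a real FrameLayout subclass.

```kotlin
// Stand-in for android.widget.FrameLayout so the sketch runs on a plain JVM.
open class ModelFrameLayout {
    var layoutRequests = 0
        private set
    var invalidations = 0
        private set

    open fun requestLayout() { layoutRequests++ }
    open fun invalidate() { invalidations++ }
}

// Assumed shape of the freeze: both calls become no-ops while frozen is set,
// so nothing propagates up to schedule a new traversal.
class FreezableLayoutModel : ModelFrameLayout() {
    var frozen = false

    override fun requestLayout() {
        if (!frozen) super.requestLayout()
    }

    override fun invalidate() {
        if (!frozen) super.invalidate()
    }
}
```

Setting frozen before any removal call is what matters: requests made by the detach dispatch itself are then swallowed rather than queued.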

Test plan

  • ./gradlew :react-native-cover:compileDebugKotlin from the example app — builds clean (only pre-existing val defaultDisplay: Display! deprecation warnings).
  • Manual: rapid show()/hide() on an API 30+ device with SCVH active. No crashes from Looper.loop.
  • Manual: backgrounding loop (home → back → home) under load. No crashes.
  • Manual: rotate to force activity recreation while cover is mid-mount. No crashes.
  • Manual: API < 30 device — legacy path unchanged, cover still mounts/dismisses correctly.
  • Manual: device where reflection is blocked — first detach falls back to scvhDisabled = true, subsequent attaches use legacy path.
  • Spot-check NEAR Mobile Play Console for recurrence after release.

🤖 Generated with Claude Code

amillez commented Jun 1, 2026

Owner Author

Verification on Pixel_9 emulator (Android 14+, hidden-API blocks SCVH reflection)

Built the example app from the worktree branch, installed on Pixel_9 AVD via npm run android.

maestro suite — 9/9 passed in 3m 31s

  • Manual show/hide of cover via JS API
  • Cover survives background/foreground cycle without leaking the overlay
  • Image resize mode and position pickers update the cover
  • Cover toggle and color change
  • Color + image combine; setBlur replaces both
  • Keyboard input still works while the cover is enabled
  • Cover paints above an open RN Dialog
  • Blur intensity picker updates the cover
  • Fade animation does not break show/hide

argent manual stress

  • 5× rapid Home / app-switcher / Home → broadcast mounts cleanly via the legacy fast path; no leaked covers on return.
  • 10× rapid Enable cover / Disable cover toggle (~150 ms apart) → each cycle fires a fresh attachCover + detachCoverView; no crash.

logcat

This emulator turns out to be exactly the reflection-blocked case the latch is designed for:

06-01 11:44:57.151 W Cover : unscheduleScvhTraversals: no ViewRootImpl field found on SurfaceControlViewHost; disabling reflection AND SCVH
06-01 11:46:05.295 I Cover : attachCover: preferredRefreshRate=60.000004 (modes=60.000004Hz)
06-01 11:46:05.311 I Cover : attachCover: SurfaceControl captured=false

First SCVH detach in each process trips the latch → every subsequent attach uses the legacy WindowManager.addView path (no WindowlessWindowManager involved). The freeze + removeViewImmediate defences carry the first-detach load while reflection is unavailable.

Filtered logcat *:E for Invalid window token, IllegalArgumentException, WindowlessWindow, FATAL, AndroidRuntime across the entire session — no matches. Only unrelated PermissionService noise and the expected InputDispatcher: channel … unrecoverably broken lines that fire when activities tear down.

🤖 Generated with Claude Code

amillez commented Jun 1, 2026

Owner Author

Follow-up: removed scvhDisabled latch on reflection failure

The previous commit's scvhDisabled = true latch on unscheduleScvhTraversals failure was too aggressive. On reflection-blocked devices (Android 14+ Pixel_9 emulator and similar hidden-API enforcement) the latch flipped on the very first SCVH detach and every later attach fell through to the legacy WindowManager.addView path. The legacy path loses the SurfaceFlinger-direct alpha toggle that the v0.1.3 snapshot race fix relies on — observed regression: cover sometimes misses the Recents thumbnail on Home press.

Latch removed in 1f121ca. The other defences in detachCoverView carry the load:

  • FreezableFrameLayout.requestLayout() no-op during teardown → no new Choreographer traversals queued
  • WindowManager.removeViewImmediate → dispatchDetachedFromWindow runs inline, freeze catches any layout requests it makes

unscheduleScvhTraversals still latches scvhReflectionDisabled so we don't keep retrying a probe that won't succeed, but SCVH stays on.
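
A probe that latches off after a permanent failure might look like the sketch below. ReflectionProbe, FakeViewRoot and the two host classes are hypothetical names chosen for illustration; the real probe targets SurfaceControlViewHost, and its exact lookup strategy is not shown in this PR. The sketch searches declared fields by type and stops retrying after the first miss.

```kotlin
import java.lang.reflect.Field

// Hypothetical stand-ins for the framework types involved.
class FakeViewRoot
class HostWithRoot { private val root = FakeViewRoot() }
class HostWithoutRoot

class ReflectionProbe(private val wanted: Class<*>) {
    var disabled = false      // plays the role of scvhReflectionDisabled
        private set
    var attempts = 0
        private set

    fun findField(host: Any): Field? {
        if (disabled) return null           // latched: do not retry the probe
        attempts++
        val field = try {
            host.javaClass.declaredFields
                .firstOrNull { it.type == wanted }
                ?.also { it.isAccessible = true }
        } catch (e: Exception) {            // access denied by the runtime
            null
        }
        if (field == null) disabled = true  // permanent failure on this device
        return field
    }
}
```

The latch only disables the probe. Whether the SCVH path itself stays enabled is a separate decision, which is the distinction this follow-up draws.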

Re-verified on Pixel_9 (still the reflection-blocked case):

W Cover : unscheduleScvhTraversals: no ViewRootImpl field found on SurfaceControlViewHost; disabling reflection
I Cover : attachCover scvh: attached size=1080x2424 visible=false sc=ok          ← SCVH stays on
I Cover : broadcast: reason=homekey isEnabled=true isVisible=false sc=false
I Cover : broadcast: fast scvh=true refl=false dt=0ms                            ← SF-direct alpha wins
I Cover : setCoverVisibility: visible=true animated=false scvh=true

10× rapid Home / app-switcher cycles — every broadcast event shows scvh=true, every detach is clean, no Invalid window token in logcat *:E.

🤖 Generated with Claude Code

amillez commented Jun 1, 2026

Owner Author

Round 3: zero-crash guarantee + snapshot-race fix preserved

Two further commits to close the remaining holes after the latch-removal regression.

1855d2c

Adds two more layers:

1. deferredReleaseScvh — defer host.release() across one Choreographer frame.

host.release() removes the WWM token via an asynchronously-dispatched doDie() (MSG_DIE). When reflective unscheduleScvhTraversals is blocked (Android 14+ hidden-API enforcement) any TraversalRunnable already in the Choreographer queue at detach time would fire AFTER MSG_DIE, call WWM.relayout on a removed token, and crash from Looper.loop.

The deferred release uses Choreographer.postFrameCallback (animation phase) to schedule mainHandler.post(safeReleaseScvh), which runs AFTER that frame's traversal callbacks complete. Pre-queued runnables fire with the token still valid; release happens once they've drained. ViewRootImpl.mTraversalScheduled caps the queue to one runnable per ViewRootImpl, so a single frame's wait drains it.

All 6 safeReleaseScvh call sites (5 in tryAttachCoverViaScvh recovery branches + 1 in detachCoverView) now route through the deferred helper.
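
The ordering argument can be checked with a small model. FrameModel below is hypothetical and reduces a frame to two phases (animation callbacks, then traversal callbacks) followed by pending Handler messages; the real Choreographer has more phases and vsync timing that this ignores. deferredRelease here is an illustrative reimplementation of the idea, not the helper from the commit.

```kotlin
// Hypothetical single-frame model: animation phase, traversal phase,
// then Handler messages posted during the frame.
class FrameModel {
    private val animationCallbacks = ArrayDeque<() -> Unit>()
    private val traversalCallbacks = ArrayDeque<() -> Unit>()
    private val handlerMessages = ArrayDeque<() -> Unit>()
    val log = mutableListOf<String>()

    fun postFrameCallback(cb: () -> Unit) { animationCallbacks.addLast(cb) }
    fun postTraversal(cb: () -> Unit) { traversalCallbacks.addLast(cb) }
    fun post(msg: () -> Unit) { handlerMessages.addLast(msg) }

    fun runFrameThenMessages() {
        while (animationCallbacks.isNotEmpty()) animationCallbacks.removeFirst().invoke()
        while (traversalCallbacks.isNotEmpty()) traversalCallbacks.removeFirst().invoke()
        while (handlerMessages.isNotEmpty()) handlerMessages.removeFirst().invoke()
    }
}

// Release is posted from the animation phase, so it can only run after
// the traversal phase of that same frame has finished.
fun deferredRelease(frame: FrameModel, release: () -> Unit) {
    frame.postFrameCallback { frame.post(release) }
}
```

With one traversal already queued when deferredRelease is called, the model logs the traversal first and the release second, which is the order the fix depends on.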

2. API 30 INTERNAL_SYSTEM_WINDOW pre-check.

On Android 11 the SCVH path through WindowManagerService.addWindow enforces INTERNAL_SYSTEM_WINDOW, a signature-level permission no normal app holds. host.setView() throws SecurityException AFTER ViewRootImpl.setView has already called requestLayout() — queuing a TraversalRunnable. The catch handles the exception but the queued runnable fires next vsync against a token that was never registered with the WWM ("never added") and crashes. The deferred release can't help here because the runnable fires in the same vsync's traversal phase, before our queued Handler message.

Scoped to API 30 only — Android 12+ dropped the permission check for SCVH's addToDisplay path, so SCVH works for ordinary apps and the snapshot-race fix runs unmodified there. A blanket check would falsely disable SCVH on Pixel_9 / production Android 14+ devices, reintroducing the regression. We accept losing the SurfaceFlinger-direct alpha toggle on Android 11 — the legacy view.alpha path still works for cover visibility, just with a slightly higher snapshot-race window.
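
The scoping rule reduces to a small decision function. shouldAttemptScvh is a hypothetical name and signature; on a device the inputs would presumably come from Build.VERSION.SDK_INT and a permission check for INTERNAL_SYSTEM_WINDOW, but how the commit obtains them is an assumption here.

```kotlin
// Hypothetical decision helper mirroring the scoping described above.
fun shouldAttemptScvh(sdkInt: Int, hasInternalSystemWindow: Boolean): Boolean = when {
    sdkInt < 30 -> false                      // SurfaceControlViewHost needs API 30+
    sdkInt == 30 -> hasInternalSystemWindow   // Android 11: setView would throw without it
    else -> true                              // Android 12+: no pre-check, SCVH stays on
}
```

Keeping the permission test inside the API 30 branch is the point: applying it to every API level would turn SCVH off on Android 12+ devices, where ordinary apps never hold the permission.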

Verified on Favvy_Android_30 (API 30) emulator

The exact crash reproduced on a fresh cold-booted Favvy_Android_30 (Pixel 4 / Android 11):

06-01 15:14:55.676 W Cover : attachCover scvh: setView failed (SecurityException): Requires INTERNAL_SYSTEM_WINDOW permission
06-01 15:14:55.676 W Cover : safeReleaseScvh: release failed (NullPointerException): ... InputStage.onDetachedFromWindow()
06-01 15:14:55.676 I Cover : attachCover: preferredRefreshRate=60.000004
06-01 15:14:55.688 E AndroidRuntime: FATAL EXCEPTION: main
                                    java.lang.IllegalArgumentException: Invalid window token (never added or removed already)
                                      at android.view.WindowlessWindowManager.relayout(WindowlessWindowManager.java:237)

After the fix, same emulator, same operation:

06-01 15:27:45.555 I Cover : ensureCoverOnTopmost: reparent (topmost=DecorView)
06-01 15:27:45.556 W Cover : attachCover scvh: API 30 + INTERNAL_SYSTEM_WINDOW not granted; SCVH would throw at setView, disabling SCVH for this session
06-01 15:27:45.556 I Cover : attachCover: preferredRefreshRate=60.000004
06-01 15:27:45.585 I Cover : attachCover: SurfaceControl captured=false

4 home/appSwitch cycles + 8 rapid enable/disable toggles → app PID stable, logcat *:E filtered for Invalid window token | IllegalArgumentException | WindowlessWindow | FATAL returns no matches.

Summary of the layered fix

| Defence | Crash window closed |
| --- | --- |
| unscheduleScvhTraversals (v0.1.5) | Pre-queued runnables on devices where hidden-API reflection works |
| FreezableFrameLayout.requestLayout no-op | NEW TraversalRunnables scheduled during teardown |
| removeViewImmediate | Synchronous detach dispatch so any layout requests are short-circuited by the freeze |
| coverDetaching reentrance guard | Synchronous re-entry from animation/detach callbacks |
| deferredReleaseScvh | Pre-queued runnables on devices where reflection is BLOCKED (Android 14+) |
| API 30 permission pre-check | The "setView throws SecurityException with a TraversalRunnable already queued" case; the only fix is to never call setView |

The SCVH SurfaceFlinger-direct alpha toggle stays enabled on every device that supports it (Android 12+). The snapshot race fix runs unmodified.
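
Of the defences summarised above, the coverDetaching guard combined with snapshot-then-null-out is the one most easily gotten wrong. The sketch below uses a hypothetical CoverHolderModel with counters in place of real WindowManager and SCVH calls; the onRemove parameter simulates a framework callback that re-enters detach while removal is in progress.

```kotlin
// Hypothetical model of the detach path; counters replace real removal/release.
class CoverHolderModel {
    var view: Any? = null
    var host: Any? = null
    var removals = 0
        private set
    var releases = 0
        private set
    private var detaching = false   // plays the role of coverDetaching

    fun detach(onRemove: () -> Unit = {}) {
        if (detaching) return       // synchronous re-entry bails here
        detaching = true
        try {
            // Snapshot, then clear shared state before doing any work,
            // so later calls see nothing left to remove or release.
            val v = view
            val h = host
            view = null
            host = null
            if (v != null) {
                removals++
                onRemove()          // may call detach() again
            }
            if (h != null) releases++
        } finally {
            detaching = false
        }
    }
}
```

Either mechanism alone would stop the double release in this model; having both covers re-entry during the call as well as a second call after it returns.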

🤖 Generated with Claude Code

amillez commented Jun 1, 2026

Owner Author

Round 4: real blur on Android 11 (API 30) — not a flat tint

After scoping out the SCVH crash on Android 11 in the previous round, the BLUR mode on those devices fell through to CoverBlurRenderer's API < S branch — which had been dropping the captured bitmap and painting a flat ~80% white tint. Underlying app content was fully readable through the cover. Broken privacy on every Android 11 host.

36af299

Reuse the captured bitmap as a poor man's blur on API < 31 instead of discarding it. Capture at 1/12 in each dimension (1/144 of the pixels), small enough that ImageView's bilinear filter at draw time produces a frosted-glass smudge when upscaled back to display size. Layer the style tint as foreground exactly like the API >= S path so the visual contract is consistent across versions.

} else {
  val bitmap = captureViewBitmap(source, scale = FALLBACK_BLUR_CAPTURE_SCALE)
  if (bitmap != null) {
    target.setImageBitmap(bitmap)
    val tint = style.tintColor()
    target.foreground = if (tint != Color.TRANSPARENT) ColorDrawable(tint) else null
    target.setBackgroundColor(Color.TRANSPARENT)
  } else {
    // Source view had no laid-out size yet (rare; first frame).
    // Original flat-tint fallback keeps privacy without the smudge.
    target.setImageDrawable(null)
    target.foreground = null
    target.setBackgroundColor(style.fallbackColor())
  }
}
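
The 1/12 capture scale works out as follows for the 1080x2424 surface seen in the Pixel_9 logs. scaledCaptureSize and CaptureSize are hypothetical helpers; the rounding and the minimum of one pixel are assumptions, since the PR does not show how captureViewBitmap computes its dimensions.

```kotlin
// Hypothetical helper: integer-divide each dimension, never below 1 pixel.
data class CaptureSize(val width: Int, val height: Int)

fun scaledCaptureSize(width: Int, height: Int, divisor: Int = 12): CaptureSize =
    CaptureSize(maxOf(1, width / divisor), maxOf(1, height / divisor))
```

For 1080x2424 this gives 90x202, that is 18,180 pixels against 2,617,920 in the source, a factor of 144. Every captured pixel is stretched over a 12x12 block on screen, which is what lets plain bilinear filtering smear text beyond legibility.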

Verified on Favvy_Android_30 (API 30) emulator

Mode BLUR LIGHT 0.4, Home → app-switcher view. Text labels (Cover, Status, Mode, RESIZE labels, button text) are smudged unreadable. Button shapes are visible as soft blobs. The LIGHT style tint is layered on top. Underlying content is obscured to roughly the same degree as with iOS's UIBlurEffect, so the privacy cover does its job on Android 11.

5 rapid Home / app-switcher cycles on top of the BLUR mode — app PID stable, logcat *:E filtered for Invalid window token | IllegalArgumentException | WindowlessWindow | FATAL returns no matches.

Status summary

| Concern | Status |
| --- | --- |
| WWM.relayout "Invalid window token" production crash | Fixed by freeze + removeViewImmediate + deferredReleaseScvh + reentrance guard |
| SCVH setView SecurityException on Android 11 + queued-runnable crash | Fixed by API 30 + INTERNAL_SYSTEM_WINDOW pre-check |
| SCVH SurfaceFlinger-direct alpha toggle for Home-press snapshot race | Preserved on Android 12+ (unchanged on those devices) |
| BLUR rendering on Android 11 | Now a real bilinear-smudge of the captured view + style tint, not a flat readable wash |

… + fix Android 11 BLUR fallback

Production crash from downstream apps' Play Console (NEAR Mobile and others):

  java.lang.IllegalArgumentException: Invalid window token (never added or removed already)
    at android.view.WindowlessWindowManager.relayout
    at android.view.ViewRootImpl.relayoutWindow / performTraversals
    at android.view.Choreographer.doFrame

The SCVH path introduced in v0.1.3 (snapshot-race fix) creates a
`WindowlessWindowManager`-backed window through `SurfaceControlViewHost`.
Teardown is racy: `host.release()` removes the WWM token via an
asynchronously-dispatched `doDie()` (`MSG_DIE`), and any
`TraversalRunnable` already in the SCVH ViewRootImpl's Choreographer
queue can fire after the token is removed, throwing the crash from
`Looper.loop` — outside any try/catch.

v0.1.5 added reflective `unscheduleScvhTraversals` before release,
which cancels queued runnables when reflection is available. Android
14+ hidden-API enforcement blocks the probe on a growing share of
devices, leaving the crash unfixed there.

This commit closes the crash class deterministically while preserving
the SurfaceFlinger-direct alpha toggle that wins the Home-press
snapshot race on every device that supports SCVH. Layered defences in
`detachCoverView` + `tryAttachCoverViaScvh`:

- `FreezableFrameLayout` cover-content root whose `requestLayout()`
  and `invalidate()` no-op while a `frozen` flag is set. Set first
  in `detachCoverView`; from that point no `requestLayout` reaches
  `ViewRootImpl.scheduleTraversals`, so no new TraversalRunnable can
  be queued during teardown.
- Snapshot-then-null-out of shared state at the top of
  `detachCoverView`, plus a `coverDetaching` re-entrance guard. Any
  synchronous re-entrant call (animation cancel, dispatchDetached)
  sees cleared fields and bails before double-removing or
  double-releasing.
- Cancel SCVH traversals BEFORE `removeView` via
  `unscheduleScvhTraversals` (best-effort; latches off on permanent
  reflection failure).
- `WindowManager.removeViewImmediate` so `dispatchDetachedFromWindow`
  runs inline while the freeze is active. Falls back to async
  `removeView` on OEM impls that reject immediate removal in
  transitional states.
- `setCoverVisibility` validity-checks `view.windowToken != null`
  before `updateViewLayout` — same WMS/WWM path that throws when the
  token is gone.
- `deferredReleaseScvh` schedules `safeReleaseScvh` to run AFTER the
  next Choreographer frame's traversal callbacks complete. Uses
  `Choreographer.postFrameCallback` (animation phase) → nested
  `mainHandler.post`. `ViewRootImpl.mTraversalScheduled` guarantees at
  most one queued TraversalRunnable per ViewRootImpl; a single frame's
  wait drains the queue. Any pre-queued runnable fires with the WWM
  token still valid; release happens after, removing the token only
  when nothing is left to relayout. Wired into all 6 release call
  sites (5 in `tryAttachCoverViaScvh` recovery branches +
  `detachCoverView`).
- API 30 + `INTERNAL_SYSTEM_WINDOW` pre-check. On Android 11,
  `WindowManagerService.addWindow` enforces this signature-level
  permission for the SCVH path. `host.setView` throws
  `SecurityException` AFTER `ViewRootImpl.setView` has already called
  `requestLayout()`, queuing a TraversalRunnable that fires the same
  vsync and crashes because the token was never registered with the
  WWM. The deferred release can't help — the runnable fires in the
  same vsync's TRAVERSAL phase, before our queued Handler message.
  Skipping SCVH entirely when the permission isn't granted is the
  only safe path. Scoped to API 30 only; Android 12+ dropped the
  check, so SCVH and the snapshot-race fix run unmodified on every
  modern device.

Also fixes `CoverBlurRenderer` on API < 31: `RenderEffect` doesn't
exist, and the previous fallback dropped the captured bitmap and
painted a flat ~80% white tint, leaving underlying app content fully
readable through the cover (broken privacy on every Android 11
host). Now captures at 1/12 in each dimension (1/144 pixels) — small
enough that ImageView's bilinear filter at draw time produces a
frosted-glass smudge — and layers the style tint as foreground,
matching the visual contract of the API >= S path.

Verified on Favvy_Android_30 (API 30, reflection-blocked,
permission-restricted SCVH) and Pixel_9 (API 34+, reflection-blocked,
SCVH-friendly): no `IllegalArgumentException: Invalid window token`
in `logcat *:E` across 9/9 maestro flows + rapid Home/app-switcher
cycles + rapid enable/disable toggles. App PID stable. SCVH fast
alpha (`broadcast: fast scvh=true dt=0ms`) persists across detaches
on devices that support it. Android 11 BLUR mode shows real smudged
content with style tint in recents-thumbnail view.
@amillez amillez force-pushed the worktree-fix-android-invalid-window-token branch from 36af299 to f30016c Compare June 1, 2026 14:08
@amillez amillez merged commit 14d0cef into main Jun 1, 2026
3 checks passed