fix(trickle): migrate wait_for → asyncio.timeout to fix cancellation masking#11

Open
rickstaa wants to merge 1 commit into main from fix/trickle-cancellation-noise

@rickstaa rickstaa commented May 5, 2026

Modernization, not a bug fix

This PR migrates four asyncio.wait_for callsites to asyncio.timeout (Python 3.11+). The migration is correct and worthwhile on its own — asyncio.timeout was added specifically to address wait_for's cancellation behavior — but it does not fix the shutdown-noise symptom originally tracked in #10.

After migration, the live runner E2E (examples/runner/live_grayscale/test.sh) still reproduces the same TrickleSegmentWriteError and "close suppressed" log lines. Timestamped logs confirm that the timeouts that fire are legitimate: the consumer of the segment's HTTP POST body is dead or wedged, so queue.put blocks until the 5s deadline. That's a separate root cause; #10 has been updated to describe it.

Why migrate anyway

asyncio.wait_for has documented races where outer cancellation can be masked as TimeoutError (bpo-32751). We didn't actually hit that race in our reproduction, but asyncio.timeout uses task.uncancel() internally to handle cancellation correctly by design. The migration is pre-emptive correctness, and asyncio.timeout is the recommended primitive for new code from 3.11 onward.

Changes

| File | Change |
| --- | --- |
| trickle_publisher.py | SegmentWriter.write/close migrated. close() now re-raises CancelledError explicitly, so the BaseException handler doesn't log a misleading "close suppressed" warning when a cancel propagates. |
| media_publish.py | Idle-timeout watchdog (line 945) and drain-on-shutdown (line 1049) migrated. |
| pyproject.toml | requires-python bumped from >=3.10 to >=3.11. Required because asyncio.timeout is 3.11+. Docker and CI already run 3.11; 3.10 EOL is October 2026. |

Validation

Out of scope

Fixing the shutdown noise — see updated #10 for the actual root cause and proposed approach.

…masking

Py 3.11's wait_for has a race (bpo-32751) that masks outer cancellation
as TimeoutError. SegmentWriter.write wrapped it as TrickleSegmentWriteError;
SegmentWriter.close logged it. Both produced misleading shutdown tracebacks.

asyncio.timeout (3.11+) was designed to fix this. Migrate all four wait_for
callsites — two symptomatic in trickle_publisher, two preventive in
media_publish. Bump requires-python to >=3.11 (Docker + CI already on 3.11).

Closes #10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>