fix(trickle): migrate wait_for → asyncio.timeout to fix cancellation masking#11
Open
fix(trickle): migrate wait_for → asyncio.timeout to fix cancellation masking#11
Conversation
…masking Py 3.11's wait_for has a race (bpo-32751) that masks outer cancellation as TimeoutError. SegmentWriter.write wrapped it as TrickleSegmentWriteError; SegmentWriter.close logged it. Both produced misleading shutdown tracebacks. asyncio.timeout (3.11+) was designed to fix this. Migrate all four wait_for callsites — two symptomatic in trickle_publisher, two preventive in media_publish. Bump requires-python to >=3.11 (Docker + CI already on 3.11). Closes #10. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Modernization, not a bug fix
This PR migrates four
asyncio.wait_forcallsites toasyncio.timeout(Python 3.11+). The migration is correct and worthwhile on its own —asyncio.timeoutwas added specifically to addresswait_for's cancellation behavior — but it does not fix the shutdown-noise symptom originally tracked in #10.After migration, live runner E2E (
examples/runner/live_grayscale/test.sh) still reproduces the sameTrickleSegmentWriteErrorand"close suppressed"log lines. Verified via timestamped logs that the timeouts firing are legitimate — the segment's HTTP post-body consumer is dead/wedged, soqueue.putblocks until the 5s deadline. That's a separate root cause; #10 has been updated to describe it.Why migrate anyway
asyncio.wait_forhas documented races where outer cancellation can be masked asTimeoutError(bpo-32751). We didn't actually hit that race in our reproduction, butasyncio.timeoutusestask.uncancel()internally to handle cancellation correctly by design. Pre-emptive correctness — and it's the recommended primitive for new code from 3.11 onward.Changes
trickle_publisher.pySegmentWriter.write/closemigrated. ExplicitCancelledErrorre-raise inclose()soBaseExceptiondoesn't log a misleading "close suppressed" warning when cancel propagates.media_publish.pypyproject.tomlrequires-pythonbumped from>=3.10to>=3.11. Required becauseasyncio.timeoutis 3.11+. Docker and CI already run 3.11; 3.10 EOL is October 2026.Validation
asyncio.wait_forcallsites insrc/livepeer_gateway/migrated; only generated gRPCwait_for_readyparameters remain (unrelated).examples/runner/live_grayscale/test.shstill passes (and still produces the unrelated shutdown traceback noise tracked in TricklePublisher.SegmentWriter masks CancelledError as TrickleSegmentWriteError on shutdown #10).Out of scope
Fixing the shutdown noise — see updated #10 for the actual root cause and proposed approach.