[BUG] Async generator wedges after a mid-stream job is cancelled — no jobs complete afterward (surfaced via tabbyAPI)

Describe the bug:
Cross-link: theroyallab/tabbyAPI#428

When a streaming job is cancelled mid-generation (the frontend disconnects), the async generator appears to deadlock: **no job ever completes afterward**. Newly enqueued jobs are accepted but never produce tokens, GPU memory stays fully allocated, and GPU utilization drops to ~0%. Only a full reload recovers it. The serving frontend stays otherwise responsive (e.g. `/v1/models` keeps returning 200), so the stall is easy to miss.

Note: commit `a03f0ff` ("AsyncGenerator: Ensure cancel request is forwarded but don't crash if frontend breaks contract") is **already present in 0.0.42**, and `0.0.43` has no further generator/cancel changes, so that fix does not cover this path and upgrading is unlikely to help. It looks like the cancelled job isn't reliably reaped from the batch scheduler, which then stalls the whole queue.

Reproduction steps:
1. Load a 27B exl3 model (3.08bpw), 2× GPU tensor split, continuous batching (max_batch_size=6, cache_size=327680, max_seq_len=81920).
2. Drive streaming generations through a frontend (here: tabbyAPI).
3. Cancel a request mid-generation (client disconnect / "stop" / tab close).
4. Subsequent jobs are accepted but never generate any tokens.
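
Steps 3 and 4 can be approximated with a small client. This is only a sketch, not the script used for the report: the base URL, the `/v1/completions` path, the bearer-token header and the payload fields are assumptions about an OpenAI-compatible frontend, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumptions, not taken from the report: host/port, endpoint path,
# auth header, and payload fields. Adjust to the actual deployment.
BASE_URL = "http://127.0.0.1:5000"
API_KEY = "your-api-key"


def open_stream(prompt, max_tokens, timeout):
    """Start a streaming completion and return the open HTTP response."""
    body = json.dumps(
        {"prompt": prompt, "max_tokens": max_tokens, "stream": True}
    ).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    return urllib.request.urlopen(request, timeout=timeout)


def read_events(lines, limit):
    """Count server-sent `data:` lines (ignoring `[DONE]`), up to `limit`."""
    seen = 0
    for raw in lines:
        if raw.startswith(b"data:") and b"[DONE]" not in raw:
            seen += 1
            if seen >= limit:
                break
    return seen


def next_job_streams(timeout=60.0):
    """Cancel one stream mid-generation, then check that a new job streams."""
    try:
        first = open_stream("Write a very long story.", 2048, timeout)
        read_events(first, 5)  # step 3: read a few chunks ...
        first.close()          # ... then disconnect mid-generation
        with open_stream("Say hello.", 16, timeout) as second:
            return read_events(second, 1) == 1  # step 4: expect a token
    except OSError as exc:  # covers timeouts, URLError and dropped connections
        print("request failed or timed out:", exc)
        return False
```

With the server running, `next_job_streams()` returning `False` after a timeout is consistent with the wedge described above, though it does not prove it on its own.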

Full repro, logs and config: theroyallab/tabbyAPI#428

Expected behavior:
A cancelled job should be reaped from the batch scheduler; remaining and newly enqueued jobs should continue to generate normally.
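
To make the expectation concrete, here is a toy asyncio model of it. It is not exllamav3 code and says nothing about where the actual bug is; `ToyScheduler`, `stream` and `demo` are hypothetical names. It only shows the invariant: once a job is cancelled it leaves the active set, and the other job still runs to completion.

```python
import asyncio


class ToyScheduler:
    """Toy stand-in for a batch scheduler (illustration only)."""

    def __init__(self):
        self.active = {}  # job_id -> [tokens_left, output queue]

    def enqueue(self, job_id, n_tokens):
        queue = asyncio.Queue()
        self.active[job_id] = [n_tokens, queue]
        return queue

    def cancel(self, job_id):
        # Reap the job so it no longer occupies a batch slot.
        self.active.pop(job_id, None)

    async def run(self):
        # Emit one "token" per active job per iteration until none remain.
        while self.active:
            for job_id in list(self.active):
                entry = self.active[job_id]
                entry[0] -= 1
                entry[1].put_nowait(job_id + "-tok")
                if entry[0] == 0:
                    entry[1].put_nowait(None)  # end-of-stream marker
                    del self.active[job_id]
            await asyncio.sleep(0)  # let consumers read (and maybe cancel)


async def stream(sched, job_id, n_tokens, stop_after=None):
    """Consume a job's tokens; optionally cancel after `stop_after` tokens."""
    queue = sched.enqueue(job_id, n_tokens)
    received = 0
    while True:
        token = await queue.get()
        if token is None:
            return received
        received += 1
        if stop_after is not None and received >= stop_after:
            sched.cancel(job_id)  # simulated client disconnect
            return received


async def demo():
    sched = ToyScheduler()
    a = asyncio.create_task(stream(sched, "a", 10, stop_after=3))
    b = asyncio.create_task(stream(sched, "b", 10))
    await asyncio.sleep(0)  # let both jobs enqueue before the loop starts
    await sched.run()
    return await a, await b, len(sched.active)
```

In this model `asyncio.run(demo())` yields 3 tokens for the cancelled job, all 10 for the other, and an empty active set. The report describes the opposite outcome: after a cancel, the remaining jobs never finish.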

Environment / versions:
exllamav3 0.0.42 (source commit 595d6c4)
torch 2.11.0+cu128, CUDA 12.8, Python 3.11.15
2× GPU (12GB + 8GB), tensor split
model: 27B exl3 @ 3.08bpw, continuous batching

Logs / Additional context:
Full logs and config are in theroyallab/tabbyAPI#428.

[BUG] Async generator wedges after a mid-stream job is cancelled — no jobs complete afterward (surfaced via tabbyAPI) #227
