[BUG] Async generator wedges after a mid-stream job is cancelled — no jobs complete afterward (surfaced via tabbyAPI) #227

Description

@nakaken3013-code

Describe the bug:
Cross-link: theroyallab/tabbyAPI#428

When a streaming job is cancelled mid-generation (the frontend disconnects), the async generator appears to deadlock: no job ever completes afterward. Newly enqueued jobs are accepted but never produce tokens, GPU memory stays fully allocated, and GPU utilization drops to ~0%. Only a full reload recovers it. The serving frontend stays otherwise responsive (e.g. /v1/models keeps returning 200), so it is easy to miss.

Note: commit a03f0ff ("AsyncGenerator: Ensure cancel request is forwarded but don't crash if frontend breaks contract") is already present in 0.0.42, and 0.0.43 has no further generator/cancel changes, so this path still wedges. The cancelled job appears not to be reliably reaped from the batch scheduler, which stalls the whole queue.

Reproduction steps:

  1. Load a 27B exl3 model (3.08bpw), 2× GPU tensor split, continuous batching (max_batch_size=6, cache_size=327680, max_seq_len=81920).
  2. Drive streaming generations through a frontend (here: tabbyAPI).
  3. Cancel a request mid-generation (client disconnect / "stop" / tab close).
  4. Subsequent jobs are accepted but never generate any tokens.

Full repro, logs and config: theroyallab/tabbyAPI#428
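
Steps 3 and 4 can be approximated without a browser by a small client that opens a streaming request, reads a few chunks, drops the connection, and then checks whether a fresh request still produces tokens. This is a minimal sketch, not the script used for the linked repro. The address and port, the `/v1/completions` route, the `Authorization: Bearer` header, and the request body are assumptions about an OpenAI-compatible frontend; adjust them to the actual setup.

```python
import json
import urllib.request

# All three values below are assumptions for illustration; change them to match your server.
BASE_URL = "http://127.0.0.1:5000"
API_KEY = "YOUR_API_KEY"
PAYLOAD = {"prompt": "Write a very long story.", "max_tokens": 512, "stream": True}


def open_stream(base_url, payload, api_key, timeout):
    """POST a streaming completion request and return the open HTTP response."""
    request = urllib.request.Request(
        base_url + "/v1/completions",  # assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )
    return urllib.request.urlopen(request, timeout=timeout)


def read_data_lines(stream, limit):
    """Return up to `limit` server-sent-event lines that start with 'data:'."""
    lines = []
    while len(lines) < limit:
        raw = stream.readline()
        if not raw:  # end of stream
            break
        if raw.startswith(b"data:"):
            lines.append(raw.decode("utf-8", "replace").strip())
    return lines


def cancel_then_probe(base_url=BASE_URL, payload=PAYLOAD, api_key=API_KEY,
                      chunks_before_cancel=5, probe_timeout=60.0):
    """Cancel one stream mid-generation, then report whether a new job yields a chunk."""
    # Step 3: read a few chunks, then close the connection while tokens are still coming.
    first = open_stream(base_url, payload, api_key, probe_timeout)
    read_data_lines(first, chunks_before_cancel)
    first.close()

    # Step 4: a healthy server should deliver at least one chunk before the timeout.
    try:
        probe = open_stream(base_url, payload, api_key, probe_timeout)
        try:
            got = read_data_lines(probe, 1)
        finally:
            probe.close()
    except OSError:  # timeouts and other network errors count as "no tokens"
        return False
    return bool(got)
```

Calling `cancel_then_probe()` against a running server returns `False` when the follow-up request yields no chunk within `probe_timeout`, which is the symptom described in step 4. A single run may not hit the race, so repeating the call in a loop is likely more informative.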

Expected behavior:
A cancelled job should be reaped from the batch scheduler; remaining and newly enqueued jobs should continue to generate normally.
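
To make the expected behavior concrete, here is a toy asyncio model of it. It is not exllamav3's generator or scheduler; `ToyScheduler` and all of its methods are hypothetical names invented for this illustration. The point it shows is the contract: closing a stream early removes that job from the active batch, and other jobs, including ones enqueued later, still run to completion.

```python
import asyncio


class ToyScheduler:
    """Hypothetical stand-in for a batch scheduler, for illustration only."""

    def __init__(self):
        self.active = {}  # job_id -> tokens still to generate
        self.queues = {}  # job_id -> asyncio.Queue of produced tokens

    def enqueue(self, job_id, n_tokens):
        self.active[job_id] = n_tokens
        self.queues[job_id] = asyncio.Queue()

    def cancel(self, job_id):
        # Reap the job: drop it from the batch so later steps skip it.
        self.active.pop(job_id, None)
        queue = self.queues.pop(job_id, None)
        if queue is not None:
            queue.put_nowait(None)  # wake a consumer that may still be waiting

    async def run(self):
        # Each pass produces one token for every job still in the batch.
        while self.active:
            for job_id in list(self.active):
                self.queues[job_id].put_nowait(f"{job_id}-tok")
                self.active[job_id] -= 1
                if self.active[job_id] == 0:
                    self.queues[job_id].put_nowait(None)  # end-of-stream marker
                    del self.active[job_id]
            await asyncio.sleep(0)  # let consumers run between passes

    async def stream(self, job_id):
        queue = self.queues[job_id]
        try:
            while (token := await queue.get()) is not None:
                yield token
        finally:
            # Runs on normal completion and on early close (client disconnect).
            self.cancel(job_id)


async def demo():
    sched = ToyScheduler()
    sched.enqueue("a", 50)
    sched.enqueue("b", 3)
    runner = asyncio.create_task(sched.run())

    gen_a = sched.stream("a")
    await gen_a.__anext__()  # receive one token from job "a"
    await gen_a.aclose()     # simulate the client disconnecting mid-stream
    a_reaped = "a" not in sched.active

    b_tokens = [t async for t in sched.stream("b")]  # unaffected job finishes
    await runner

    sched.enqueue("c", 2)  # job submitted after the cancellation
    await sched.run()
    c_tokens = [t async for t in sched.stream("c")]
    return {"a_reaped": a_reaped, "b": b_tokens, "c": c_tokens}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the reaping happens in the generator's `finally` block, so it does not depend on the consumer sending an explicit cancel. Whether the real generator can rely on the same mechanism is an open question for the maintainers.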

Environment / versions:
exllamav3 0.0.42 (source commit 595d6c4)
torch 2.11.0+cu128, CUDA 12.8, Python 3.11.15
2× GPU (12GB + 8GB), tensor split
model: 27B exl3 @ 3.08bpw, continuous batching

Logs / Additional context:
Full logs and config are in theroyallab/tabbyAPI#428.
