OllamaModelBackend: use TaskGroup for structured cancellation in generate_from_raw #1164

@planetf1

Background

PR #1163 fixed generate_from_raw to propagate exceptions instead of silently swallowing them into an empty ModelOutputThunk(value="") (issue #597). The fix uses asyncio.gather(*coroutines) with the default return_exceptions=False.

Problem

With asyncio.gather(return_exceptions=False), when one coroutine raises, the first exception propagates to the caller immediately, but the other in-flight coroutines are not cancelled. They keep running on the event loop until completion, and their results are discarded. For N parallel Ollama HTTP requests, this means up to N−1 requests run to completion only to have their results thrown away.

This is a pre-existing behaviour of asyncio.gather made visible by the fix in #1163. It was not introduced by that PR; under the old code all N tasks always ran to completion (whether they failed or not).
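The behaviour described above can be reproduced without Ollama. The following is a minimal sketch that uses asyncio.sleep as a stand-in for the HTTP requests; the names failing, slow and demo_gather are hypothetical and do not come from the codebase.

```python
import asyncio


async def demo_gather() -> list[str]:
    events: list[str] = []

    async def failing() -> str:
        # Stand-in for a request that fails quickly.
        await asyncio.sleep(0.01)
        raise ConnectionError("simulated backend failure")

    async def slow(name: str) -> str:
        # Stand-in for a request that is still in flight when the failure occurs.
        try:
            await asyncio.sleep(0.05)
            events.append(f"{name} completed")
            return name
        except asyncio.CancelledError:
            events.append(f"{name} cancelled")
            raise

    try:
        await asyncio.gather(failing(), slow("a"), slow("b"))
    except ConnectionError:
        events.append("caller saw ConnectionError")

    # Keep the loop alive long enough for the orphaned tasks to finish.
    await asyncio.sleep(0.1)
    return events


gather_events = asyncio.run(demo_gather())
print(gather_events)
```

The caller sees the ConnectionError first, yet both slow tasks still record that they completed; neither is cancelled.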

Desired behaviour

asyncio.TaskGroup has been available since Python 3.11, so it can be used at the project's minimum version, Python 3.12. It provides structured concurrency: if any task in the group raises, the remaining tasks are cancelled and awaited before the group exits. This avoids wasted compute and network bandwidth.

async with asyncio.TaskGroup() as tg:
    tasks = [tg.create_task(co) for co in coroutines]
responses = [t.result() for t in tasks]

Caveats

  • TaskGroup raises an ExceptionGroup (not the raw exception) on failure. Callers that do except ConnectionError would need to switch to except* ConnectionError or catch ExceptionGroup explicitly. This is a semantic change on top of the one in PR #1163 (fix: propagate generate_from_raw exceptions in OllamaModelBackend).
  • Consider whether to unwrap single-exception groups to preserve the plain-exception API surface.

Related

Labels: area/backends (Provider-specific work: Ollama, HF, LiteLLM, OpenAI, Bedrock, vLLM)