
feat(llm): multi-LLM failover & round-robin via indexed config#2365

Merged
nicoloboschi merged 2 commits into main from feat/multi-llm-strategies on Jun 23, 2026

@nicoloboschi

Collaborator

Summary

Adds a general, extensible multi-LLM layer: configure N LLMs by index and pick a routing strategy. This is a provider-agnostic alternative to the LiteLLM Router and a more flexible replacement for a single hard-coded failover secondary (supersedes the direction of #2332).

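The unindexed HINDSIGHT_API_LLM_* config stays the primary (member 1). Extra members use indexed variables starting at 1, so HINDSIGHT_API_LLM_1_* is member 2, HINDSIGHT_API_LLM_2_* is member 3, and so on. HINDSIGHT_API_LLM_STRATEGY (JSON) selects how to route across them.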

HINDSIGHT_API_LLM_PROVIDER=openai          # primary
HINDSIGHT_API_LLM_API_KEY=sk-...
HINDSIGHT_API_LLM_1_PROVIDER=groq          # member 2
HINDSIGHT_API_LLM_1_API_KEY=gsk-...
HINDSIGHT_API_LLM_2_PROVIDER=anthropic     # member 3
HINDSIGHT_API_LLM_2_API_KEY=sk-ant-...
HINDSIGHT_API_LLM_STRATEGY='{"mode":"failover"}'
# or load-spread:  {"mode":"round-robin"}
# or unbalanced:   {"mode":"round-robin","weights":[3,1,1]}
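
For illustration, the indexed layout above could be collected roughly as follows. This is a minimal sketch, not the PR's _parse_llm_members: the name parse_members, the dict-based member shape, the rule that a gap in the indexes is an error, and the rule that every indexed member needs a PROVIDER are all assumptions.

```python
import re


def parse_members(env: dict[str, str], prefix: str = "HINDSIGHT_API_LLM") -> list[dict[str, str]]:
    """Return [primary, member 2, member 3, ...] as plain dicts.

    Hypothetical helper: keys are the lower-cased env suffixes
    (e.g. "provider", "api_key"); the STRATEGY variable is skipped.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(?:(\d+)_)?([A-Z_]+)$")
    primary: dict[str, str] = {}
    indexed: dict[int, dict[str, str]] = {}

    for key, value in env.items():
        match = pattern.match(key)
        if match is None:
            continue
        index, field = match.group(1), match.group(2).lower()
        if field == "strategy":
            continue  # routing config, not a member field
        if index is None:
            primary[field] = value
        else:
            indexed.setdefault(int(index), {})[field] = value

    # Assumed contiguity rule: indexes must be exactly 1..N with no gaps.
    expected = list(range(1, len(indexed) + 1))
    if sorted(indexed) != expected:
        raise ValueError(f"indexed members must be contiguous from 1, got {sorted(indexed)}")

    # Assumed required field for every indexed member.
    for i in expected:
        if "provider" not in indexed[i]:
            raise ValueError(f"{prefix}_{i}_PROVIDER is required")

    return [primary] + [indexed[i] for i in expected]
```

Passing a different prefix (for example HINDSIGHT_API_RETAIN_LLM) would reuse the same logic for a per-operation chain.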

Strategies

  • failover — try members in declared order; advance on a member's failure (after its own retries).
  • round-robin — rotate the starting member per request to spread load, then fall through the rest on failure. Optional positive-int weights (one per member, primary first) for an unbalanced rotation (smooth weighted RR).
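
As a rough illustration of the weighted rotation, here is a sketch of the classic smooth weighted round-robin scheme (each pick adds every member's weight to its running score, selects the highest score, then subtracts the weight total from the winner). The class and function names are hypothetical, and the wrap-around fall-through order in attempt_order is an assumption; the PR only states that the remaining members are tried on failure.

```python
class SmoothWeightedRR:
    """Pick starting-member indexes 0..n-1 in proportion to their weights."""

    def __init__(self, weights: list[int]) -> None:
        if not weights or any(not isinstance(w, int) or w <= 0 for w in weights):
            raise ValueError("weights must be a non-empty list of positive integers")
        self._weights = list(weights)
        self._current = [0] * len(weights)
        self._total = sum(weights)

    def next_start(self) -> int:
        # Raise every member's running score by its weight.
        for i, weight in enumerate(self._weights):
            self._current[i] += weight
        # Highest score wins; ties go to the lowest index.
        best = max(range(len(self._weights)), key=self._current.__getitem__)
        # Penalise the winner so the other members catch up.
        self._current[best] -= self._total
        return best


def attempt_order(start: int, member_count: int) -> list[int]:
    """Start at `start`, then fall through the rest, wrapping around (assumed order)."""
    return [(start + k) % member_count for k in range(member_count)]
```

With weights [3,1,1] this scheduler yields the starting members 0, 1, 0, 2, 0 over five requests, so the primary takes three of every five without being picked three times in a row. Equal weights reduce to plain rotation. The sketch is not thread-safe; a real provider would need to guard next_start if requests run concurrently across threads.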

Per-operation chains — each operation can override the global chain with a RETAIN_/REFLECT_/CONSOLIDATION_ prefix (e.g. HINDSIGHT_API_RETAIN_LLM_1_PROVIDER, HINDSIGHT_API_RETAIN_LLM_STRATEGY). A per-op slot with no indexed members (or no strategy) inherits the global chain.
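
The inheritance rule can be pictured with a small sketch. The Chain type and resolve_chain function below are hypothetical and only model the rule stated above (no indexed members, or no strategy, means the global chain is used); how a per-op primary interacts with an inherited chain is not covered here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chain:
    # Hypothetical shape: providers of the indexed members only (primary excluded).
    indexed_members: list[str] = field(default_factory=list)
    strategy: Optional[dict] = None


def resolve_chain(global_chain: Chain, per_op: Optional[Chain]) -> Chain:
    """Use the per-op chain only when it has both indexed members and a strategy."""
    if per_op is None or not per_op.indexed_members or per_op.strategy is None:
        return global_chain
    return per_op
```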

Design

  • MultiLLMProvider mirrors the LLMProvider public surface, so it drops into every existing call path — with_config() / ConfiguredLLMProvider and _provider_impl attribute passthrough are untouched.
  • Each member keeps its own retry budget; we only advance to the next member after a member exhausts retries and raises.
  • OutputTooLongError is propagated without failover (another provider won't fit the output either); CancelledError/KeyboardInterrupt/SystemExit propagate unchanged.
  • verify_connection: strict on the primary, soft (warn-only) on the rest, so a down failover member doesn't block startup.
  • No-config path returns the plain LLMProvider unchanged (byte-identical hot path).
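
A minimal sketch of the failover and verify behaviour described in this list, under stated assumptions: FailoverChain is a hypothetical stand-in for MultiLLMProvider, members are assumed to expose async call() and verify_connection() methods, OutputTooLongError is redefined locally, and should_failover is an illustrative guess at the classification, not the PR's _should_failover.

```python
import logging

logger = logging.getLogger(__name__)


class OutputTooLongError(Exception):
    """Local stand-in for the project's error of the same name."""


def should_failover(exc: BaseException) -> bool:
    # CancelledError, KeyboardInterrupt and SystemExit derive from
    # BaseException, not Exception, so they never trigger failover.
    if not isinstance(exc, Exception):
        return False
    return not isinstance(exc, OutputTooLongError)


class FailoverChain:
    def __init__(self, members: list) -> None:
        if not members:
            raise ValueError("at least one member is required")
        self._members = list(members)

    async def call(self, prompt: str) -> str:
        last_error = None
        for member in self._members:
            try:
                # Each member is assumed to run its own retries inside call().
                return await member.call(prompt)
            except Exception as exc:
                if not should_failover(exc):
                    raise
                last_error = exc
        assert last_error is not None
        raise last_error  # every member failed: re-raise the last error

    async def verify_connection(self) -> None:
        primary, *rest = self._members
        await primary.verify_connection()  # strict: errors propagate
        for member in rest:
            try:
                await member.verify_connection()
            except Exception as exc:  # soft: warn and keep starting up
                logger.warning("failover member failed verification: %s", exc)
```

Because the loop catches only Exception, cancellation passes straight through without touching the next member.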

Files

  • config.py: LLMMemberConfig/LLMStrategyConfig dataclasses, _parse_llm_members/_parse_llm_strategy, 8 new HindsightConfig fields (credential, server-level static), from_env() wiring.
  • engine/multi_llm.py (new): MultiLLMProvider, _should_failover, smooth weighted RR scheduler.
  • engine/memory_engine.py: _build_llm resolves + wraps each of the 4 LLM slots.
  • config_resolver.py: restore typed member objects after the asdict round-trip.
  • Docs (configuration.md), .env.example (+ embed sync), tests.
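
The config_resolver.py change is needed because dataclasses.asdict recursively turns nested dataclass instances into plain dicts. The sketch below shows the effect and one way to undo it; MemberConfig, AppConfig, and restore_members are hypothetical stand-ins, not the project's actual types or fields.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MemberConfig:  # hypothetical stand-in for LLMMemberConfig
    provider: str
    api_key: str


@dataclass
class AppConfig:  # hypothetical stand-in for HindsightConfig
    llm_members: list[MemberConfig] = field(default_factory=list)


def restore_members(raw: dict) -> AppConfig:
    """Rebuild typed members from the dicts that asdict() produced."""
    members = [
        m if isinstance(m, MemberConfig) else MemberConfig(**m)
        for m in raw.get("llm_members", [])
    ]
    return AppConfig(llm_members=members)
```

After asdict(AppConfig([MemberConfig("groq", "gsk-...")])), the llm_members entries are dicts, so attribute access such as member.provider would fail until they are rebuilt.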

Notes / non-goals (v1)

  • Batch retain runs on the primary member only; failover/round-robin apply to the interactive call/call_with_tools paths.
  • Server-level static config (not per-bank configurable), consistent with other LLM credential fields. The strategy JSON shape is forward-extensible (e.g. future max_attempts).
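
To make the strategy JSON shape concrete, here is a hedged sketch of a validator. It is not the PR's _parse_llm_strategy: the Strategy type and parse_strategy name are hypothetical, and two behaviours are assumptions made for illustration: unknown keys are ignored (one way to stay forward-extensible), and weights are rejected outside round-robin mode.

```python
import json
from dataclasses import dataclass
from typing import Optional

_MODES = {"failover", "round-robin"}


@dataclass(frozen=True)
class Strategy:
    mode: str
    weights: Optional[tuple[int, ...]] = None


def parse_strategy(raw: str, member_count: int) -> Strategy:
    """Validate the strategy JSON against the number of configured members."""
    data = json.loads(raw)  # invalid JSON raises JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("strategy must be a JSON object")

    mode = data.get("mode")
    if not isinstance(mode, str) or mode not in _MODES:
        raise ValueError(f"mode must be one of {sorted(_MODES)}")

    weights = data.get("weights")
    if weights is None:
        return Strategy(mode)
    if mode != "round-robin":
        raise ValueError("weights only apply to round-robin")  # assumed rule

    # One positive integer per member, primary first (bools are excluded).
    valid = (
        isinstance(weights, list)
        and len(weights) == member_count
        and all(isinstance(w, int) and not isinstance(w, bool) and w > 0 for w in weights)
    )
    if not valid:
        raise ValueError("weights must be one positive integer per member")
    return Strategy(mode, tuple(weights))
```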

Tests

  • tests/test_multi_llm_provider.py — failover ordering, weighted round-robin distribution, error classification, all-fail re-raise, attribute passthrough, strict/soft verify.
  • tests/test_multi_llm_config.py — indexed-member parsing (contiguity, per-op prefixes, missing-key), strategy JSON validation, per-op inherit/override, plain-vs-wrapped build.
  • 34 new tests pass; ruff + ty + lint.sh clean; embed env-template sync test passes.

Configure extra LLMs by index (HINDSIGHT_API_LLM_<n>_*) alongside the
unindexed primary, then route across them with HINDSIGHT_API_LLM_STRATEGY
(JSON): {"mode":"failover"} or {"mode":"round-robin"} with optional
per-member "weights" for unbalanced rotation. Each operation can override
the global chain with a RETAIN_/REFLECT_/CONSOLIDATION_ prefix.

A general, provider-agnostic alternative to the LiteLLM Router and a more
extensible replacement for the single-secondary failover approach.

- config.py: LLMMemberConfig/LLMStrategyConfig dataclasses, indexed-member
  + strategy parsing, new HindsightConfig fields (credential, server-level).
- engine/multi_llm.py: MultiLLMProvider mirrors the LLMProvider surface so it
  drops into with_config()/ConfiguredLLMProvider and _provider_impl passthrough;
  smooth weighted round-robin; failover passes through OutputTooLongError and
  cancellation; strict-primary/soft-secondary verify_connection.
- memory_engine.py: _build_llm wraps each of the 4 LLM slots; no-config path
  returns the plain LLMProvider unchanged.

Batch retain runs on the primary member only (documented).
@nicoloboschi nicoloboschi merged commit f7c7a62 into main Jun 23, 2026
193 of 196 checks passed
@nicoloboschi nicoloboschi deleted the feat/multi-llm-strategies branch June 23, 2026 12:48