feat(llm): multi-LLM failover & round-robin via indexed config by nicoloboschi · Pull Request #2365 · vectorize-io/hindsight

nicoloboschi · 2026-06-23T08:53:58Z

Summary

Adds a general, extensible multi-LLM layer: configure N LLMs by index and pick a routing strategy. This is a provider-agnostic alternative to the LiteLLM Router and a more flexible replacement for a single hard-coded failover secondary (supersedes the direction of #2332).

The unindexed HINDSIGHT_API_LLM_* config stays the primary (member 1). Extra members use indexed HINDSIGHT_API_LLM_<n>_* variables with n starting at 1, so index 1 is member 2. HINDSIGHT_API_LLM_STRATEGY (JSON) selects how to route.

HINDSIGHT_API_LLM_PROVIDER=openai          # primary
HINDSIGHT_API_LLM_API_KEY=sk-...
HINDSIGHT_API_LLM_1_PROVIDER=groq          # member 2
HINDSIGHT_API_LLM_1_API_KEY=gsk-...
HINDSIGHT_API_LLM_2_PROVIDER=anthropic     # member 3
HINDSIGHT_API_LLM_2_API_KEY=sk-ant-...
HINDSIGHT_API_LLM_STRATEGY='{"mode":"failover"}'
# or load-spread:  {"mode":"round-robin"}
# or unbalanced:   {"mode":"round-robin","weights":[3,1,1]}
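
A minimal sketch of how an indexed layout like the one above could be collected into an ordered chain. It is not the PR's _parse_llm_members: the names MemberSketch and parse_members are hypothetical, and stopping at the first missing index is an assumption about how contiguity is handled.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MemberSketch:
    """Hypothetical stand-in for one chain member (not the PR's LLMMemberConfig)."""
    provider: str
    api_key: Optional[str]


def parse_members(env: Mapping[str, str], prefix: str = "HINDSIGHT_API_LLM") -> list:
    """Return the chain: unindexed primary first, then _1_, _2_, ... in order.

    Assumption: scanning stops at the first missing index, so a gap hides
    any higher-numbered members. The real parser may reject gaps instead.
    """
    members = [MemberSketch(env[f"{prefix}_PROVIDER"], env.get(f"{prefix}_API_KEY"))]
    index = 1
    while f"{prefix}_{index}_PROVIDER" in env:
        members.append(
            MemberSketch(
                env[f"{prefix}_{index}_PROVIDER"],
                env.get(f"{prefix}_{index}_API_KEY"),
            )
        )
        index += 1
    return members
```

Passing os.environ as env reads the live process environment; passing a plain dict keeps the function easy to test.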

Strategies

failover — try members in declared order; advance on a member's failure (after its own retries).
round-robin — rotate the starting member per request to spread load, then fall through the rest on failure. Optional positive-int weights (one per member, primary first) for an unbalanced rotation (smooth weighted RR).
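
The weighted rotation described above can be illustrated with the classic smooth weighted round-robin algorithm. This is a sketch, not the scheduler in engine/multi_llm.py: the class name SmoothWeightedRR and the helper attempt_order are hypothetical, and the wrap-around fall-through order is an assumption.

```python
import threading


class SmoothWeightedRR:
    """Smooth weighted round-robin over member indices (illustrative only)."""

    def __init__(self, weights):
        if not weights or any(type(w) is not int or w <= 0 for w in weights):
            raise ValueError("weights must be positive integers, one per member")
        self._weights = list(weights)
        self._current = [0] * len(weights)
        self._total = sum(weights)
        self._lock = threading.Lock()

    def next_start(self) -> int:
        """Return the index of the member that should start the next request."""
        with self._lock:
            # Every member gains its weight, the highest score wins,
            # and the winner pays back the total weight.
            for i, w in enumerate(self._weights):
                self._current[i] += w
            chosen = max(range(len(self._weights)), key=self._current.__getitem__)
            self._current[chosen] -= self._total
            return chosen


def attempt_order(start: int, member_count: int) -> list:
    """Assumed fall-through order: the start member, then the rest, wrapping around."""
    return [(start + k) % member_count for k in range(member_count)]
```

With weights [3, 1, 1], five consecutive calls to next_start() yield 0, 1, 0, 2, 0: the primary gets three of every five requests, but never three in a row.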

Per-operation chains — each operation can override the global chain with a RETAIN_/REFLECT_/CONSOLIDATION_ prefix (e.g. HINDSIGHT_API_RETAIN_LLM_1_PROVIDER, HINDSIGHT_API_RETAIN_LLM_STRATEGY). A per-op slot with no indexed members (or no strategy) inherits the global chain.
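
One way to read the inheritance rule above, as a sketch: the helper chain_prefix_for is hypothetical, and it assumes a per-op override takes effect only when the slot has both an indexed member and a strategy.

```python
from typing import Mapping, Optional

GLOBAL_PREFIX = "HINDSIGHT_API_LLM"


def chain_prefix_for(env: Mapping[str, str], operation: Optional[str]) -> str:
    """Return the env prefix whose chain applies to an operation.

    operation is one of "retain", "reflect", "consolidation", or None
    for the global slot.
    """
    if operation is None:
        return GLOBAL_PREFIX
    op_prefix = f"HINDSIGHT_API_{operation.upper()}_LLM"
    has_members = f"{op_prefix}_1_PROVIDER" in env
    has_strategy = f"{op_prefix}_STRATEGY" in env
    # Assumption: missing either piece means the slot inherits the global chain.
    return op_prefix if (has_members and has_strategy) else GLOBAL_PREFIX
```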

Design

MultiLLMProvider mirrors the LLMProvider public surface, so it drops into every existing call path — with_config() / ConfiguredLLMProvider and _provider_impl attribute passthrough are untouched.
Each member keeps its own retry budget; we only advance to the next member after a member exhausts retries and raises.
OutputTooLongError is propagated (a different provider won't fit it either); CancelledError/KeyboardInterrupt/SystemExit propagate unchanged.
verify_connection: strict on the primary, soft (warn-only) on the rest, so a down failover member doesn't block startup.
No-config path returns the plain LLMProvider unchanged (byte-identical hot path).
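
The advance-on-failure rules above might look roughly like the following. This is a sketch under stated assumptions, not the code in engine/multi_llm.py: OutputTooLongError here is a local stand-in class, should_failover and call_with_failover are hypothetical names, each member is modeled as an async callable that already ran its own retries, and re-raising the last error when every member fails is an assumption.

```python
import asyncio


class OutputTooLongError(Exception):
    """Local stand-in for the project's error of the same name."""


def should_failover(exc: Exception) -> bool:
    """Decide whether the next member is worth trying after this error."""
    # An over-long output would not fit another provider either.
    return not isinstance(exc, OutputTooLongError)


async def call_with_failover(members, *args, **kwargs):
    """Try each member in order and return the first successful result."""
    last_exc = None
    for member in members:
        try:
            return await member(*args, **kwargs)
        except Exception as exc:
            # CancelledError, KeyboardInterrupt and SystemExit derive from
            # BaseException, so they bypass this handler and propagate unchanged.
            if not should_failover(exc):
                raise
            last_exc = exc
    if last_exc is None:
        raise ValueError("no members configured")
    raise last_exc
```

Catching Exception rather than BaseException keeps cancellation and interpreter shutdown out of the failover path without listing them explicitly.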

Files

config.py — LLMMemberConfig/LLMStrategyConfig dataclasses, _parse_llm_members/_parse_llm_strategy, 8 new HindsightConfig fields (credential, server-level static), from_env() wiring.
engine/multi_llm.py (new) — MultiLLMProvider, _should_failover, smooth weighted RR scheduler.
engine/memory_engine.py — _build_llm resolves + wraps each of the 4 LLM slots.
config_resolver.py — restore typed member objects after the asdict round-trip.
Docs (configuration.md), .env.example (+ embed sync), tests.
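
The config_resolver.py item exists because dataclasses.asdict recursively turns nested dataclasses into plain dicts, so typed members have to be rebuilt afterwards. A small illustration with hypothetical classes (Member, Config, and restore are not the PR's names):

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Member:
    provider: str


@dataclass
class Config:
    members: list = field(default_factory=list)


def restore(raw: dict) -> Config:
    """Rebuild typed Member objects from the dicts that asdict() produced."""
    members = [
        m if isinstance(m, Member) else Member(**m)
        for m in raw.get("members", [])
    ]
    return Config(members=members)
```

After raw = asdict(cfg), each entry of raw["members"] is a dict rather than a Member, and restore(raw) compares equal to the original cfg.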

Notes / non-goals (v1)

Batch retain runs on the primary member only; failover/round-robin apply to the interactive call/call_with_tools paths.
Server-level static config (not per-bank configurable), consistent with other LLM credential fields. The strategy JSON shape is forward-extensible (e.g. future max_attempts).
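
A sketch of validating the strategy JSON while leaving room for future keys. The function parse_strategy is hypothetical and is not the PR's _parse_llm_strategy. Ignoring unknown keys and rejecting weights outside round-robin mode are assumptions made for illustration.

```python
import json

ALLOWED_MODES = {"failover", "round-robin"}


def parse_strategy(raw: str, member_count: int) -> dict:
    """Validate a strategy string such as '{"mode":"round-robin","weights":[3,1,1]}'."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("strategy must be a JSON object")
    mode = data.get("mode")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    weights = data.get("weights")
    if weights is not None:
        if mode != "round-robin":
            raise ValueError("weights only apply to round-robin")
        if not isinstance(weights, list) or len(weights) != member_count:
            raise ValueError("weights need one entry per member, primary first")
        if any(type(w) is not int or w <= 0 for w in weights):
            raise ValueError("weights must be positive integers")
    # Unknown keys are ignored here so that later additions do not break parsing.
    return {"mode": mode, "weights": weights}
```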

Tests

tests/test_multi_llm_provider.py — failover ordering, weighted round-robin distribution, error classification, all-fail re-raise, attribute passthrough, strict/soft verify.
tests/test_multi_llm_config.py — indexed-member parsing (contiguity, per-op prefixes, missing-key), strategy JSON validation, per-op inherit/override, plain-vs-wrapped build.
34 new tests pass; ruff + ty + lint.sh clean; embed env-template sync test passes.

Configure extra LLMs by index (HINDSIGHT_API_LLM_<n>_*) alongside the unindexed primary, then route across them with HINDSIGHT_API_LLM_STRATEGY (JSON): {"mode":"failover"} or {"mode":"round-robin"} with optional per-member "weights" for unbalanced rotation. Each operation can override the global chain with a RETAIN_/REFLECT_/CONSOLIDATION_ prefix. A general, provider-agnostic alternative to the LiteLLM Router and a more extensible replacement for the single-secondary failover approach.

- config.py: LLMMemberConfig/LLMStrategyConfig dataclasses, indexed-member + strategy parsing, new HindsightConfig fields (credential, server-level).
- engine/multi_llm.py: MultiLLMProvider mirrors the LLMProvider surface so it drops into with_config()/ConfiguredLLMProvider and _provider_impl passthrough; smooth weighted round-robin; failover passes through OutputTooLongError and cancellation; strict-primary/soft-secondary verify_connection.
- memory_engine.py: _build_llm wraps each of the 4 LLM slots; no-config path returns the plain LLMProvider unchanged. Batch retain runs on the primary member only (documented).

nicoloboschi mentioned this pull request Jun 23, 2026

Add a failover LLM in case the primary one becomes unavailable #2332

Closed

6 tasks

docs: regenerate hindsight-docs skill reference for multi-LLM config

0dbca53

nicoloboschi merged commit f7c7a62 into main Jun 23, 2026
193 of 196 checks passed

nicoloboschi deleted the feat/multi-llm-strategies branch June 23, 2026 12:48

nicoloboschi merged 2 commits into main from feat/multi-llm-strategies
