Error handling agent down #211

Closed
chaitu237 wants to merge 6 commits into MSDLLCpapers:main from chaitu237:Error-handling-agent-down

@chaitu237

Description

The AO returned a generic 500 Internal Server Error whenever a downstream agent was unavailable, timed out, or hit an authentication error, so callers could not tell what had actually failed. This PR updates the AO and the sk-agents framework to catch these failures and return descriptive error responses with the appropriate HTTP status codes, both to the caller and in the logs.

Changes

  1. Added 4 custom exception classes (AgentConnectionError, AgentTimeoutError, AgentResponseError, AgentInvalidResponseError) to classify downstream failures
  2. Wrapped all agent invocation methods (invoke_api, invoke_sse, invoke_stream) and the recipient chooser with specific error handling for connection failures, timeouts, bad HTTP responses, and invalid JSON
  3. REST, SSE, and WebSocket routes now map each failure type to the correct HTTP status code (502 agent down, 504 timeout, 401 auth failure, 429 rate limited) with a descriptive message including the agent name
  4. Added logger.error() calls at each failure point so errors are visible in container logs
  5. Added AgentUnavailableException, LLMAuthenticationException, LLMServiceException to sk-agents and wired them up through routes, chat agents, sequential agents, and the handler
  6. Fixed ao.sh Windows line endings (CRLF → LF) that were breaking container startup on Linux
  7. Fixed agent model names in config files to use gpt-4o instead of dated variants not supported by DefaultChatCompletionFactory
  8. Added 32 unit tests covering all new error scenarios
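
For reviewers, changes 1 to 4 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the four exception names and the 502/504/401/429 codes come from the list above, while the `AgentError` base class, the `call_agent` and `to_http_error` helpers, the `send` callable, the logger name, and the choice of 502 for invalid JSON and other bad upstream responses are all assumptions made for the example.

```python
import json
import logging
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("ao.agents")  # hypothetical logger name


class AgentError(Exception):
    """Hypothetical shared base class; the PR does not say whether one exists."""

    def __init__(self, agent_name: str, detail: str = "",
                 status_code: Optional[int] = None) -> None:
        super().__init__(f"{agent_name}: {detail}")
        self.agent_name = agent_name
        self.detail = detail
        self.status_code = status_code  # upstream HTTP status, when known


class AgentConnectionError(AgentError):
    """The downstream agent could not be reached."""


class AgentTimeoutError(AgentError):
    """The downstream agent did not answer in time."""


class AgentResponseError(AgentError):
    """The downstream agent answered with an error HTTP status."""


class AgentInvalidResponseError(AgentError):
    """The downstream agent answered with a body that is not valid JSON."""


def call_agent(agent_name: str, send: Callable[[], str]) -> Any:
    """Run `send` (hypothetical: returns the raw response body) and classify failures."""
    try:
        return json.loads(send())
    except TimeoutError as exc:
        logger.error("Agent %s timed out: %s", agent_name, exc)
        raise AgentTimeoutError(agent_name, "request timed out") from exc
    except ConnectionError as exc:
        logger.error("Agent %s is unreachable: %s", agent_name, exc)
        raise AgentConnectionError(agent_name, "connection failed") from exc
    except json.JSONDecodeError as exc:
        logger.error("Agent %s returned invalid JSON: %s", agent_name, exc)
        raise AgentInvalidResponseError(agent_name, "invalid JSON body") from exc


def to_http_error(exc: AgentError) -> Tuple[int, str]:
    """Map a classified failure to (HTTP status, message naming the agent)."""
    name = exc.agent_name
    if isinstance(exc, AgentTimeoutError):
        return 504, f"Agent '{name}' timed out"
    if isinstance(exc, AgentResponseError) and exc.status_code == 401:
        return 401, f"Agent '{name}' rejected the request: authentication failed"
    if isinstance(exc, AgentResponseError) and exc.status_code == 429:
        return 429, f"Agent '{name}' is rate limited"
    # Assumption: every other classified failure is reported as 502 Bad Gateway.
    return 502, f"Agent '{name}' is unavailable or returned a bad response"
```

A route handler would then catch `AgentError` around the invocation and return `to_http_error(exc)` instead of letting the exception surface as a generic 500.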

Type of Change

  • [ ] Bugfix
  • [ ] New feature
  • [x] Refactor
  • [ ] Documentation
  • [ ] Other (please specify): ____________

Screenshots (if applicable)

[Images: Screenshot 2026-03-11 171729, Screenshot 2026-03-11 171656]

@michaelTurnbach
Collaborator

please make sure all lint checks pass

@chaitu237 chaitu237 closed this by deleting the head repository Apr 18, 2026

2 participants