
WF-IMPL-075: Outbound RPC error taxonomy + ActivityResultEnvelope mapping (#486) #498

Merged
toddysm merged 2 commits into main from wf-impl-075-outbound-rpc-error-taxonomy on Jun 1, 2026

toddysm (Owner) commented on Jun 1, 2026

WF-IMPL-075 — Outbound RPC error taxonomy + ActivityResultEnvelope mapping

Closes #486

Locks the structured outbound-RPC error taxonomy that the upcoming production DaprActivityRuntimeClient + DaprConnectorClient adapters (WF-IMPL-076..079) will raise from every outbound HTTP RPC, plus the deterministic mapping into the existing ActivityResultEnvelope shape so the retry decision driver (WF-IMPL-053) sees the same envelope regardless of which transport-layer failure mode produced it.

What landed

New module custos_workflow/clients/_errors.py:

  • OutboundRpcError(ValueError) base + 4 concrete subclasses:
    • OutboundRpcTransportError → kind workflow.client.transport
    • OutboundRpcStatusError → kind workflow.client.status (carries status_code + optional code)
    • OutboundRpcDecodeError → kind workflow.client.decode
    • OutboundRpcCancelledError → kind workflow.client.cancelled
  • LOCKED_OUTBOUND_RPC_KINDS frozenset + LOCKED_OUTBOUND_RPC_KIND_TO_STATUS MappingProxyType — closed taxonomy with a class-definition guard (__init_subclass__) that rejects any subclass declaring an unknown kind, so a fifth bucket cannot land silently.
  • map_to_activity_envelope(exc, *, attempt) -> ActivityResultEnvelope:
    • Transport → class_="retryable"
    • Status 408 / 429 / 5xx → class_="retryable"
    • Status other 4xx → class_="permanent"
    • Decode → class_="permanent"
    • Cancelled → class_="cancelled"
    • Surfaces status_code + optional echoed code under details; walks __cause__ chain up to MAX_CAUSE_DEPTH (=3) per ARM design Error Envelope spec.
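A minimal sketch of how the four buckets, the class-definition guard and the mapping could fit together. Only the public names listed above come from the PR; the envelope fields (`kind`, `message`, `details`), the `cause` detail key and its string format, the `attempt >= 1` check, and bucketing of non-4xx/5xx codes as permanent are assumptions for illustration, not the contents of `_errors.py`.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, ClassVar, Dict, List, Literal, Mapping, Optional

# Hypothetical reconstruction; only the public names come from the PR text.
LOCKED_OUTBOUND_RPC_KINDS = frozenset(
    {
        "workflow.client.transport",
        "workflow.client.status",
        "workflow.client.decode",
        "workflow.client.cancelled",
    }
)
MAX_CAUSE_DEPTH = 3
EnvelopeClass = Literal["success", "retryable", "permanent", "cancelled"]


class OutboundRpcError(ValueError):
    kind: ClassVar[Optional[str]] = None  # None = marker base with no bucket

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        declared = cls.__dict__.get("kind")  # only a kind declared on this class
        if declared is not None and declared not in LOCKED_OUTBOUND_RPC_KINDS:
            raise TypeError(f"{cls.__name__} declares unknown kind {declared!r}")


class OutboundRpcTransportError(OutboundRpcError):
    kind = "workflow.client.transport"


class OutboundRpcStatusError(OutboundRpcError):
    kind = "workflow.client.status"

    def __init__(self, message: str, *, status_code: int, code: Optional[str] = None) -> None:
        if not 100 <= status_code <= 599:
            raise ValueError(f"status_code {status_code} outside [100, 599]")
        super().__init__(message)
        self.status_code = status_code
        self.code = code  # optional code echoed by the remote service


class OutboundRpcDecodeError(OutboundRpcError):
    kind = "workflow.client.decode"


class OutboundRpcCancelledError(OutboundRpcError):
    kind = "workflow.client.cancelled"


@dataclass(frozen=True)
class ActivityResultEnvelope:  # hypothetical stand-in; the real fields may differ
    class_: EnvelopeClass
    attempt: int
    kind: str
    message: str
    details: Mapping[str, Any] = field(default_factory=dict)


def _classify(exc: OutboundRpcError) -> EnvelopeClass:
    if isinstance(exc, OutboundRpcStatusError):
        if exc.status_code in (408, 429) or exc.status_code >= 500:
            return "retryable"
        return "permanent"  # other 4xx (assumption: 1xx-3xx are bucketed here too)
    if isinstance(exc, OutboundRpcTransportError):
        return "retryable"
    if isinstance(exc, OutboundRpcDecodeError):
        return "permanent"
    if isinstance(exc, OutboundRpcCancelledError):
        return "cancelled"
    raise TypeError(f"{type(exc).__name__} carries no locked kind")


def map_to_activity_envelope(exc: OutboundRpcError, *, attempt: int) -> ActivityResultEnvelope:
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    class_ = _classify(exc)
    details: Dict[str, Any] = {}
    if isinstance(exc, OutboundRpcStatusError):
        details["status_code"] = exc.status_code
        if exc.code is not None:
            details["code"] = exc.code
    causes: List[str] = []
    cause = exc.__cause__
    while cause is not None and len(causes) < MAX_CAUSE_DEPTH:
        causes.append(f"{type(cause).__name__}: {cause}")
        cause = cause.__cause__
    if causes:  # no cause: the field is omitted entirely
        details["cause"] = tuple(causes)
    return ActivityResultEnvelope(
        class_=class_, attempt=attempt, kind=str(exc.kind),
        message=str(exc), details=MappingProxyType(details),
    )
```

Because the guard inspects `cls.__dict__` rather than the inherited attribute, a marker subclass that declares no kind passes, while a subclass declaring a fifth kind fails at class-definition time rather than at first use.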

clients/connector.py:

  • New ConnectorBindError(OutboundRpcError) marker base for bind-call context wrapping. Deliberately omitted from __all__: the adapter imports it via a fully-qualified path, so this module's public surface doesn't gain a new name. It also does not collide with the existing custos_workflow.steps.errors.ConnectorBindError, which belongs to the StepCoordinatorError hierarchy (different layer, different base).
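The marker pattern can be sketched with a cut-down copy of the base and its guard. `ConnectorBindStatusError` and its `step_id` attribute are hypothetical; they only illustrate the multiple-inheritance route described for WF-IMPL-078, where the bucket (and therefore the kind) comes from the concrete OutboundRpcError class and the marker contributes only bind-call context.

```python
from typing import Any, ClassVar, Optional

LOCKED_OUTBOUND_RPC_KINDS = frozenset(
    {"workflow.client.transport", "workflow.client.status",
     "workflow.client.decode", "workflow.client.cancelled"}
)


class OutboundRpcError(ValueError):
    kind: ClassVar[Optional[str]] = None

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        declared = cls.__dict__.get("kind")
        if declared is not None and declared not in LOCKED_OUTBOUND_RPC_KINDS:
            raise TypeError(f"{cls.__name__} declares unknown kind {declared!r}")


class OutboundRpcStatusError(OutboundRpcError):  # cut down: no status_code here
    kind = "workflow.client.status"


# Marker base: declares no kind of its own, so the guard lets it through.
class ConnectorBindError(OutboundRpcError):
    pass


# Hypothetical concrete class: marker first for context, bucket from the status error.
class ConnectorBindStatusError(ConnectorBindError, OutboundRpcStatusError):
    def __init__(self, message: str, *, step_id: str) -> None:
        super().__init__(message)
        self.step_id = step_id  # hypothetical bind-call context
```

The MRO resolves `kind` through `OutboundRpcStatusError`, so the combined class still lands in one of the four locked buckets, and a handler can catch either the marker or the bucket.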

Tests tests/clients/test_errors.py (37 tests, all passing):

  • Exhaustiveness guards: locked kinds frozenset matches status-map keys exactly; cardinality pinned at 4; namespace pinned to workflow.client.*; every concrete subclass surfaces a locked kind.
  • Subclass-definition guard: unknown-kind subclass rejected at class definition; marker subclass (no kind) allowed.
  • Constructor invariants for every subclass; status_code rejected outside [100, 599].
  • map_to_activity_envelope parametrised over HTTP 400 / 401 / 403 / 404 / 408 / 422 / 429 / 500 / 502 / 503 / 504 / 399 / 599 with expected bucket.
  • Cause-chain preservation: no-cause omits field; single cause preserved; depth-5 chain truncated at MAX_CAUSE_DEPTH.
  • Envelope immutability + attempt invariants verified.
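The status-code parametrisation can be sketched as a table-driven check against a standalone helper. `classify_status` is hypothetical and mirrors only the Status rows of the mapping listed earlier; the check uses plain asserts so it runs without pytest. HTTP 399 is left out because the PR text does not state which bucket it lands in.

```python
from typing import Dict, Literal

Bucket = Literal["retryable", "permanent"]


def classify_status(status_code: int) -> Bucket:
    """Hypothetical helper mirroring the Status rows of the mapping."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"status_code {status_code} outside [100, 599]")
    if status_code in (408, 429) or status_code >= 500:
        return "retryable"
    if 400 <= status_code <= 499:
        return "permanent"
    # The PR text does not say how 1xx-3xx codes are bucketed.
    raise NotImplementedError(f"no stated bucket for {status_code}")


# Expected buckets for the codes the PR lists, minus 399.
CASES: Dict[int, Bucket] = {
    400: "permanent", 401: "permanent", 403: "permanent", 404: "permanent",
    408: "retryable", 422: "permanent", 429: "retryable",
    500: "retryable", 502: "retryable", 503: "retryable", 504: "retryable",
    599: "retryable",
}

for code, expected in CASES.items():
    assert classify_status(code) == expected, (code, expected)
```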

Quality gates

  • ruff check . --fix + ruff format --check . — clean
  • mypy src tests (strict) — clean
  • pytest -q --hypothesis-seed=0:
    • 100% line coverage on clients/_errors.py
    • Full workflow-service suite: 1744 passed, 1 pre-existing flake (tests/test_observability.py::test_module_imports_under_noop_providers — also fails on main with the same subprocess ModuleNotFoundError: custos_workflow, unrelated to this PR).

Design alignment

  • ARM design § Error Envelope & Exit Codes: cause max depth 3, class ∈ {retryable, permanent, cancelled, success}.
  • Workflow Service design § Retry Policy — retry driver dispatches on envelope class_ only; this PR guarantees every outbound-RPC failure mode is rendered into that shape.
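Because the retry driver reads only `class_`, its dispatch can be sketched as a pure function of the envelope class and the attempt count. The `decide` function, the `max_attempts` budget and the returned decision strings are hypothetical; the real WF-IMPL-053 driver's policy inputs are not described in this PR.

```python
from typing import Literal

EnvelopeClass = Literal["success", "retryable", "permanent", "cancelled"]
Decision = Literal["complete", "retry", "fail", "cancel"]


def decide(class_: EnvelopeClass, *, attempt: int, max_attempts: int) -> Decision:
    """Hypothetical retry dispatch keyed only on the envelope class."""
    if class_ == "success":
        return "complete"
    if class_ == "cancelled":
        return "cancel"  # this sketch never retries a cancellation
    if class_ == "retryable" and attempt < max_attempts:
        return "retry"
    return "fail"  # permanent, or retryable with the budget exhausted
```

Nothing in `decide` inspects HTTP codes or exception types, which is the property this PR is meant to guarantee: every transport-layer detail has already been folded into `class_`.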

Follow-ups

  • WF-IMPL-076 will consume map_to_activity_envelope from the production DaprActivityRuntimeClient.schedule_activity adapter.
  • WF-IMPL-078 will consume it from the production DaprConnectorClient.bind_for_step adapter and may add concrete subclasses of ConnectorBindError (via multiple inheritance with the concrete OutboundRpcError buckets) to attach bind-call context without inventing a fifth bucket.
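An adapter body could translate raw failures into the taxonomy roughly as follows. This is a hedged sketch: `call_outbound` and the `Send` hook are hypothetical, the exception classes are cut-down stand-ins without the kind guard, and cancellation is omitted. The `raise ... from exc` form matters because it populates the `__cause__` chain that map_to_activity_envelope walks.

```python
import json
from typing import Any, Callable, Optional, Tuple


class OutboundRpcError(ValueError):
    """Cut-down stand-in for the base; the real one also carries a locked kind."""


class OutboundRpcTransportError(OutboundRpcError):
    pass


class OutboundRpcStatusError(OutboundRpcError):
    def __init__(self, message: str, *, status_code: int, code: Optional[str] = None) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.code = code


class OutboundRpcDecodeError(OutboundRpcError):
    pass


# Hypothetical transport hook: performs one HTTP call, returns (status, raw body).
Send = Callable[[], Tuple[int, bytes]]


def call_outbound(send: Send) -> Any:
    """Hypothetical adapter body translating raw failures into the taxonomy."""
    try:
        status, body = send()
    except OSError as exc:  # e.g. ConnectionRefusedError, TimeoutError, socket.gaierror
        raise OutboundRpcTransportError(f"transport failure: {exc}") from exc
    if status >= 400:
        raise OutboundRpcStatusError(f"HTTP {status}", status_code=status)
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise OutboundRpcDecodeError("response body is not valid JSON") from exc
```

A caller would catch `OutboundRpcError` around `call_outbound` and hand the exception, together with the current attempt number, to map_to_activity_envelope.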

Copilot AI review requested due to automatic review settings on June 1, 2026 04:32.
toddysm added the labels type:implementation (Implementation work item), phase:implementation (Implementation phase) and component:workflow-service (Workflow Service component) on Jun 1, 2026.

Copilot AI left a comment


Pull request overview

This PR introduces a closed outbound RPC error taxonomy for workflow-service client adapters and maps those failures into ActivityResultEnvelope so retry handling can consume a consistent error shape.

Changes:

  • Adds custos_workflow.clients._errors with four locked outbound RPC error classes, kind/status metadata, cause-chain rendering, and envelope mapping.
  • Adds a client-layer ConnectorBindError marker for future connector adapter context wrapping.
  • Adds exhaustive unit tests for taxonomy locking, constructor invariants, status classification, cause truncation, and envelope invariants.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Files and descriptions:

  • src/services/workflow-service/src/custos_workflow/clients/_errors.py: defines the outbound RPC error hierarchy and maps errors to ActivityResultEnvelope.
  • src/services/workflow-service/src/custos_workflow/clients/connector.py: adds the connector-bind client-layer marker error.
  • src/services/workflow-service/tests/clients/test_errors.py: covers the new taxonomy and envelope mapping behavior.


Comment thread on src/services/workflow-service/src/custos_workflow/clients/_errors.py (outdated)
The OutboundRpcError docstring cited steps.errors.ConnectorBindError as
an example of ValueError-catching adapter code that stays compatible.
That class actually inherits from StepCoordinatorError(RuntimeError), so
the example was misleading. Removed the example; the rationale (subclass
ValueError to play nicely with generic ValueError catchers) is unchanged.
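The corrected rationale is easy to check: a generic `except ValueError` handler catches the client-layer base but not the steps-layer error, whose base is RuntimeError. The classes below are minimal stand-ins that copy only the inheritance described in this thread; `StepsConnectorBindError` is renamed here purely to avoid clashing with the client-layer name, and `caught_by_value_error` is a hypothetical helper.

```python
class StepCoordinatorError(RuntimeError):
    """Stand-in for the steps-layer base named in the review thread."""


class StepsConnectorBindError(StepCoordinatorError):
    """Stand-in for custos_workflow.steps.errors.ConnectorBindError."""


class OutboundRpcError(ValueError):
    """Stand-in for the client-layer base."""


def caught_by_value_error(exc: Exception) -> bool:
    """Return True if a generic `except ValueError` handler would catch exc."""
    try:
        raise exc
    except ValueError:
        return True
    except Exception:
        return False
```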
toddysm merged commit cf3d113 into main on Jun 1, 2026
23 checks passed
toddysm deleted the wf-impl-075-outbound-rpc-error-taxonomy branch on June 1, 2026 04:50