WF-IMPL-075: Outbound RPC error taxonomy + ActivityResultEnvelope mapping (#486)#498
Merged
Merged
Conversation
…elope mapping Locks the structured outbound-RPC error taxonomy that the upcoming production DaprActivityRuntimeClient + DaprConnectorClient adapters (WF-IMPL-076..079) will raise from every outbound HTTP RPC, plus the deterministic mapping into the existing ActivityResultEnvelope shape so the retry decision driver (WF-IMPL-053) sees the same envelope regardless of which transport-layer failure mode produced it. New module custos_workflow/clients/_errors.py: * OutboundRpcError(ValueError) base + 4 concrete subclasses: - OutboundRpcTransportError workflow.client.transport - OutboundRpcStatusError workflow.client.status (status_code+code) - OutboundRpcDecodeError workflow.client.decode - OutboundRpcCancelledError workflow.client.cancelled * LOCKED_OUTBOUND_RPC_KINDS frozenset + LOCKED_OUTBOUND_RPC_KIND_TO_STATUS MappingProxyType — closed taxonomy with class-definition guard that rejects any subclass declaring an unknown kind. * map_to_activity_envelope(exc, *, attempt) -> ActivityResultEnvelope: - Transport -> retryable - Status 408/429/5xx -> retryable - Status other 4xx -> permanent - Decode -> permanent - Cancelled -> cancelled Preserves status_code + echoed code under details, walks __cause__ chain up to MAX_CAUSE_DEPTH (=3) per ARM design Error Envelope spec. clients/connector.py: * New ConnectorBindError(OutboundRpcError) marker base for bind-call context wrapping. Deliberately omitted from __all__ — the adapter imports it via fully-qualified path so this module's public surface doesn't gain a new name and it cannot collide with the existing StepCoordinatorError surface at custos_workflow.steps.errors. tests/clients/test_errors.py (37 tests): * Exhaustiveness guards: locked kinds frozenset matches status-map keys exactly; cardinality pinned at 4; namespace pinned to workflow.client.*; every concrete subclass surfaces a locked kind. * Subclass-definition guard: unknown-kind subclass rejected; marker subclass (no kind) allowed. * Constructor invariants for every subclass; status_code rejected outside [100, 599]. * map_to_activity_envelope parametrised over HTTP 400/401/403/404/408/ 422/429/500/502/503/504/399/599 with expected bucket. * Cause-chain preservation: no-cause omits field; single cause preserved; depth-5 chain truncated at MAX_CAUSE_DEPTH. * Envelope immutability + attempt invariants verified. Quality: ruff + mypy --strict clean; 100% line coverage on _errors.py; full workflow-service suite 1744 passed + 1 pre-existing flake (tests/test_observability.py::test_module_imports_under_noop_providers, also fails on main with the same subprocess ModuleNotFoundError — unrelated to this PR). Closes #486
Contributor
There was a problem hiding this comment.
Pull request overview
This PR introduces a closed outbound RPC error taxonomy for workflow-service client adapters and maps those failures into ActivityResultEnvelope so retry handling can consume a consistent error shape.
Changes:
- Adds
custos_workflow.clients._errorswith four locked outbound RPC error classes, kind/status metadata, cause-chain rendering, and envelope mapping. - Adds a client-layer
ConnectorBindErrormarker for future connector adapter context wrapping. - Adds exhaustive unit tests for taxonomy locking, constructor invariants, status classification, cause truncation, and envelope invariants.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/services/workflow-service/src/custos_workflow/clients/_errors.py |
Defines the outbound RPC error hierarchy and maps errors to ActivityResultEnvelope. |
src/services/workflow-service/src/custos_workflow/clients/connector.py |
Adds the connector-bind client-layer marker error. |
src/services/workflow-service/tests/clients/test_errors.py |
Covers the new taxonomy and envelope mapping behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The OutboundRpcError docstring cited steps.errors.ConnectorBindError as an example of ValueError-catching adapter code that stays compatible. That class actually inherits from StepCoordinatorError(RuntimeError), so the example was misleading. Removed the example; the rationale (subclass ValueError to play nicely with generic ValueError catchers) is unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
WF-IMPL-075 — Outbound RPC error taxonomy +
ActivityResultEnvelopemappingCloses #486
Locks the structured outbound-RPC error taxonomy that the upcoming production
DaprActivityRuntimeClient+DaprConnectorClientadapters (WF-IMPL-076..079) will raise from every outbound HTTP RPC, plus the deterministic mapping into the existingActivityResultEnvelopeshape so the retry decision driver (WF-IMPL-053) sees the same envelope regardless of which transport-layer failure mode produced it.What landed
New module
custos_workflow/clients/_errors.py:OutboundRpcError(ValueError)base + 4 concrete subclasses:OutboundRpcTransportError→ kindworkflow.client.transportOutboundRpcStatusError→ kindworkflow.client.status(carriesstatus_code+ optionalcode)OutboundRpcDecodeError→ kindworkflow.client.decodeOutboundRpcCancelledError→ kindworkflow.client.cancelledLOCKED_OUTBOUND_RPC_KINDSfrozenset +LOCKED_OUTBOUND_RPC_KIND_TO_STATUSMappingProxyType— closed taxonomy with a class-definition guard (__init_subclass__) that rejects any subclass declaring an unknown kind, so a fifth bucket cannot land silently.map_to_activity_envelope(exc, *, attempt) -> ActivityResultEnvelope:class_="retryable"class_="retryable"class_="permanent"class_="permanent"class_="cancelled"status_code+ optional echoedcodeunderdetails; walks__cause__chain up toMAX_CAUSE_DEPTH(=3) per ARM design Error Envelope spec.clients/connector.py:ConnectorBindError(OutboundRpcError)marker base for bind-call context wrapping. Deliberately omitted from__all__— the adapter imports it via a fully-qualified path so this module's public surface doesn't gain a new name, and it cannot collide with the existingStepCoordinatorErrorsurface atcustos_workflow.steps.errors.ConnectorBindError(different layer, different base).Tests
tests/clients/test_errors.py(37 tests, all passing):workflow.client.*; every concrete subclass surfaces a locked kind.status_coderejected outside[100, 599].map_to_activity_envelopeparametrised over HTTP 400 / 401 / 403 / 404 / 408 / 422 / 429 / 500 / 502 / 503 / 504 / 399 / 599 with expected bucket.MAX_CAUSE_DEPTH.Quality gates
ruff check . --fix+ruff format --check .— cleanmypy src tests(strict) — cleanpytest -q --hypothesis-seed=0:clients/_errors.pytests/test_observability.py::test_module_imports_under_noop_providers— also fails onmainwith the same subprocessModuleNotFoundError: custos_workflow, unrelated to this PR).Design alignment
causemax depth 3,class∈ {retryable, permanent, cancelled, success}.class_only; this PR guarantees every outbound-RPC failure mode is rendered into that shape.Follow-ups
map_to_activity_envelopefrom the productionDaprActivityRuntimeClient.schedule_activityadapter.DaprConnectorClient.bind_for_stepadapter and may add concrete subclasses ofConnectorBindError(via multiple inheritance with the concreteOutboundRpcErrorbuckets) to attach bind-call context without inventing a fifth bucket.