More cleanup related to error logging #9942

Open
sruggier wants to merge 7 commits into oxidecomputer:main from sruggier:pr/error-logging-fixes

@sruggier
Contributor

This is a bit of a grab bag of error-logging fixes, carefully ordered (as described in #9804) to err on the side of logging source error messages multiple times rather than not at all.

This is related to #9804, but not yet close to fixing every inconsistency in how errors are logged.

This change updates all handlers of omicron_sled_agent::Error to use the
InlineErrorChain adapter, then removes all duplicate source error
messages from its Display implementation.
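A minimal sketch of why both halves of this change go together, using only the standard library. All type names and messages here are hypothetical, and `format_chain` is a stand-in for a chain-walking adapter such as InlineErrorChain; the `": "` separator is an assumption, not a statement about that adapter's exact output.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level failure standing in for a real source error.
#[derive(Debug)]
struct DiskError;

impl fmt::Display for DiskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "disk not found")
    }
}

impl Error for DiskError {}

/// "Before" shape: Display embeds the cause's message AND source() returns it.
#[derive(Debug)]
struct DuplicatingError {
    cause: DiskError,
}

impl fmt::Display for DuplicatingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to start zone: {}", self.cause)
    }
}

impl Error for DuplicatingError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// "After" shape: Display describes only this level; the cause is reachable
/// solely through source().
#[derive(Debug)]
struct ChainedError {
    cause: DiskError,
}

impl fmt::Display for ChainedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to start zone")
    }
}

impl Error for ChainedError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Stand-in for a chain-formatting adapter: prints the error, then every
/// source below it, joined with ": ".
fn format_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut next = err.source();
    while let Some(cause) = next {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        next = cause.source();
    }
    out
}
```

Once a handler walks the chain, the "before" shape repeats the cause's message, while the "after" shape prints each level once. Updating handlers first and Display second means the intermediate state duplicates messages rather than dropping them.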

There are some variants that include the source chain in Display output,
but leave the source unset. That arrangement doesn't result in any
duplicate log output, so it's more graceful to defer updating those
variants into the commit that performs the same transition on the
corresponding source error type.

It looks like all code paths that handle ConfigError are already
attaching it to error types that will log the full chain of sources, so
there's nothing else to do for this type.

Previously, only the next level down would be logged, in case of
NodeRequestError::Fsm and NodeRequestError::Recv. This commit links
bootstore::NodeRequestError as a source, instead of formatting it as a
string, which drops the rest of the chain.

This commit updates handle_start_sled_agent_request to log the full
error chain in case a SledAgentServerStartError is returned. Previously,
one level of cause would be logged for the FailedLearnerInit variant,
and CommitToLedger errors were logged without any information about the
source.

The main consumer of this error type seems to be
omicron_sled_agent::sled_agent::Error, so this commit has the effect of
fixing duplication in error logging that was introduced by a previous
commit within the same PR.

This change modifies all of the code paths that handle
omicron_sled_agent::instance::Error so they use the InlineErrorChain
adapter before logging error messages, and then updates the enum
definition to avoid duplicate error emission from its `Display`
implementation.

Similar to previous commits, this updates all handlers of
omicron_sled_agent::services::Error such that they recurse into sources
instead of relying on the duplication of source error messages in its
`Display` implementation. It then also removes all duplicated error
strings from its `Display` implementation, defining sources instead,
where applicable.
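To illustrate what "defining sources instead" buys, here is a small standard-library sketch contrasting an error that stores its cause as a formatted string with one that links it as a source. The types, messages, and the `format_chain` helper are all hypothetical and are not taken from the omicron code; the sketch assumes a cause whose Display does not repeat its own source.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical innermost error.
#[derive(Debug)]
struct ChannelClosed;

impl fmt::Display for ChannelClosed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "channel closed")
    }
}

impl Error for ChannelClosed {}

/// Hypothetical mid-level error that links its cause as a source.
#[derive(Debug)]
struct RequestError {
    cause: ChannelClosed,
}

impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request failed")
    }
}

impl Error for RequestError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Stringified shape: keeps only the cause's Display output, so anything
/// below the cause is lost.
#[derive(Debug)]
struct StringifiedError {
    detail: String,
}

impl From<RequestError> for StringifiedError {
    fn from(err: RequestError) -> Self {
        StringifiedError { detail: err.to_string() }
    }
}

impl fmt::Display for StringifiedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "service error: {}", self.detail)
    }
}

impl Error for StringifiedError {}

/// Linked shape: keeps the cause itself, so the whole chain stays reachable.
#[derive(Debug)]
struct LinkedError {
    cause: RequestError,
}

impl fmt::Display for LinkedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "service error")
    }
}

impl Error for LinkedError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Hypothetical chain formatter: the error plus every source, joined by ": ".
fn format_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut next = err.source();
    while let Some(cause) = next {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        next = cause.source();
    }
    out
}
```

In this sketch the stringified error stops at "request failed", while the linked one also surfaces "channel closed" when a handler recurses into sources.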
@sruggier force-pushed the pr/error-logging-fixes branch from 95e1b2a to c0bb317 on February 28, 2026 00:17