
Improve Orion Notification Error Logs (Missing Root Cause) #4735

@tzzed


Is your feature request related to a problem / use case? Please describe.
When Orion sends a notification and the receiver returns an HTTP 500, the log only shows:

msg=Notification (subId: 6930585843c986004a0bfc00) response NOT OK, http code: 500
The problem is that the root cause of the error is not logged.
This makes troubleshooting difficult — we don’t know whether the issue comes from the payload, headers, the remote endpoint, a timeout, or something else.

Describe the solution you'd like
When receiving a non-OK HTTP response (4xx or 5xx), Orion should log additional details such as:

  • the error message returned by the endpoint,

  • the response body (if available), and

  • optionally, the response time (to help spot slow receivers that are close to timing out).

Example of desired log output:

msg=Notification (subId: 6930585843c986004a0bfc00) NOT OK,
http code: 500,
error: "Internal error: invalid payload format",
responseBody: "{...}"

Describe alternatives you've considered
Increasing log verbosity on the receiver side
This helps identify the error but does not address the lack of visibility in Orion itself. It also requires access to the remote service, which is not always possible.

Using a proxy or middleware to capture the full HTTP exchange
While this can expose the error details, it adds complexity to the architecture and is not practical for production environments.

Manually reproducing the failing request using tools like cURL or Postman
This can give insights, but it is time-consuming and does not help during real-time operations or when multiple notifications fail.

Enabling debug logs globally in Orion
This generates too much noise and still does not guarantee that the HTTP response body or error message will be included.

Describe why you need this feature
When a notification fails, Orion currently logs only the HTTP status code. Without the underlying error message or response body, it becomes extremely difficult to understand why the receiver rejected the notification.
This lack of visibility slows down debugging, increases operational overhead, and requires teams to inspect multiple systems just to identify the root cause.
Having the actual error returned by the receiver directly in Orion’s logs would:
  • significantly reduce troubleshooting time,
  • help detect malformed payloads or configuration issues faster,
  • improve the reliability and monitoring of the notification pipeline, and
  • provide clearer insights during incidents and production debugging.
In short, this feature would make Orion much easier to operate in real-world IoT deployments, where notifications are frequent and critical.

Additional information
Below is an example of the current Orion log output, which does not include any details about the underlying error:

msg=Notification (subId: 6930585843c986004a0bfc00) response NOT OK, http code: 500

In production environments, this message alone is not sufficient to diagnose issues, especially when the receiving service returns meaningful error details (e.g., invalid payload, missing headers, authentication errors).

A more informative log entry — including the response body and error message — would greatly improve observability. For instance:

msg=Notification (subId: 6930585843c986004a0bfc00) NOT OK,
http code: 500,
error: "Internal error: invalid JSON payload",
responseBody: "{ \"error\": \"Schema mismatch\" }"

This enhancement aligns with best practices in distributed systems and increases operational transparency, especially in IoT setups where large volumes of notifications must be monitored.

Do you have the intention to implement the solution?

  • Yes, I have the knowledge to implement this new feature.
  • Yes, but I will need help.
  • No, I do not have the skills.
