feat(monitoring): add dashboard templates for service metrics #440

Open: ladinoraa wants to merge 1 commit into PrincessnJoy:main from ladinoraa:feat/monitoring-dashboards-323
@ladinoraa

Summary

Closes #323

Adds a monitoring/ directory with a ready-to-use Grafana dashboard and Prometheus alerting rules, so operators get visibility into deployed service health without building dashboards from scratch.

Changes

monitoring/grafana-dashboard.json

Grafana dashboard (schema v38) with 7 panels:

  • RPC Latency — time series of p50/p95/p99 latency
  • Event Polling Rate — events polled and processed per minute
  • Error Rate — RPC and poll error rates as percentages
  • Active Proposals — stat panel
  • Total Votes Cast — stat panel
  • RPC Uptime — 24h uptime gauge
  • Event Polling Lag — seconds behind chain tip gauge

Includes a network template variable (testnet/mainnet) to filter all panels.
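For operators who prefer provisioning over UI import, a dashboard provider file along these lines should let Grafana load the dashboard JSON from disk. This is a sketch under stated assumptions, not a file shipped in this PR: the provider name, folder, and both paths are hypothetical and depend on the deployment.

```yaml
# Hypothetical location: /etc/grafana/provisioning/dashboards/service-metrics.yml
apiVersion: 1
providers:
  - name: service-metrics          # assumed provider name
    folder: Monitoring             # assumed Grafana folder for the dashboard
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30      # how often Grafana rescans the directory
    allowUiUpdates: true
    options:
      # Assumed directory holding a copy of monitoring/grafana-dashboard.json
      path: /var/lib/grafana/dashboards
```

With a file provider, Grafana picks up every dashboard JSON in `path`, so the directory can hold other dashboards alongside this one.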

monitoring/alerts.yml

Prometheus alerting rules:

  • HighRpcLatency — p95 > 2 s for 5 min (warning)
  • HighRpcErrorRate — error rate > 5% for 5 min (critical)
  • EventPollingLag — lag > 300 s for 2 min (warning)
  • RpcTargetDown — target unreachable for 1 min (critical)
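The four rules above could be expressed roughly as follows. This is a minimal sketch, not the contents of monitoring/alerts.yml: the group name, the metric names (`rpc_request_duration_seconds_bucket`, `rpc_requests_total`, `rpc_errors_total`, `event_polling_lag_seconds`), and the `job="rpc"` label are assumptions for illustration. Only the thresholds, durations, and severities come from the list above.

```yaml
# Sketch only: group name, metric names, and the job label are hypothetical.
groups:
  - name: service-health
    rules:
      - alert: HighRpcLatency
        # p95 over a 5-minute window, assuming a Prometheus histogram in seconds
        expr: histogram_quantile(0.95, sum by (le) (rate(rpc_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
      - alert: HighRpcErrorRate
        # Ratio of failed to total requests; 0.05 corresponds to 5%
        expr: sum(rate(rpc_errors_total[5m])) / sum(rate(rpc_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
      - alert: EventPollingLag
        # Assumes a gauge reporting seconds behind chain tip
        expr: event_polling_lag_seconds > 300
        for: 2m
        labels:
          severity: warning
      - alert: RpcTargetDown
        # `up` is set by Prometheus itself for every scrape target
        expr: up{job="rpc"} == 0
        for: 1m
        labels:
          severity: critical
```

The `for` clause keeps an alert pending until the condition has held continuously for that duration, which is what the "for 5 min" and similar wording in the list corresponds to.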

monitoring/README.md

  • Full metrics reference table (name, type, labels, description)
  • Import instructions for Grafana UI and provisioning
  • Prometheus rules integration (standalone and Prometheus Operator)
  • Alerting threshold summary table
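As a rough illustration of the two integration paths the README covers, the rules file can be wired in either way shown below. Both documents are sketches with assumed values: the rules path, the resource name, the namespace, and the `release` label are hypothetical and must be adapted to the target cluster. The two YAML documents are separate files, shown together only for comparison.

```yaml
# File 1 (standalone Prometheus): fragment of prometheus.yml
rule_files:
  - /etc/prometheus/rules/alerts.yml   # assumed location of monitoring/alerts.yml
---
# File 2 (Prometheus Operator): PrometheusRule wrapping the same rule groups
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: service-metrics-alerts         # hypothetical resource name
  namespace: monitoring                # hypothetical namespace
  labels:
    release: prometheus                # assumed; must match the Prometheus ruleSelector
spec:
  groups: []                           # replace with the groups list from monitoring/alerts.yml
```

For the standalone path, running `promtool check rules` against the rules file before reloading Prometheus catches syntax errors early.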

…ssnJoy#323)

- Add monitoring/grafana-dashboard.json: Grafana dashboard with panels for
  RPC latency (p50/p95/p99), event polling rate, error rate, active proposals,
  votes cast, RPC uptime, and event polling lag
- Add monitoring/alerts.yml: Prometheus alerting rules for high latency,
  high error rate, event polling lag, and RPC target down
- Add monitoring/README.md: metrics reference table, instructions for
  connecting dashboards via UI import and provisioning, alert thresholds

Closes PrincessnJoy#323

drips-wave Bot commented Jun 26, 2026

@ladinoraa Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Development

Successfully merging this pull request may close these issues.

Add monitoring dashboard templates for service metrics