-
Notifications
You must be signed in to change notification settings - Fork 8.9k
feature: add metrics support for NamingServer #7882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.x
Are you sure you want to change the base?
feature: add metrics support for NamingServer #7882
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 2.x #7882 +/- ##
============================================
+ Coverage 71.58% 71.62% +0.03%
Complexity 883 883
============================================
Files 1294 1294
Lines 49554 49554
Branches 5884 5884
============================================
+ Hits 35475 35493 +18
+ Misses 11155 11142 -13
+ Partials 2924 2919 -5 🚀 New features to boost your workflow:
|
funky-eyes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I suggest adding monitoring metrics for the response time (avg, p50, p99, p999) and QPS of each interface. You can implement this in the filter.
...ngserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerMetricsManager.java
Show resolved
Hide resolved
namingserver/src/main/java/org/apache/seata/namingserver/manager/NamingManager.java
Show resolved
Hide resolved
…rver' into add-metrics-support-for-NamingServer
…pNamingMetricsManager and PrometheusNamingMetricsManager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we're missing some common tags, such as the host of the current Naming Server. For example, this would make it easier to query metrics from specific Naming Server instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive Prometheus metrics support to the NamingServer module. The implementation enables monitoring of critical server operations through Micrometer and Spring Boot Actuator.
Key Changes:
- Introduces three core metrics: cluster node count (Gauge), watcher count (Gauge), and cluster change push notifications (Counter)
- Adds configurable metrics support with a no-op implementation for when metrics are disabled
- Integrates metrics collection into existing server operations for real-time monitoring
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
namingserver/pom.xml |
Adds Spring Boot Actuator and Micrometer Prometheus registry dependencies |
namingserver/src/main/resources/application.yml |
Configures metrics endpoint exposure and percentile distribution settings |
namingserver/src/test/resources/application.yml |
Configures test environment metrics settings |
namingserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerMetricsManager.java |
Defines the metrics manager interface with metric names and tag constants |
namingserver/src/main/java/org/apache/seata/namingserver/metrics/PrometheusNamingMetricsManager.java |
Implements Prometheus metrics collection using MultiGauge and Counter |
namingserver/src/main/java/org/apache/seata/namingserver/metrics/NoOpNamingMetricsManager.java |
Provides no-op implementation when metrics are disabled |
namingserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerTagsContributor.java |
Adds custom tags to HTTP request metrics for business dimensions |
namingserver/src/main/java/org/apache/seata/namingserver/manager/NamingManager.java |
Integrates cluster node count metrics refresh on instance registration/unregistration |
namingserver/src/main/java/org/apache/seata/namingserver/manager/ClusterWatcherManager.java |
Integrates watcher count metrics and cluster change push counter |
namingserver/src/test/java/org/apache/seata/namingserver/NamingServerMetricsManagerTest.java |
Adds comprehensive unit tests for metrics manager functionality |
namingserver/src/test/java/org/apache/seata/namingserver/ClusterWatcherManagerTest.java |
Updates tests to inject NoOpNamingMetricsManager |
changes/en-us/2.x.md |
Documents the feature addition in English changelog |
changes/zh-cn/2.x.md |
Documents the feature addition in Chinese changelog |
Comments suppressed due to low confidence (1)
namingserver/src/main/java/org/apache/seata/namingserver/manager/ClusterWatcherManager.java:78
- The scheduled task re-registers watchers within its execution which calls registryWatcher, and registryWatcher immediately triggers refreshWatcherCountMetrics. This creates a potential for frequent metrics refreshes every second as the scheduled task runs. This could cause unnecessary overhead when combined with the refresh on line 102.
scheduledThreadPoolExecutor.scheduleAtFixedRate(
() -> {
for (String group : WATCHERS.keySet()) {
Optional.ofNullable(WATCHERS.remove(group))
.ifPresent(watchers -> watchers.parallelStream().forEach(watcher -> {
if (System.currentTimeMillis() >= watcher.getTimeout()) {
notify(watcher, HttpStatus.NOT_MODIFIED.value());
}
if (!watcher.isDone()) {
// Re-register
registryWatcher(watcher);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Add namespace tag | ||
| String namespace = context.getCarrier().getParameter(TAG_NAMESPACE); | ||
| keyValues = keyValues.and(KeyValue.of(TAG_NAMESPACE, namespace != null ? namespace : UNKNOWN)); | ||
|
|
||
| // Add cluster tag | ||
| String cluster = context.getCarrier().getParameter(TAG_CLUSTER); | ||
| if (cluster == null) { | ||
| cluster = context.getCarrier().getParameter("clusterName"); | ||
| } | ||
| keyValues = keyValues.and(KeyValue.of(TAG_CLUSTER, cluster != null ? cluster : UNKNOWN)); | ||
|
|
||
| // Add vgroup tag | ||
| String vgroup = context.getCarrier().getParameter(TAG_VGROUP); | ||
| if (vgroup == null) { | ||
| vgroup = context.getCarrier().getParameter("group"); | ||
| } | ||
| keyValues = keyValues.and(KeyValue.of(TAG_VGROUP, vgroup != null ? vgroup : UNKNOWN)); |
Copilot
AI
Dec 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding tags with value "unknown" for every HTTP request will create high cardinality metrics. When these parameters are not present in requests, it would be better to either not add the tag at all, or use a constant shared "unknown" value. This could lead to performance issues and excessive memory usage in the metrics registry as each unique combination of tag values creates a separate time series. Consider only adding tags when they have meaningful values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "unknown" value is a fixed string constant, not a dynamic value that changes per request. This won't cause cardinality explosion. The namespace, cluster, and vgroup parameters are also low-cardinality by nature in naming server scenarios (limited to dozens at most). Using "unknown" as a fallback is a standard practice that helps operators identify requests with missing parameters.
namingserver/src/main/java/org/apache/seata/namingserver/manager/ClusterWatcherManager.java
Show resolved
Hide resolved
namingserver/src/main/java/org/apache/seata/namingserver/manager/NamingManager.java
Show resolved
Hide resolved
...gserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerTagsContributor.java
Outdated
Show resolved
Hide resolved
namingserver/src/main/java/org/apache/seata/namingserver/manager/NamingManager.java
Show resolved
Hide resolved
namingserver/src/main/java/org/apache/seata/namingserver/manager/NamingManager.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <[email protected]>
…ss) to prevent loading console module's WebSecurityConfig which causes ClassNotFoundException for WebSecurityConfigurerAdapter on Spring Security 6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...gserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerTagsContributor.java
Show resolved
Hide resolved
...rver/src/main/java/org/apache/seata/namingserver/metrics/PrometheusNamingMetricsManager.java
Show resolved
Hide resolved
...ngserver/src/main/java/org/apache/seata/namingserver/metrics/NamingServerMetricsManager.java
Show resolved
Hide resolved
namingserver/src/test/java/org/apache/seata/namingserver/NamingServerMetricsManagerTest.java
Show resolved
Hide resolved
…mingMetricsManager.java
Both implemented. Response time metrics (P50/P90/P95/P99/P999) use Spring Boot Actuator's http.server.requests with custom business tags via NamingServerTagsContributor. Common tag application added in application.yml. |
|
|
||
| @Override | ||
| public KeyValues getLowCardinalityKeyValues(ServerRequestObservationContext context) { | ||
| KeyValues keyValues = super.getLowCardinalityKeyValues(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only applicable to requests that include a namespace and vgroup? If so, how should the tags be added in the PrometheusNamingMetricsManager class? For example, when a vgroup under a specific namespace's cluster performs a watch operation, or when fetching cluster information.
| watchers.parallelStream().forEach(this::notify); | ||
| // Increment cluster change push counter | ||
| if (!watchers.isEmpty()) { | ||
| metricsManager.incrementClusterChangePushCount(event.getGroup()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, only the "group" is included. Is it possible to also include the namespace and cluster information as tags?
| percentiles: | ||
| http.server.requests: 0.5, 0.9, 0.95, 0.99, 0.999 | ||
| percentiles-histogram: | ||
| http.server.requests: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| percentiles: | |
| http.server.requests: 0.5, 0.9, 0.95, 0.99, 0.999 | |
| percentiles-histogram: | |
| http.server.requests: true | |
| percentiles: | |
| http.server.requests: 0.5, 0.99, 0.999 |
I believe that for now, these three percentiles should suffice, and we should not enable percentiles-histogram, as it would result in excessive data being generated in Prometheus.
Ⅰ. Describe what this PR did
Add metrics for NamingServer:
Ⅱ. Does this pull request fix one issue?
fixes #7852
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews