
Conversation

@YvCeung (Contributor) commented Dec 25, 2025

Ⅰ. Describe what this PR did

Core Changes:

  1. Server timeout handling optimization: HTTP/2 connections no longer time out and close automatically; they are kept open as long-lived connections. Only connection activity is checked, and resources are cleaned up when a connection is disconnected.
  2. HTTP/2 long-connection mechanism: cluster change events are pushed via HTTP/2 data frames in SSE format, with event types keepalive and cluster-update.

Highlights:

  • HTTP/2 connections receive a keepalive event immediately upon establishment
  • A cluster-update event is pushed when a cluster change occurs
  • HTTP/2 connections automatically re-register after receiving an event so they keep listening
  • HTTP1 long polling remains backward compatible
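The event stream described above can be illustrated with a minimal SSE parser. This is a sketch under assumptions: it presumes standard `event:`/`data:` SSE framing carrying the two event types named in this PR; `SseSketch` and its method are hypothetical names, not classes from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parse an SSE stream body into (event, data) pairs.
// Assumes standard SSE framing; the event names mirror the types in this PR.
public class SseSketch {
    public static List<Map<String, String>> parse(String body) {
        List<Map<String, String>> events = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : body.split("\n", -1)) {
            if (line.isEmpty()) {                // a blank line terminates one event
                if (!current.isEmpty()) {
                    events.add(current);
                    current = new LinkedHashMap<>();
                }
            } else if (line.startsWith("event: ")) {
                current.put("event", line.substring(7));
            } else if (line.startsWith("data: ")) {
                current.put("data", line.substring(6));
            }
        }
        if (!current.isEmpty()) {
            events.add(current);
        }
        return events;
    }

    public static void main(String[] args) {
        String stream = "event: keepalive\ndata: {}\n\n"
                + "event: cluster-update\ndata: {\"group\":\"default\",\"term\":3}\n\n";
        List<Map<String, String>> events = parse(stream);
        System.out.println(events.size());              // 2
        System.out.println(events.get(1).get("event")); // cluster-update
    }
}
```

A client consuming the stream reads line by line and dispatches on the event type, which is also why each server-side write must contain complete lines.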

Ⅱ. Does this pull request fix one issue?

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Unit Tests:

org.apache.seata.server.controller.ClusterControllerTest#watchTimeoutTest_withHttp2
org.apache.seata.server.controller.ClusterControllerTest#watch_stream

Verification Points:

  • HTTP2 connection receives keepalive event immediately after establishment
  • HTTP2 connection remains open after timeout period (no timeout event received)
  • cluster-update event is received promptly when cluster changes occur
  • HTTP1 long polling works correctly (returns 304 on timeout)

Ⅴ. Special notes for reviews

Design Decisions:

  • No timeout for HTTP2: HTTP2 supports multiplexing with low overhead for long connections, provides better real-time performance, and clients continuously listen via SeataHttpWatch.
  • HTTP1 keeps original logic: Backward compatible, no impact on existing HTTP1 clients.

Key Code Locations:

  • ClusterWatcherManager.init(): Timeout handling logic (distinguishes HTTP1/HTTP2)
  • ClusterWatcherManager.notifyWatcher(): HTTP2 re-registration logic
  • ClusterWatcherManager.sendWatcherResponse(): HTTP2 data frame pushing

Notes:

  • Test cases have been updated with timeout protection to avoid infinite loops caused by long connections
  • HTTP2 connection resources are automatically cleaned up when disconnected
  • No impact on existing HTTP1 clients

codecov bot commented Dec 29, 2025

Codecov Report

❌ Patch coverage is 82.15962% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.61%. Comparing base (6160c7d) to head (3021107).

Files with missing lines Patch % Lines
.../server/cluster/manager/ClusterWatcherManager.java 78.94% 8 Missing and 12 partials ⚠️
...a/org/apache/seata/common/util/HttpClientUtil.java 63.63% 9 Missing and 3 partials ⚠️
...a/org/apache/seata/common/util/SeataHttpWatch.java 92.95% 0 Missing and 5 partials ⚠️
...pache/seata/common/metadata/ClusterWatchEvent.java 92.85% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #7903      +/-   ##
============================================
+ Coverage     71.58%   71.61%   +0.02%     
- Complexity      883      884       +1     
============================================
  Files          1294     1296       +2     
  Lines         49554    49704     +150     
  Branches       5884     5913      +29     
============================================
+ Hits          35475    35594     +119     
- Misses        11155    11172      +17     
- Partials       2924     2938      +14     
Files with missing lines Coverage Δ
...c/main/java/org/apache/seata/common/Constants.java 100.00% <ø> (ø)
...pache/seata/common/metadata/ClusterWatchEvent.java 92.85% <92.85%> (ø)
...a/org/apache/seata/common/util/SeataHttpWatch.java 92.95% <92.95%> (ø)
...a/org/apache/seata/common/util/HttpClientUtil.java 80.29% <63.63%> (-4.44%) ⬇️
.../server/cluster/manager/ClusterWatcherManager.java 79.38% <78.94%> (-3.67%) ⬇️

... and 9 files with indirect coverage changes



@YvCeung YvCeung marked this pull request as ready for review December 30, 2025 05:39
Copilot AI review requested due to automatic review settings December 30, 2025 05:39
Copilot AI left a comment

Pull request overview

This PR adds HTTP/2 Server-Sent Events (SSE) support for the Watch API in Server Raft mode, replacing the previous callback-based HTTP/2 implementation with a more standard streaming approach.

Key changes:

  • Introduced SeataHttpWatch class providing an iterator-based API for consuming SSE streams
  • Replaced callback-based HTTP/2 methods with new watch() and watchPost() methods in HttpClientUtil
  • Updated server-side ClusterWatcherManager to send SSE-formatted events over HTTP/2 with proper stream management

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 22 comments.

Summary per file:

  • common/src/main/java/org/apache/seata/common/util/SeataHttpWatch.java: New class implementing HTTP/2 SSE stream consumption with iterator pattern
  • common/src/main/java/org/apache/seata/common/util/HttpClientUtil.java: Removed HTTP/2 callback methods, added watch methods for SSE streaming
  • common/src/main/java/org/apache/seata/common/metadata/ClusterWatchEvent.java: New DTO class for cluster watch events
  • common/src/main/java/org/apache/seata/common/executor/HttpCallback.java: Deleted callback interface (no longer needed)
  • server/src/main/java/org/apache/seata/server/cluster/manager/ClusterWatcherManager.java: Refactored to send SSE events over HTTP/2 with keepalive support
  • server/src/test/java/org/apache/seata/server/controller/ClusterControllerTest.java: Updated tests to use new watch API instead of callbacks
  • server/src/test/java/org/apache/seata/server/cluster/manager/ClusterWatcherManagerTest.java: Removed HTTP/2-specific tests
  • common/src/test/java/org/apache/seata/common/util/SeataHttpWatchTest.java: Comprehensive unit tests for SeataHttpWatch class
  • common/src/test/java/org/apache/seata/common/util/HttpClientUtilTest.java: Updated with tests for new watch methods
  • changes/zh-cn/2.x.md: Added Chinese changelog entry
  • changes/en-us/2.x.md: Added English changelog entry


@YvCeung YvCeung added module/common common module module/server server module type: feature Category issues or prs related to feature request. labels Dec 30, 2025
@funky-eyes funky-eyes changed the title feature:Support HTTP/2 stream push for the Watch API in Server Raft mode feature: Support HTTP/2 stream push for the Watch API in Server Raft mode Dec 30, 2025
@YvCeung (Contributor, Author) commented Jan 2, 2026

@funky-eyes PTAL

@funky-eyes funky-eyes added this to the 2.6.0 milestone Jan 4, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.



Comment on lines 72 to 93
for (String group : WATCHERS.keySet()) {
    Optional.ofNullable(WATCHERS.remove(group))
            .ifPresent(watchers -> watchers.parallelStream().forEach(watcher -> {
                HttpContext context = watcher.getAsyncContext();
                boolean isHttp2 = context instanceof HttpContext && context.isHttp2();
                if (isHttp2) {
                    if (!context.getContext().channel().isActive()) {
                        watcher.setDone(true);
                        HTTP2_HEADERS_SENT.remove(watcher);
                    } else {
                        registryWatcher(watcher);
                    }
                } else {
                    if (System.currentTimeMillis() >= watcher.getTimeout()) {
                        watcher.setDone(true);
                        sendWatcherResponse(watcher, HttpResponseStatus.NOT_MODIFIED, true, false);
                    } else if (!watcher.isDone()) {
                        registryWatcher(watcher);
                    }
                }
            }));
}
Copilot AI commented Jan 4, 2026

The scheduled task removes all watchers from a group at once using WATCHERS.remove(group), then processes them. If new watchers are registered for the same group during processing, they could be lost. Consider using a more concurrent-safe approach, such as iterating over watchers without removing the entire queue first, or using a copy-on-write approach.

YvCeung (Contributor, Author) replied:

The current implementation is safe for the following reasons:

  1. New watchers won't be lost: The registryWatcher method uses computeIfAbsent, so even if the queue is removed by WATCHERS.remove(group), newly registered watchers will create a new queue and be added to it.

  2. Scheduled task will scan periodically: The task runs every second, so new watchers will be processed within at most 1 second.

  3. In concurrent scenarios: After the scheduled task removes a queue, new watchers will be added to a new queue created by computeIfAbsent. When the scheduled task re-registers watchers, they will also be added to the same new queue, and all will be processed in the next scan.

For long-polling scenarios, a 1-second delay is acceptable. The current implementation meets the requirements and no additional optimization is needed.

@funky-eyes WDYT?
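The computeIfAbsent argument above can be sketched with a toy model (class and method names are illustrative, not the actual Seata classes): even after the scheduled task drains a group's queue wholesale with remove, a concurrent registration simply creates a fresh queue, so the new watcher is not lost.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the re-registration argument: the scheduled task removes the whole
// queue for a group, while computeIfAbsent transparently recreates a queue for
// any watcher registered afterwards.
public class WatcherQueueSketch {
    public static final Map<String, Queue<String>> WATCHERS = new ConcurrentHashMap<>();

    public static void registryWatcher(String group, String watcher) {
        WATCHERS.computeIfAbsent(group, g -> new ConcurrentLinkedQueue<>()).add(watcher);
    }

    public static void main(String[] args) {
        registryWatcher("default", "w1");
        // Scheduled task drains the entire queue at once ...
        Queue<String> drained = WATCHERS.remove("default");
        // ... while a new watcher arrives concurrently: a new queue is created.
        registryWatcher("default", "w2");
        System.out.println(drained.size());                         // 1
        System.out.println(WATCHERS.get("default").contains("w2")); // true
    }
}
```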

Comment on lines +309 to +310
// Default to GET if method is not specified or not supported
requestBuilder.get();
Copilot AI commented Jan 4, 2026

The method allows POST and PUT with requestBody, but silently defaults to GET for any unsupported method (including DELETE, PATCH, etc.). This could mask errors when an incorrect method is specified. Consider throwing an IllegalArgumentException for unsupported methods instead of silently defaulting to GET.

Suggested change
// Default to GET if method is not specified or not supported
requestBuilder.get();
throw new IllegalArgumentException("Unsupported HTTP method for watch request: " + method);

// Fast failure during connection phase
.connectTimeout(connectTimeoutSeconds, TimeUnit.SECONDS)
// Infinite read timeout to allow continuous listening for server push
.readTimeout(0, TimeUnit.SECONDS)
Contributor commented:

We should make the read timeout configurable instead of hardcoding it to 0. When readTimeout is set to 0, the underlying socket timeout becomes infinite, which can cause the request to block indefinitely. In cases where the server shuts down abruptly (e.g., a crash without a graceful TCP handshake), this may lead to permanent blocking.
The connectTimeout should be limited to around 10 seconds, while the readTimeout should be adjustable based on the specific request type.
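The suggestion above can be sketched as a small configuration helper. The property names (`seata.http.watch.*`) are hypothetical, not existing Seata configuration keys; this only illustrates making the read timeout adjustable per request type while capping the connect timeout at roughly 10 seconds.

```java
// Hypothetical sketch: resolve watch timeouts from configuration instead of
// hardcoding readTimeout to 0. Property names are invented for illustration.
public class WatchTimeoutConfig {
    // 0 still means "no read timeout" (infinite), but callers can now override it
    public static long readTimeoutSeconds() {
        return Long.getLong("seata.http.watch.readTimeoutSeconds", 0L);
    }

    // Cap the connect timeout at 10 seconds so connection failures surface quickly
    public static long connectTimeoutSeconds() {
        return Math.min(Long.getLong("seata.http.watch.connectTimeoutSeconds", 10L), 10L);
    }

    public static void main(String[] args) {
        System.out.println(readTimeoutSeconds());    // 0 by default (infinite)
        System.out.println(connectTimeoutSeconds()); // 10 by default
    }
}
```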

watcher.setDone(true);
HTTP2_HEADERS_SENT.remove(watcher);
} else {
registryWatcher(watcher);
Contributor commented:

If the cluster state changes again during the interval before re-registering for the subscription queue, is there a risk—in extreme cases—that the notification could be lost?

YvCeung (Contributor, Author) replied:

Yes. Integration testing shows that if one change event occurs and a second change event occurs within 500 ms, the client does not receive the second notification because it has not yet re-registered. This is stably reproducible. I will solve this problem.

YvCeung (Contributor, Author) replied:

The potential loss issue has now been addressed: separate queues are used to store HTTP1 and HTTP2 watchers. HTTP2 watchers are no longer periodically removed and re-registered; the scheduled task only checks whether the channel is still valid, to prevent memory leaks.

// Separate watchers for HTTP/1.1 (one-time requests) and HTTP/2 (long-lived connections)
private static final Map<String, Queue<Watcher<HttpContext>>> HTTP1_WATCHERS = new ConcurrentHashMap<>();
private static final Map<String, Queue<Watcher<HttpContext>>> HTTP2_WATCHERS = new ConcurrentHashMap<>();

// Check HTTP/2 watchers for connection validity (don't remove, just check)
for (Map.Entry<String, Queue<Watcher<HttpContext>>> entry : HTTP2_WATCHERS.entrySet()) {
    String group = entry.getKey();
    Queue<Watcher<HttpContext>> watchers = entry.getValue();
    if (watchers == null || watchers.isEmpty()) {
        continue;
    }

    // Create snapshot to avoid concurrent modification
    List<Watcher<HttpContext>> watchersToCheck = new ArrayList<>(watchers);

    watchersToCheck.forEach(watcher -> {
        HttpContext context = watcher.getAsyncContext();
        if (!context.getContext().channel().isActive()) {
            // Remove invalid watcher
            watchers.remove(watcher);
            watcher.setDone(true);
            HTTP2_HEADERS_SENT.remove(watcher);
            logger.debug("Removed inactive HTTP/2 watcher for group: {}", group);
        }
    });
}

Contributor commented, quoting the above:

We also need to consider another issue: since the response is written back using HTTP/2 + SSE, is there any thread safety concern with concurrent writes to the SSE response? Additionally, could there be an out-of-order delivery problem—for example, if two events are generated, with event 1 arriving at the client after event 2, potentially causing the newer event to be overwritten by the older one?
To address this, should we ensure that events are delivered in order and include term validation? Specifically, if the term of a previous event in the response is already greater than that of a subsequent event, the latter should be discarded.
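One hedged sketch of the term-validation idea: before delivering an event, atomically check that its term is strictly newer than the last delivered term, discarding stale events. This is illustrative only, not code from the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a late-arriving event with an older term cannot overwrite newer state,
// because delivery is gated on a strictly increasing term.
public class TermFilterSketch {
    private final AtomicLong lastDeliveredTerm = new AtomicLong(-1);

    /** Returns true if the event should be delivered, false if it is stale. */
    public boolean tryDeliver(long term) {
        while (true) {
            long last = lastDeliveredTerm.get();
            if (term <= last) {
                return false; // older or duplicate term: discard
            }
            if (lastDeliveredTerm.compareAndSet(last, term)) {
                return true;  // won the race: deliver this event
            }
            // another thread advanced the term concurrently; re-check
        }
    }

    public static void main(String[] args) {
        TermFilterSketch filter = new TermFilterSketch();
        System.out.println(filter.tryDeliver(1)); // true
        System.out.println(filter.tryDeliver(3)); // true
        System.out.println(filter.tryDeliver(2)); // false: stale event discarded
    }
}
```

The same check can be applied on either side: on the server before writing a frame, or on the client before applying a received event.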

YvCeung (Contributor, Author) replied:

Concurrent writes to the ChannelHandlerContext can indeed interleave, but that does not affect the data push; at worst, several lines are flushed at once. As long as each individual write is a complete event, the client handles it correctly, because it parses line by line. As for out-of-order terms, the client can judge for itself; the server can also apply some appropriate optimizations.


YvCeung (Contributor, Author) added:

Concurrent writes here will not break data integrity; fully sequential processing is not required. We keep the concurrency for performance and only add the necessary filtering optimization. @funky-eyes wdyt?

// For HTTP/2, headers must be sent first on the initial response
if (sendHeaders) {
    Http2Headers headers = new DefaultHttp2Headers().status(nettyStatus.codeAsText());
    headers.set(HttpHeaderNames.CONTENT_TYPE, "text/event-stream; charset=utf-8");
    headers.set(HttpHeaderNames.CACHE_CONTROL, "no-cache");

    ctx.write(new DefaultHttp2HeadersFrame(headers));
}

String group = watcher.getGroup();
Long finalTerm = term != null ? term : GROUP_UPDATE_TERM.getOrDefault(group, 0L);
String eventData = buildEventData(nettyStatus, closeStream, sendHeaders, group, finalTerm);
ByteBuf content = Unpooled.copiedBuffer(eventData, StandardCharsets.UTF_8);

// Send DATA frame (if closeStream is true, it will end the current stream)
ctx.write(new DefaultHttp2DataFrame(content, closeStream));
ctx.flush();

@funky-eyes (Contributor) commented Jan 8, 2026, quoting the above:

The out-of-order issue I mentioned isn't limited to HTTP/2 responses alone. The onChangeEvent method is annotated with @Async, and notifications to watchers are performed using parallelStream. As a result, when the cluster experiences a rapid series of consecutive changes, this could lead to out-of-order notifications (among other potential issues). Of course, addressing this can be treated as a separate optimization task.
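One possible way to address the per-group ordering concern, sketched under assumptions (class and method names are hypothetical, not from the PR): funnel each group's notifications through a single-threaded executor, so events for the same group are delivered in submission order while different groups still proceed in parallel.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: serialize notifications per group without giving up cross-group
// concurrency. Each group lazily gets its own single-threaded executor.
public class PerGroupSerializer {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    // All notifications for the same group run on one thread, in submission order.
    public void notifyInOrder(String group, Runnable notification) {
        executors.computeIfAbsent(group, g -> Executors.newSingleThreadExecutor())
                 .execute(notification);
    }

    // Shutdown helper: wait for a group's pending notifications to finish.
    public void awaitQuiescence(String group, long timeoutMillis) throws InterruptedException {
        ExecutorService e = executors.remove(group);
        if (e != null) {
            e.shutdown();
            e.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerGroupSerializer s = new PerGroupSerializer();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            final int v = i;
            s.notifyInOrder("default", () -> sb.append(v));
        }
        s.awaitQuiescence("default", 1000);
        System.out.println(sb); // 01234
    }
}
```

Whether this belongs in ClusterWatcherManager or in a separate optimization task, as suggested above, is a design decision for the follow-up.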

eventType = "cluster-update";
}

String json = String.format(
Contributor commented:

I hope that when an HTTP/2 connection is established, the server directly pushes the cluster change results to the client, rather than requiring the client to initiate a separate request to fetch the cluster information.

@funky-eyes funky-eyes modified the milestones: 2.6.0, 2.7.0 Jan 7, 2026
Contributor commented:

Additionally, I don't seem to see any usage of SeataHttpWatch on the client side.

