@WangzJi WangzJi commented Dec 15, 2025

Ⅰ. Describe what this PR did

Closes: #7646
Add a new seata-benchmark-cli module under test-suite for stress testing Seata transaction modes.

Features:

  • Support for AT, TCC, and SAGA transaction modes
  • Dual execution modes:
    • Empty mode (--branches 0): Measures pure Seata protocol overhead without database operations
    • Real mode (--branches N): Executes actual distributed transactions with database operations
  • Configurable TPS (Transactions Per Second) rate limiting
  • Multi-threaded workload generation with configurable thread pool
  • Fault injection with configurable rollback percentage
  • Window-based progress reporting (every 10 seconds)
  • Performance metrics collection: latency percentiles (P50/P95/P99), success rate, TPS
  • CSV export for post-analysis
  • Warmup support: exclude initial ramp-up period from final statistics
  • YAML configuration file support with priority: CLI args > env var > system property > classpath
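The latency percentiles listed above (P50/P95/P99) can be illustrated with a nearest-rank calculation over the collected samples. This is a minimal sketch only: the class name `LatencyPercentiles` and the nearest-rank definition are assumptions for illustration, and the PR description does not say which percentile definition the module actually uses.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the PR: nearest-rank percentiles over latency samples. */
final class LatencyPercentiles {

    /** Returns the p-th percentile (0 < p <= 100) using the nearest-rank method. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank of the smallest value that covers p percent of the samples
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 15, 11, 230, 45, 14, 13, 89, 16, 10};
        System.out.printf("P50=%d P95=%d P99=%d%n",
                percentile(latenciesMs, 50),
                percentile(latenciesMs, 95),
                percentile(latenciesMs, 99));
    }
}
```

With only ten samples, P50 is 14 while P95 and P99 both collapse onto the maximum (230), which is one reason a benchmark needs a large sample set before the tail percentiles mean much.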

Module Structure:

test-suite/seata-benchmark-cli/
├── src/main/java/org/apache/seata/benchmark/
│   ├── BenchmarkApplication.java       # Main entry with picocli CLI
│   ├── BenchmarkConstants.java         # Constants definition
│   ├── config/                         # Configuration classes
│   ├── executor/                       # Transaction executors (AT/TCC/SAGA)
│   ├── model/                          # Data models (metrics, records)
│   ├── monitor/                        # Metrics collection
│   ├── saga/                           # SAGA mode services
│   └── util/                           # Utility classes
└── src/main/resources/
    ├── seata/saga/statelang/           # SAGA state machine definitions
    └── *.conf                          # Seata configuration

Ⅱ. Does this pull request fix one issue?

No, this is a new feature.

Ⅲ. Why don't you add test cases (unit test/integration test)?

This is a benchmark/stress testing tool. The primary validation is manual testing against a running Seata Server. Integration tests would require a full Seata Server setup, which is better suited to the existing integration test infrastructure.

Ⅳ. Describe how to verify it

  1. Build the module:

    cd test-suite/seata-benchmark-cli
    ../../mvnw clean package
  2. Start Seata Server (ensure it's running on 127.0.0.1:8091)

  3. Run benchmark (empty mode - no database required):

    java -jar target/seata-benchmark-cli.jar \
      --server 127.0.0.1:8091 \
      --mode AT \
      --tps 100 \
      --duration 60
  4. Run benchmark (real mode - requires Docker):

    java -jar target/seata-benchmark-cli.jar \
      --server 127.0.0.1:8091 \
      --mode AT \
      --tps 100 \
      --duration 60 \
      --branches 3
  5. Verify the output shows progress every 10 seconds and final report with metrics.

Ⅴ. Special notes for reviews

  1. Latency Sampling: To prevent OOM on large-scale tests, latencies are sampled (max 500K samples) - inspired by Kafka ProducerPerformance.

  2. Empty vs Real Mode: Empty mode (--branches 0) is useful for measuring pure Seata Server capacity without database overhead.

  3. SAGA Implementation: Real SAGA mode uses Seata's state machine engine with predefined state machine definitions in src/main/resources/seata/saga/statelang/.

  4. Dependencies: The module uses:

    • picocli for CLI argument parsing
    • snakeyaml for YAML configuration
    • testcontainers for MySQL in real mode
    • Seata's existing infrastructure (TM, RM, Saga engine)

@WangzJi changed the title from "feat: benchmark cli" to "feature: benchmark cli" on Dec 15, 2025
@WangzJi added the "type: feature" label on Dec 18, 2025

codecov bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.57%. Comparing base (340aa3c) to head (ec1ef44).
⚠️ Report is 18 commits behind head on 2.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #7865      +/-   ##
============================================
+ Coverage     71.20%   71.57%   +0.37%     
- Complexity      797      871      +74     
============================================
  Files          1300     1294       -6     
  Lines         49620    49554      -66     
  Branches       5874     5884      +10     
============================================
+ Hits          35331    35470     +139     
+ Misses        11371    11166     -205     
  Partials       2918     2918              

see 53 files with indirect coverage changes


Copilot AI left a comment

Pull request overview

This pull request introduces a new seata-benchmark-cli module - a command-line tool for stress testing Seata transaction modes (AT, TCC, and SAGA). The tool supports both "empty mode" (protocol overhead testing) and "real mode" (actual transaction execution) to measure Seata's performance characteristics.

Key Changes:

  • CLI-based benchmark tool with picocli framework for argument parsing
  • Support for AT, TCC, and SAGA transaction modes with configurable TPS rate limiting
  • Dual execution modes: empty transactions for pure protocol overhead testing, and real transactions with database/state machine operations
  • Comprehensive metrics collection including latency percentiles (P50/P95/P99), success rates, and TPS measurements with CSV export capability

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| test-suite/seata-benchmark-cli/pom.xml | Maven configuration with dependencies for CLI framework, testcontainers, SAGA engine, and benchmark utilities |
| test-suite/seata-benchmark-cli/src/main/java/org/apache/seata/benchmark/BenchmarkApplication.java | Main entry point with CLI argument handling and benchmark orchestration |
| test-suite/seata-benchmark-cli/src/main/java/org/apache/seata/benchmark/config/* | Configuration classes for loading/merging benchmark parameters from CLI, YAML, and environment |
| test-suite/seata-benchmark-cli/src/main/java/org/apache/seata/benchmark/executor/* | Transaction executors for AT, TCC, and SAGA modes with workload generation |
| test-suite/seata-benchmark-cli/src/main/java/org/apache/seata/benchmark/model/* | Data models for metrics collection and transaction records |
| test-suite/seata-benchmark-cli/src/main/java/org/apache/seata/benchmark/saga/* | SAGA mode service implementations with state machine support |
| test-suite/seata-benchmark-cli/src/main/resources/* | Configuration files for Seata client and SAGA state machine definitions |
| test-suite/seata-benchmark-cli/README.md | Comprehensive documentation with usage examples and implementation details |
| pom.xml | Added new benchmark module to test-suite |

@funky-eyes added this to the 2.7.0 milestone on Jan 8, 2026

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@funky-eyes requested a review from Copilot on January 9, 2026, 14:15

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 17 comments.


Comment on lines +59 to +66
totalSamples.incrementAndGet();
synchronized (latencies) {
    if (latencies.size() < BenchmarkConstants.MAX_LATENCY_SAMPLES) {
        latencies.add(latencyMs);
    } else {
        // Random replacement strategy to maintain bounded samples
        int index = ThreadLocalRandom.current().nextInt(BenchmarkConstants.MAX_LATENCY_SAMPLES);
        latencies.set(index, latencyMs);

Copilot AI Jan 9, 2026

In the random replacement strategy, using ThreadLocalRandom for sampling may not provide uniform distribution over the entire dataset. Consider using reservoir sampling algorithm for better statistical properties when the sample size is fixed but the total population is unknown or very large.

Suggested change (replaces the lines quoted above):
// Position of this sample in the overall stream (1-based)
long n = totalSamples.incrementAndGet();
synchronized (latencies) {
    int maxSamples = BenchmarkConstants.MAX_LATENCY_SAMPLES;
    if (latencies.size() < maxSamples) {
        // Fill the reservoir until it reaches the maximum size
        latencies.add(latencyMs);
    } else {
        // Reservoir sampling:
        // With probability maxSamples / n, include this sample
        if (ThreadLocalRandom.current().nextLong(n) < maxSamples) {
            int index = ThreadLocalRandom.current().nextInt(maxSamples);
            latencies.set(index, latencyMs);
        }

Copilot uses AI. Check for mistakes.
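For context on the comment above: once the buffer is full, unconditional random replacement always keeps the newest sample, so the retained set drifts toward recent latencies. Reservoir sampling keeps each of the n samples seen so far with equal probability k/n. The following is a minimal, self-contained sketch of the classic single-draw variant (Algorithm R). The class `LatencyReservoir` and its methods are hypothetical names for illustration, not the module's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for the PR's bounded latency buffer. */
final class LatencyReservoir {
    private final int capacity;
    private final List<Long> samples;
    private long seen; // total number of latencies offered so far

    LatencyReservoir(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.samples = new ArrayList<>();
    }

    synchronized void record(long latencyMs) {
        seen++;
        if (samples.size() < capacity) {
            samples.add(latencyMs); // fill phase: keep everything
            return;
        }
        // Algorithm R: draw a slot uniformly from [0, seen). The new sample is kept
        // only if the slot falls inside the reservoir, i.e. with probability capacity / seen.
        long slot = ThreadLocalRandom.current().nextLong(seen);
        if (slot < capacity) {
            samples.set((int) slot, latencyMs);
        }
    }

    synchronized long seen() {
        return seen;
    }

    synchronized List<Long> snapshot() {
        return new ArrayList<>(samples);
    }

    public static void main(String[] args) {
        LatencyReservoir reservoir = new LatencyReservoir(1_000);
        for (long i = 0; i < 100_000; i++) {
            reservoir.record(i);
        }
        System.out.println("kept " + reservoir.snapshot().size() + " of " + reservoir.seen());
    }
}
```

The suggested change above uses two random draws (accept with probability maxSamples / n, then pick a uniform index); the single draw here should give the same distribution, since a slot below the capacity is itself uniform over the reservoir indices.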
Comment on lines +310 to +311
// Divide rollback percentage by 3 for each service so total probability is approximately correct
int serviceRollbackPct = rollbackPercentage > 0 ? Math.max(1, rollbackPercentage / 3) : 0;

Copilot AI Jan 9, 2026

The division by 3 for service rollback percentage (line 311) may not accurately reflect the intended overall rollback percentage. If any one of the three services fails, the entire saga will fail. The probability calculation should account for this compound probability. For independent failures, the overall success rate would be approximately (1 - p1) * (1 - p2) * (1 - p3), not a simple division. Consider using 1 - (1 - targetRate)^(1/3) for each service to achieve the target overall rollback rate.

Suggested change (replaces the lines quoted above):
// Compute per-service rollback percentage so that the combined rollback rate across 3 services
// matches the target rollbackPercentage (assuming independent failures):
//     perServiceRate = 1 - (1 - targetRate)^(1/3)
int serviceRollbackPct;
if (rollbackPercentage > 0) {
    double targetRate = rollbackPercentage / 100.0;
    double perServiceRate = 1 - Math.pow(1 - targetRate, 1.0 / 3.0);
    serviceRollbackPct = Math.max(1, (int) Math.round(perServiceRate * 100));
} else {
    serviceRollbackPct = 0;
}
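To make the compound-probability argument concrete: with a 30% target, splitting evenly gives 10% per service and an overall rate of 1 - 0.9^3 = 27.1%, whereas 1 - 0.7^(1/3), roughly 11.2% per service, recovers the 30% target. The sketch below works through that arithmetic. `RollbackRates` is a hypothetical helper, and it assumes the three services fail independently, as the comment does.

```java
/** Hypothetical helper illustrating the arithmetic in the review comment. */
final class RollbackRates {

    /** Overall rollback probability when each service fails independently with the given probability. */
    static double overallRate(double perServiceRate, int services) {
        return 1.0 - Math.pow(1.0 - perServiceRate, services);
    }

    /** Per-service failure probability needed to reach the target overall rollback probability. */
    static double perServiceRate(double targetRate, int services) {
        return 1.0 - Math.pow(1.0 - targetRate, 1.0 / services);
    }

    public static void main(String[] args) {
        double target = 0.30;
        int services = 3;
        double naive = target / services;
        double exact = perServiceRate(target, services);
        System.out.printf("naive split: %.2f%% per service, %.2f%% overall%n",
                naive * 100, overallRate(naive, services) * 100);
        System.out.printf("exact split: %.2f%% per service, %.2f%% overall%n",
                exact * 100, overallRate(exact, services) * 100);
    }
}
```

Note that whole-percent rounding still distorts small targets: for a 1% target the per-service rate is roughly 0.33%, which rounds to 0 and is then raised to 1 by `Math.max(1, ...)`, giving about 2.97% overall (1 - 0.99^3).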

}

public void setMode(String mode) {
    this.mode = BranchType.get(mode);

Copilot AI Jan 9, 2026

Missing validation for the 'mode' parameter conversion. If an invalid mode string is provided via YAML configuration, BranchType.get() may return null or throw an exception. The setMode(String) method should validate the input and throw a meaningful exception for invalid values.

Suggested change (replaces the line quoted above):
if (mode == null || mode.trim().isEmpty()) {
    throw new IllegalArgumentException("Mode must not be null or empty.");
}
BranchType branchType;
try {
    branchType = BranchType.get(mode);
} catch (IllegalArgumentException e) {
    throw new IllegalArgumentException("Invalid mode value: " + mode, e);
}
if (branchType == null) {
    throw new IllegalArgumentException("Unsupported mode value: " + mode);
}
this.mode = branchType;

Comment on lines +141 to +166
public void waitForCompletion() {
    LOGGER.info("Waiting for benchmark completion...");
    long startTime = System.currentTimeMillis();
    long duration = config.getDuration() * 1000L;
    long endTime = startTime + duration;

    try {
        while (System.currentTimeMillis() < endTime) {
            Thread.sleep(1000);
            long elapsed = (System.currentTimeMillis() - startTime) / 1000;
            if (elapsed % BenchmarkConstants.PROGRESS_REPORT_INTERVAL_SECONDS == 0 && elapsed > 0) {
                System.out.printf(
                        "[%02d:%02d] %d txns, %.1f txns/sec, %.1f%% success%n",
                        elapsed / 60,
                        elapsed % 60,
                        metrics.getTotalCount(),
                        metrics.getCurrentTps(),
                        metrics.getSuccessRate());
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }

    stop();
}

Copilot AI Jan 9, 2026

The warmup feature is mentioned in multiple places but the actual implementation for excluding warmup period from metrics is not present. The BenchmarkConfig has warmupDuration field and it's displayed in the output, but there's no code in WorkloadGenerator or BenchmarkMetrics to reset metrics after warmup period. This feature appears to be incomplete.
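One way the missing warmup exclusion could look is sketched below. Instead of resetting counters when warmup ends, it simply ignores any transaction that completes before the warmup deadline. Everything here is an assumption for illustration: `WarmupAwareMetrics` and its methods are hypothetical, and only the accessor names `getTotalCount` and `getSuccessRate` mirror calls visible in the quoted code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical metrics holder that excludes the warmup period from its totals. */
final class WarmupAwareMetrics {
    private final long measureFromMillis;
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong success = new AtomicLong();

    WarmupAwareMetrics(long startMillis, long warmupSeconds) {
        this.measureFromMillis = startMillis + warmupSeconds * 1000L;
    }

    /** Records one finished transaction; completions inside the warmup window are dropped. */
    void record(long completedAtMillis, boolean ok) {
        if (completedAtMillis < measureFromMillis) {
            return; // still warming up: do not count
        }
        total.incrementAndGet();
        if (ok) {
            success.incrementAndGet();
        }
    }

    long getTotalCount() {
        return total.get();
    }

    /** Success rate in percent over the measured (post-warmup) window. */
    double getSuccessRate() {
        long t = total.get();
        return t == 0 ? 0.0 : success.get() * 100.0 / t;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        WarmupAwareMetrics metrics = new WarmupAwareMetrics(start, 10);
        metrics.record(start + 5_000, true);   // inside warmup, ignored
        metrics.record(start + 12_000, true);  // counted
        metrics.record(start + 13_000, false); // counted
        System.out.println(metrics.getTotalCount() + " txns, " + metrics.getSuccessRate() + "% success");
    }
}
```

Filtering at record time avoids a reset racing with worker threads, but TPS would then need to be computed over the measured window (duration minus warmup) rather than the whole run.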

Latency P95 (ms),45
Latency P99 (ms),89
Latency Max (ms),230
Export Timestamp,2025-12-01T10:30:45Z

Copilot AI Jan 9, 2026

The timestamp format in the README example shows '2025-12-01T10:30:45Z' but the actual code in MetricsCollector uses 'yyyy-MM-dd HH:mm:ss' format without timezone information. This documentation inconsistency should be fixed to match the actual implementation.

Suggested change (replaces the line quoted above):
Export Timestamp,2025-12-01 10:30:45
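The two timestamp shapes discussed above can be reproduced with `java.time`. This sketch only illustrates the difference between the formats; `ExportTimestampFormats` is a hypothetical class, and pinning the plain pattern to UTC is an assumption made so the output is reproducible (the review says only that the real code uses `yyyy-MM-dd HH:mm:ss` without timezone information).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper contrasting the README format with the pattern named in the review. */
final class ExportTimestampFormats {
    // Pattern quoted in the review comment; the zone is an assumption for this sketch.
    private static final DateTimeFormatter PLAIN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Formats as "yyyy-MM-dd HH:mm:ss" with no zone marker in the output. */
    static String plain(Instant instant) {
        return PLAIN.format(instant);
    }

    /** Formats as ISO-8601 in UTC with a trailing 'Z', the shape shown in the README example. */
    static String iso(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println("plain: " + plain(now));
        System.out.println("iso:   " + iso(now));
    }
}
```

If the export is meant to be compared across machines, the ISO form has the advantage of carrying its zone, so aligning the code to the README could be as reasonable as the reverse.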

Comment on lines +325 to +327
int nRead;
while ((nRead = is.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, nRead);

Copilot AI Jan 9, 2026

The variable 'nRead' follows Java naming conventions, but the typical convention in I/O operations is to use 'bytesRead' or 'numRead' for better clarity about what the variable represents.

Suggested change (replaces the lines quoted above):
int bytesRead;
while ((bytesRead = is.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, bytesRead);

Comment on lines +152 to +155
System.out.printf(
        "[%02d:%02d] %d txns, %.1f txns/sec, %.1f%% success%n",
        elapsed / 60,
        elapsed % 60,

Copilot AI Jan 9, 2026

The progress reporting output format shows elapsed time in MM:SS format but doesn't handle hours. For benchmarks longer than 60 minutes, the display will be incorrect (e.g., 90 minutes 30 seconds would show as "90:30" instead of "01:30:30"). Consider using HH:MM:SS format or documenting the limitation.

Suggested change (replaces the lines quoted above):
long hours = elapsed / 3600;
long minutes = (elapsed % 3600) / 60;
long seconds = elapsed % 60;
System.out.printf(
        "[%02d:%02d:%02d] %d txns, %.1f txns/sec, %.1f%% success%n",
        hours,
        minutes,
        seconds,

    LOGGER.trace("Executing compensation for branch {}", branchId);
}

@SuppressWarnings("lgtm[java/insecure-randomness]")

Copilot AI Jan 9, 2026

The suppress annotation with LGTM tag references a deprecated static analysis tool (LGTM.com). Consider using modern alternatives like CodeQL or removing these annotations.

Comment on lines +66 to +80
private void initRealMode() {
    // Start MySQL container
    startMySQLContainer();

    // Create HikariCP connection pool
    createDataSource();

    // Initialize database schema and data
    initDatabase();

    // Wrap with Seata DataSourceProxy for AT mode
    dataSourceProxy = new DataSourceProxy(rawDataSource);

    LOGGER.info("DataSourceProxy initialized, dbType: {}", dataSourceProxy.getDbType());
    LOGGER.info("Real AT mode executor initialized with {} accounts", BenchmarkConstants.ACCOUNT_COUNT);

Copilot AI Jan 9, 2026

Potential resource leak: If an exception occurs during initialization between starting the MySQL container (line 91) and creating the DataSourceProxy (line 77), the MySQL container will not be stopped. Consider wrapping the initialization in try-catch and ensuring proper cleanup, or using try-with-resources pattern where applicable.

Suggested change (replaces the lines quoted above):
/**
 * Cleanup resources if real mode initialization fails.
 */
private void cleanupResourcesOnFailure() {
    if (rawDataSource != null) {
        try {
            rawDataSource.close();
        } catch (Exception e) {
            LOGGER.warn("Failed to close HikariCP DataSource during cleanup", e);
        } finally {
            rawDataSource = null;
        }
    }
    if (mysqlContainer != null) {
        try {
            mysqlContainer.stop();
        } catch (Exception e) {
            LOGGER.warn("Failed to stop MySQL container during cleanup", e);
        } finally {
            mysqlContainer = null;
        }
    }
}

private void initRealMode() {
    try {
        // Start MySQL container
        startMySQLContainer();
        // Create HikariCP connection pool
        createDataSource();
        // Initialize database schema and data
        initDatabase();
        // Wrap with Seata DataSourceProxy for AT mode
        dataSourceProxy = new DataSourceProxy(rawDataSource);
        LOGGER.info("DataSourceProxy initialized, dbType: {}", dataSourceProxy.getDbType());
        LOGGER.info("Real AT mode executor initialized with {} accounts", BenchmarkConstants.ACCOUNT_COUNT);
    } catch (Exception e) {
        cleanupResourcesOnFailure();
        throw (e instanceof RuntimeException) ? (RuntimeException) e
                : new RuntimeException("Failed to initialize real AT mode executor", e);
    }

Comment on lines +54 to +75
    // Find the method
    Method method = findMethod(service.getClass(), methodName);
    if (method == null) {
        throw new NoSuchMethodException("Method not found: " + methodName + " in service: " + serviceName);
    }

    // Prepare input parameters
    Object inputParam = prepareInput(input);

    LOGGER.debug("Invoking service: {}.{}()", serviceName, methodName);

    // Invoke the method
    return method.invoke(service, inputParam);
}

private Method findMethod(Class<?> clazz, String methodName) {
    for (Method method : clazz.getMethods()) {
        if (method.getName().equals(methodName)) {
            return method;
        }
    }
    return null;

Copilot AI Jan 9, 2026

The method findMethod does not validate parameter types or count. If there are multiple overloaded methods with the same name but different parameters, this will return the first match which may not be the intended method. Consider adding parameter type validation or using more specific method lookup.

Suggested change (replaces the lines quoted above):
    // Prepare input parameters
    Object inputParam = prepareInput(input);
    // Determine parameter types based on prepared input
    Class<?>[] paramTypes;
    if (inputParam == null) {
        paramTypes = new Class<?>[] {Object.class};
    } else {
        paramTypes = new Class<?>[] {inputParam.getClass()};
    }
    // Find the method with matching name and parameter types
    Method method = findMethod(service.getClass(), methodName, paramTypes);
    if (method == null) {
        throw new NoSuchMethodException("Method not found: " + methodName + " in service: " + serviceName);
    }
    LOGGER.debug("Invoking service: {}.{}()", serviceName, methodName);
    // Invoke the method
    return method.invoke(service, inputParam);
}

private Method findMethod(Class<?> clazz, String methodName, Class<?>... paramTypes) {
    // First, try to find an exact public method match by name and parameter types
    try {
        return clazz.getMethod(methodName, paramTypes);
    } catch (NoSuchMethodException e) {
        // Fallback to manual search for a compatible method (e.g., parameter is an interface/supertype)
        for (Method method : clazz.getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            Class<?>[] methodParamTypes = method.getParameterTypes();
            if (methodParamTypes.length != paramTypes.length) {
                continue;
            }
            boolean compatible = true;
            for (int i = 0; i < methodParamTypes.length; i++) {
                Class<?> expected = methodParamTypes[i];
                Class<?> provided = paramTypes[i];
                if (provided != null && !expected.isAssignableFrom(provided)) {
                    compatible = false;
                    break;
                }
            }
            if (compatible) {
                return method;
            }
        }
        return null;
    }

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
Contributor

I recommend using JDK 25, and in the future, providing GraalVM AOT-compiled artifacts along with corresponding benchmarks. This would make it much more convenient for users to run performance tests.

--server 127.0.0.1:8091 \
--mode AT \
--tps 500 \
--threads 50 \
Contributor

I believe that constant TPS and a fixed number of threads should not be used at the same time in performance testing. You should either run tests with a constant target TPS, or control the test by fixing the number of concurrent threads — and then observe the resulting TPS and response time metrics.
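The distinction drawn above is often described as open-loop (fixed arrival rate) versus closed-loop (fixed concurrency) load generation. As a rough sketch of the open-loop side, the pacer below hands out intended start times at a fixed interval regardless of how long earlier transactions took. `OpenLoopPacer` is a hypothetical class for illustration and is not taken from the PR.

```java
/** Hypothetical open-loop pacer: schedules transaction starts at a constant target rate. */
final class OpenLoopPacer {
    private final long startNanos;
    private final long intervalNanos;
    private long issued;

    OpenLoopPacer(long startNanos, int targetTps) {
        if (targetTps <= 0) {
            throw new IllegalArgumentException("targetTps must be positive");
        }
        this.startNanos = startNanos;
        this.intervalNanos = 1_000_000_000L / targetTps;
    }

    /** Intended start time of the next transaction, independent of earlier completions. */
    synchronized long nextIntendedStartNanos() {
        return startNanos + intervalNanos * issued++;
    }

    /** Latency measured from the intended start, so time spent queued behind slow calls is included. */
    static long latencyNanos(long intendedStartNanos, long completedAtNanos) {
        return completedAtNanos - intendedStartNanos;
    }

    public static void main(String[] args) {
        OpenLoopPacer pacer = new OpenLoopPacer(System.nanoTime(), 500);
        long first = pacer.nextIntendedStartNanos();
        long second = pacer.nextIntendedStartNanos();
        System.out.println("interval between starts (ns): " + (second - first));
    }
}
```

In a closed-loop run there would be no pacer at all: each of the N threads would start its next transaction as soon as the previous one returns, and TPS becomes an observed result rather than an input. Combining a small thread pool with a high target TPS can leave the pool unable to keep up with the schedule, which may be the situation the comment is warning about.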

// Transfer between two random accounts
long fromAccount = (ThreadLocalRandom.current().nextInt(BenchmarkConstants.ACCOUNT_COUNT) + 1);
long toAccount = (ThreadLocalRandom.current().nextInt(BenchmarkConstants.ACCOUNT_COUNT) + 1);
while (toAccount == fromAccount) {
Contributor

Wouldn't it be much simpler to just add or subtract a small random number when the values are equal?
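One loop-free way to pick a second account that differs from the first is to draw from the remaining ids and shift past the excluded one. This is a sketch of that idea rather than the exact change the reviewer has in mind; `AccountPicker` and `pickOther` are hypothetical names, and account ids are assumed to run from 1 to `accountCount` as in the quoted code.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: picks a transfer target different from the source with a single draw. */
final class AccountPicker {

    /** Returns an id in [1, accountCount] that is never equal to fromAccount. */
    static long pickOther(long fromAccount, int accountCount) {
        if (accountCount < 2) {
            throw new IllegalArgumentException("need at least two accounts");
        }
        // Draw from the accountCount - 1 ids that remain: [1, accountCount - 1]
        long candidate = ThreadLocalRandom.current().nextInt(accountCount - 1) + 1;
        // Shift ids at or above fromAccount up by one so fromAccount itself is skipped
        return candidate >= fromAccount ? candidate + 1 : candidate;
    }

    public static void main(String[] args) {
        long from = ThreadLocalRandom.current().nextInt(100) + 1;
        long to = pickOther(from, 100);
        System.out.println("transfer " + from + " -> " + to);
    }
}
```

Unlike adding or subtracting a small offset only when the two draws collide, this keeps the target uniformly distributed over all other accounts and needs no special handling at the ends of the id range.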


@Override
public final TransactionRecord execute() {
    GlobalTransaction tx = GlobalTransactionContext.getCurrentOrCreate();
Contributor

I’d recommend using the TransactionalTemplate#execute template method to handle transactions instead. This approach aligns much more closely with how users typically work with Seata in their daily development when using the annotation-based @GlobalTransactional pattern.

* - branches == 0: Mock mode (simplified Saga simulation without state machine)
* - branches > 0: Real mode (state machine engine with compensation support)
*/
public class SagaModeExecutor implements TransactionExecutor {
Contributor

Saga mode can only be load-tested on its own and must not be mixed with other transaction modes; we should implement safeguards to prevent users from doing so.

Labels: benchmark, type: feature

Linked issue (may be closed when this pull request is merged): Seata Benchmark 1.0 Development Task