Skip to content

fix: queue one-on-one recovery retries for foreground processing [WPB-24176]#4001

Open
Garzas wants to merge 1 commit intodevelopfrom
fix/queue-1on1-recovery-on-foreground
Open

fix: queue one-on-one recovery retries for foreground processing [WPB-24176]#4001
Garzas wants to merge 1 commit intodevelopfrom
fix/queue-1on1-recovery-on-foreground

Conversation

@Garzas
Copy link
Copy Markdown
Contributor

@Garzas Garzas commented Mar 27, 2026

https://wearezeta.atlassian.net/browse/WPB-24176


PR Submission Checklist for internal contributors

  • The PR Title

    • conforms to the style of semantic commits messages¹ supported in Wire's Github Workflow²
    • contains a reference JIRA issue number like SQPIT-764
    • answers the question: If merged, this PR will: ... ³
  • The PR Description

    • is free of optional paragraphs and you have filled the relevant parts to the best of your ability

What's new in this PR?

Issues

  • One-on-one MLS resolution during Slow Sync could leave some users unresolved when recoverable failures happened in the batch.
  • There was no dedicated safe recovery path to retry those skipped one-on-one resolutions later.

Causes (Optional)

  • Batch error handling in one-on-one resolution treated some recoverable failures as terminal for the current run.
  • This could leave specific one-on-one conversations unresolved until another unrelated trigger happened.

Solutions

  • Improved one-on-one batch failure handling for recoverable cases:
    • MissingKeyPackages
    • FederatedBackendFailure.RetryableFailure
    • MLSFailure.MessageRejected(StaleMessage)
  • Added a lightweight queue for pending one-on-one recoveries.
  • Added a foreground recovery use case that retries queued one-on-one resolutions in a fresh transaction.
  • Wired that recovery into foreground actions.

@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 27, 2026

Test Results

0 tests   - 4 708   0 ✅  - 4 591   0s ⏱️ - 2m 48s
0 suites  -   778   0 💤  -   117 
0 files    -   778   0 ❌ ±    0 

Results for commit a167870. ± Comparison against base commit 5c850d4.

♻️ This comment has been updated with latest results.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 7.93651% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.55%. Comparing base (c7f8415) to head (a167870).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...logic/feature/conversation/mls/OneOnOneResolver.kt 10.00% 21 Missing and 6 partials ⚠️
...on/mls/RecoverPendingOneOnOneResolutionsUseCase.kt 0.00% 22 Missing ⚠️
...sation/mls/PendingOneOnOneResolutionsRepository.kt 0.00% 7 Missing ⚠️
...in/com/wire/kalium/logic/feature/user/UserScope.kt 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4001      +/-   ##
===========================================
- Coverage    59.60%   59.55%   -0.05%     
===========================================
  Files         2025     2027       +2     
  Lines        65472    65538      +66     
  Branches      7140     7149       +9     
===========================================
+ Hits         39026    39033       +7     
- Misses       23254    23312      +58     
- Partials      3192     3193       +1     
Files with missing lines Coverage Δ
...wire/kalium/logic/sync/ForegroundActionsUseCase.kt 88.00% <100.00%> (+1.04%) ⬆️
...in/com/wire/kalium/logic/feature/user/UserScope.kt 0.00% <0.00%> (ø)
...sation/mls/PendingOneOnOneResolutionsRepository.kt 0.00% <0.00%> (ø)
...on/mls/RecoverPendingOneOnOneResolutionsUseCase.kt 0.00% <0.00%> (ø)
...logic/feature/conversation/mls/OneOnOneResolver.kt 59.82% <10.00%> (-14.34%) ⬇️

... and 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7f8415...a167870. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

🐰 Bencher Report

Branchfix/queue-1on1-recovery-on-foreground
Testbedubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

Click to view all benchmark results
BenchmarkLatencymicroseconds (µs)
com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInFiles📈 view plot
⚠️ NO THRESHOLD
905.97 µs
com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInMemory📈 view plot
⚠️ NO THRESHOLD
578,309.70 µs
com.wire.kalium.benchmarks.persistence.MessageReadBenchmark.inboxPagingDeepPageBenchmark📈 view plot
⚠️ NO THRESHOLD
132,028.67 µs
com.wire.kalium.benchmarks.persistence.MessageReadBenchmark.inboxPagingFirstPageBenchmark📈 view plot
⚠️ NO THRESHOLD
126,074.32 µs
com.wire.kalium.benchmarks.persistence.MessageReadBenchmark.localMarkAsReadBenchmark📈 view plot
⚠️ NO THRESHOLD
3,330.62 µs
com.wire.kalium.benchmarks.persistence.MessageReadBenchmark.messagePagingDeepPageBenchmark📈 view plot
⚠️ NO THRESHOLD
26,590.52 µs
com.wire.kalium.benchmarks.persistence.MessageReadBenchmark.messagePagingFirstPageBenchmark📈 view plot
⚠️ NO THRESHOLD
11,296.61 µs
com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.messageInsertionBenchmark📈 view plot
⚠️ NO THRESHOLD
1,353,539.39 µs
com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.queryMessagesBenchmark📈 view plot
⚠️ NO THRESHOLD
21,385.89 µs
🐰 View full continuous benchmarking report in Bencher

@Garzas Garzas requested a review from sbakhtiarov April 8, 2026 10:10
Comment on lines +31 to +44
internal class InMemoryPendingOneOnOneResolutionsRepository : PendingOneOnOneResolutionsRepository {
private val mutex = Mutex()
private val pendingUserIds = linkedSetOf<UserId>()

override suspend fun enqueue(userId: UserId) {
mutex.withLock {
pendingUserIds.add(userId)
}
}

override suspend fun dequeueAll(): Set<UserId> = mutex.withLock {
pendingUserIds.toSet().also { pendingUserIds.clear() }
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means the resolving will be lost to the void in case of process death is it needed to be in memory or we can persist the q in the DB for example?

kaliumLogger.e("Resolving one-on-one failed $failure, skipping")
Either.Right(Unit)
} else {
kaliumLogger.e("Resolving one-on-one failed $failure, retrying")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this retry on purpose?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants