Merkle tree recovery#767

Merged
mafintosh merged 37 commits into main from merkle-tree-recovery
Feb 12, 2026

@lejeunerenard
Contributor

First step is to allow missing roots when readying a hypercore. The roots were used to determine the length based on their span. If the roots are not available the header's tree length value is used.
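
As a rough illustration of the fallback described above, the sketch below derives a length from root indexes and falls back to a header value when no roots are available. It assumes the flat in-order tree layout (the leaf for block `i` sits at tree index `2 * i`). The names `depth`, `leafCount`, `treeLength` and `headerTreeLength` are illustrative and are not taken from the PR's code.

```javascript
// Depth of a node in a flat in-order tree: the number of trailing 1 bits
// in its index (leaves have even indexes and depth 0).
function depth (index) {
  let d = 0
  while (index % 2 === 1) {
    index = (index - 1) / 2
    d++
  }
  return d
}

// Number of leaves (blocks) covered by the subtree rooted at `index`.
function leafCount (index) {
  return 2 ** depth(index)
}

// Use the roots' combined span when roots exist, otherwise fall back to the
// length recorded in the header (assumption: "missing" means an empty list).
function treeLength (rootIndexes, headerTreeLength) {
  if (rootIndexes.length === 0) return headerTreeLength
  let length = 0
  for (const index of rootIndexes) length += leafCount(index)
  return length
}
```

For a core with 5 blocks the roots sit at tree indexes 3 and 8, covering 4 blocks and 1 block respectively.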

The general technique for recovering a merkle tree node is to pass a fully remote proof from a peer that has the node to the affected peer.

  1. A fully remote proof can be generated targeting the node (via its merkle index, which is related but not equivalent to its block index) by calling:
    const proof = await core.generateRemoteProofForTreeNode(treeNodeIndex)
  2. The proof can then be verified and applied on the peer missing the node like so:
    await core.recoverFromRemoteProof(proof)
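
The two steps above can be combined into one helper, sketched here with both cores passed in as arguments. Only `generateRemoteProofForTreeNode()` and `recoverFromRemoteProof()` come from this PR; `recoverTreeNode` and `blockToTreeIndex` are hypothetical names, and how the proof travels between the two peers is left out. The index helper assumes the flat in-order tree layout, where block `i` maps to leaf index `2 * i`.

```javascript
// In a flat in-order tree, the leaf for block `i` sits at tree index `2 * i`;
// odd indexes are parent nodes.
function blockToTreeIndex (blockIndex) {
  return 2 * blockIndex
}

// `healthyCore` still has the node, `damagedCore` is missing it.
// Transport of the proof between peers is not part of this sketch.
async function recoverTreeNode (healthyCore, damagedCore, treeNodeIndex) {
  const proof = await healthyCore.generateRemoteProofForTreeNode(treeNodeIndex)
  await damagedCore.recoverFromRemoteProof(proof)
  return proof
}
```

Recovering the leaf node of block 4 would then be `await recoverTreeNode(a, b, blockToTreeIndex(4))`.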

Since proofs include the nodes as part of the proof, the peer without the node can directly write the nodes after verifying them. A new upgrade argument was added to fully-remote-proof.js's proof() to support overriding the default upgrade for the sender's length. This allows the proof generated for recovery to target subtree roots.
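
To make the idea of targeting a subtree root more concrete, the sketch below computes the span of a tree node and the smallest tree length that fully contains it. It assumes the flat in-order tree layout; `spans` and `lengthCovering` are illustrative names, and how the PR maps the right span onto the `upgrade` argument is not shown in this conversation.

```javascript
// Depth of a node: the number of trailing 1 bits in its index.
function depth (index) {
  let d = 0
  while (index % 2 === 1) {
    index = (index - 1) / 2
    d++
  }
  return d
}

// Returns [leftSpan, rightSpan]: the tree indexes of the leftmost and
// rightmost leaves under the node at `index`.
function spans (index) {
  const reach = 2 ** depth(index) - 1 // distance from the node to its outermost leaf
  return [index - reach, index + reach]
}

// Smallest number of blocks a tree must hold for the node to be complete.
// Leaves sit at even indexes, so the right span's block index is rightSpan / 2.
function lengthCovering (index) {
  const [, rightSpan] = spans(index)
  return rightSpan / 2 + 1
}
```

Node 5, for example, spans leaves 4 and 6 (blocks 2 and 3), so a tree of length 4 is the smallest that contains it, even though it is not a root of that tree.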

Tests were added to demonstrate this flow for both root nodes and subtree root nodes. A new assertion was added for the fully-remote-proof.js tests as well for the added upgrade argument.

The length of the core is computed via the span of the merkle roots, but it is also loaded via the header's tree length.

Picks the rightSpan as the index to get an upgrade for, in order to target the index.

This allows the generated proof to target a merkle tree node even if it's a sub-root of the current roots.

If the requesting core receives a data message with the upgrade proof and the conflict check errors while checking the local proof, attempt to apply the upgrade nodes from the remote locally and recheck. This allows cores to repair automatically when a conflict is detected in cases where the tree nodes are missing.

This prevents tree nodes from being modified during other merkle tree modifications. It also ensures that the checks and modifications are atomic, so that at the time of repair the tree used to verify the proof is the tree that gets modified.

Helpful for logging the failure case when a proof is out of date. It is the inverse of the `repaired` event. Finally, it helps deflake the tests for merkle tree recovery, which can fail when waiting only on the `repairing` event, because the tree nodes expected to be applied (the proof passed by a peer was valid) are not yet applied when the test checks for them.

This makes it clearer that the check for the node happens after either success or failure. It is especially helpful for showing that the test fails if you remove the state mutex locks in `_repairTreeNodes()`.

This mode makes the test for the truncation race condition moot, as truncating now throws. Appends are also protected.
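
The atomic check-then-modify idea from the commit notes above can be sketched with a small promise-based mutex. This is a minimal illustration under stated assumptions, not the code in `_repairTreeNodes()`: `Mutex`, `Tree`, `repair` and the `verify` callback are all hypothetical names.

```javascript
// Minimal promise-based mutex: queued functions run one at a time, in order.
class Mutex {
  constructor () {
    this._tail = Promise.resolve()
  }

  // Runs `fn` once all previously queued work has settled.
  run (fn) {
    const result = this._tail.then(() => fn())
    this._tail = result.catch(() => {}) // keep the queue usable after a failure
    return result
  }
}

// Hypothetical tree: a Map of tree index -> node hash, guarded by one mutex.
class Tree {
  constructor () {
    this.nodes = new Map()
    this.lock = new Mutex()
  }

  // Verify and write inside one critical section, so the tree that was
  // verified against is the same tree that gets modified.
  repair (proofNodes, verify) {
    return this.lock.run(async () => {
      const valid = await verify(this.nodes, proofNodes)
      if (!valid) return false
      for (const { index, hash } of proofNodes) this.nodes.set(index, hash)
      return true
    })
  }
}
```

Any other mutation queued through the same lock while `repair()` is still verifying waits until the repair has either written its nodes or given up.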
@lejeunerenard
Contributor Author

In addition to manual fully remote proofs, merkle tree roots can be repaired automatically: open a core with missing root nodes and call core.recoverTreeNodeFromPeers() while replicating. This requests an upgrade from all peers, and the core attempts to repair itself with the proof response.

Repair Mode

A core opened with no roots but with a header tree length, and without overwrite enabled, will enable repair mode. While in repair mode, neither appending nor truncating is supported. Once the core is repaired, it will need to be closed and reopened to be used normally.
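
The conditions for entering repair mode, and the guards on appending and truncating, might look roughly like the sketch below. Everything here is illustrative: `shouldEnterRepairMode`, `CoreSketch` and the option names are hypothetical, and the sketch assumes "a header tree length" means a length greater than zero.

```javascript
// Repair mode applies when the roots are missing, the header still records
// a tree length, and the caller did not ask to overwrite the core.
function shouldEnterRepairMode ({ rootCount, headerTreeLength, overwrite }) {
  return rootCount === 0 && headerTreeLength > 0 && !overwrite
}

class CoreSketch {
  constructor (opts) {
    this.repairMode = shouldEnterRepairMode(opts)
    this.length = opts.headerTreeLength
  }

  append () {
    if (this.repairMode) throw new Error('Cannot append while repair mode is on')
    this.length++
  }

  truncate (newLength) {
    if (this.repairMode) throw new Error('Cannot truncate while repair mode is on')
    this.length = newLength
  }
}
```

Note that the flag is only computed at construction time in this sketch, which mirrors the requirement to close and reopen the core after a repair.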

The repair lifecycle can be tracked via the following added events:

  • repairing
    A proof was received and is being verified before applying.
  • repair-failed
    Repairing by applying a proof failed. This can have one of two causes: the proof was invalid, or the core was still not valid after applying the proof's tree nodes.
  • repaired
    The core was successfully repaired via a remote proof.
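
The three events above can be followed with a small tracker, sketched here against any event emitter. Only the event names come from this PR; `trackRepair` is a hypothetical helper, and the sketch makes no assumption about arguments passed to the listeners. Because several peers may respond, it records `repair-failed` without giving up and resolves only on `repaired`.

```javascript
// Records the repair lifecycle and resolves with the log once `repaired` fires.
function trackRepair (core, log = []) {
  core.on('repairing', () => log.push('repairing'))
  core.on('repair-failed', () => log.push('repair-failed'))

  return new Promise((resolve) => {
    core.once('repaired', () => {
      log.push('repaired')
      resolve(log)
    })
  })
}
```

A caller would attach the tracker before calling `core.recoverTreeNodeFromPeers()`, so that no early event is missed.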

Test cases were added that showcase repairing via a remote peer and that show appending and truncating being prevented while in repair mode.

Also do not send a `sync` message, as a repairing core is not a valid source for an upgrade.

Set pushOnly mode as soon as repair mode is enabled.
Guards prevent requests from peers sending messages that could change
the merkle tree mid-update.

The core will also need to be reopened to disable repair mode.
if (tx === undefined) throw INVALID_OPERATION('No database batch was passed')

if (this.session.core._repairMode) {
  throw Error('Cannot commit while repair mode is on')
}
Contributor

be nice to use a typed error from hypercore-errors

Contributor Author

Used ASSERTION and changed the assert() in lib/session-state.js to use ASSERTION for the same error. This ensures both are thrown as uncaught errors by safety-catch.
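
To show why a typed error helps here, the sketch below uses a local stand-in class rather than the real hypercore-errors module, whose exact API is not shown in this conversation. `TypedError`, its `code` property and the `commit` function are all hypothetical.

```javascript
// Local stand-in for a typed error. The real project uses hypercore-errors;
// this class only illustrates the pattern and is not that module's API.
class TypedError extends Error {
  constructor (message, code) {
    super(`${code}: ${message}`)
    this.code = code
  }

  static ASSERTION (message) {
    return new TypedError(message, 'ASSERTION')
  }
}

// Hypothetical commit guard mirroring the reviewed snippet.
function commit (core) {
  if (core.repairMode) {
    throw TypedError.ASSERTION('Cannot commit while repair mode is on')
  }
  return true
}
```

With a stable `code` on the error, calling code can tell a broken invariant apart from an ordinary operational failure without matching on message text.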

@mafintosh mafintosh merged commit 1e2a027 into main Feb 12, 2026
5 checks passed
@mafintosh mafintosh deleted the merkle-tree-recovery branch February 12, 2026 22:26

2 participants