- What is a Node in Merk?
- Node Structure and Components
- What Can Be Proved?
- What Cannot Be Proved?
- Proof Node Types
- How Proofs Work
- Examples
In Merk (a Merkle AVL tree), every node plays three roles at once:
- A data container: Stores a key-value pair
- A tree structure element: Has left and right children (or none)
- An authenticated element: Contains cryptographic hashes
Unlike traditional Merkle trees where only leaves store data, in Merk every node stores data, making it more space-efficient.
pub struct TreeNode {
    pub inner: Box<TreeNodeInner>,
    pub old_value: Option<Vec<u8>>,       // Previous value for cost tracking
    pub known_storage_cost: Option<u32>,  // Cached storage cost
}
pub struct TreeNodeInner {
    pub left: Option<Link>,   // Left child
    pub right: Option<Link>,  // Right child
    pub kv: KV,               // Key-value data
}
pub struct KV {
    pub key: Vec<u8>,                   // The node's key
    pub value: Vec<u8>,                 // The node's value
    pub hash: Hash,                     // Hash of entire node
    pub value_hash: ValueHash,          // Hash of just the value
    pub feature_type: TreeFeatureType,  // Node type (basic, sum, etc.)
}
Each node computes its hash as:
value_hash = Hash(value)
kv_hash = Hash(varint(key.len()) || key || value_hash)
node_hash = Hash(kv_hash || left_child_hash || right_child_hash)
This creates a chain of authentication from leaves to root.
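The three formulas above can be sketched in Rust as follows. This is a minimal illustration, not Merk's implementation: `toy_hash` is a hypothetical, non-cryptographic stand-in built on the standard library's `DefaultHasher` so the example has no dependencies, the all-zero `NULL_HASH` for a missing child is an assumption, and `varint` is assumed to be an unsigned LEB128 encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Hash = [u8; 32];

/// Assumption for this sketch: a missing child contributes an all-zero hash.
const NULL_HASH: Hash = [0u8; 32];

/// Stand-in hash, NOT cryptographic. A real tree would use a cryptographic hash.
fn toy_hash(data: &[u8]) -> Hash {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u8(i as u8); // separate the four 8-byte lanes
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}

/// Unsigned LEB128 varint (assumed encoding for the key length prefix).
fn varint(mut n: usize) -> Vec<u8> {
    let mut bytes = Vec::new();
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            bytes.push(byte);
            return bytes;
        }
        bytes.push(byte | 0x80);
    }
}

/// value_hash = Hash(value)
fn value_hash(value: &[u8]) -> Hash {
    toy_hash(value)
}

/// kv_hash = Hash(varint(key.len()) || key || value_hash)
fn kv_hash(key: &[u8], value_hash: &Hash) -> Hash {
    let mut buf = varint(key.len());
    buf.extend_from_slice(key);
    buf.extend_from_slice(value_hash);
    toy_hash(&buf)
}

/// node_hash = Hash(kv_hash || left_child_hash || right_child_hash)
fn node_hash(kv_hash: &Hash, left: &Hash, right: &Hash) -> Hash {
    let mut buf = Vec::with_capacity(96);
    buf.extend_from_slice(kv_hash);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    toy_hash(&buf)
}
```

Because a parent's hash covers its children's hashes, changing any value in a subtree changes every hash on the path up to the root, which is what lets a single root hash authenticate the whole tree.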
Merk can generate cryptographic proofs for:
Prove that a key-value pair exists in the tree:
- "Key X has value Y"
- "Key X exists" (without revealing the value)
- "Keys in range [A, B] exist with these values"
Prove that a key does NOT exist in the tree:
- "Key X is not in the tree"
- "No keys exist in range [A, B]"
Prove all keys within a range:
- "All keys between 'alice' and 'bob' are: [alice, amanda, bob]"
- "The first 10 keys starting from 'X' are: [...]"
Prove aggregate values in specialized trees:
- Sum Trees: "The sum of all values is 1000"
- Count Trees: "There are exactly 50 elements"
- Combined: "50 elements with total sum 1000"
Prove no keys exist in a range by showing the neighboring keys:
- "No keys exist between 'cat' and 'dog' (here are the adjacent keys)"
Prove the tree has a specific root hash:
- "The tree with root hash H contains key K with value V"
Merk has limitations on what can be efficiently proved:
- Cannot prove what a value WAS (only current state)
- Cannot prove when a value changed
- Cannot prove the history of modifications
- Cannot efficiently prove "all keys with value > 100" unless indexed
- Cannot prove "all keys matching pattern X" without traversing
- Cannot prove aggregations on non-indexed attributes
- Cannot prove "no key has value X" without full tree traversal
- Cannot prove "all values are unique"
- Cannot prove when a key was inserted
- Cannot prove who inserted a key
- Cannot prove access patterns
- Cannot prove "key X has the 5th largest value"
- Cannot prove "key X appears before key Y" without range context
When generating proofs, Merk uses different node representations to optimize proof size:
Node::Hash(hash: [u8; 32])
- Purpose: Proves a subtree exists without revealing contents
- Size: 32 bytes
- Use Case: When you need to verify tree structure but not the data
- Example: Proving a path to a specific key without revealing sibling data
Node::KVHash(kv_hash: [u8; 32])
- Purpose: Proves the hash of a key-value pair without revealing the actual data
- Size: 32 bytes
- Use Case: When you need to verify data exists but keep it private
- Example: Proving a node exists in a path without exposing its contents
Node::KV(key: Vec<u8>, value: Vec<u8>)
- Purpose: Full disclosure of both key and value
- Size: Variable (key length + value length)
- Use Case: When the verifier needs to see the actual data
- Example: Proving "alice" has balance "100"
Node::KVValueHash(key: Vec<u8>, value: Vec<u8>, value_hash: [u8; 32])
- Purpose: Reveals key and value plus a separate hash of just the value
- Size: Variable + 32 bytes
- Use Case: When you need to prove the value and enable value-specific operations
- Example: Proving data in a sum tree where value hash is used for aggregation
Node::KVDigest(key: Vec<u8>, value_hash: [u8; 32])
- Purpose: Reveals the key but only provides the hash of the value
- Size: Key length + 32 bytes
- Use Case: Proving a key exists without revealing its value
- Example: Proving "alice" exists without showing her balance
Node::KVRefValueHash(key: Vec<u8>, value_hash: [u8; 32], referenced_value: Vec<u8>)
- Purpose: For reference elements: shows the key, the value hash, and the referenced data
- Size: Key length + 32 bytes + referenced value length
- Use Case: Proving references in GroveDB where value points to other data
- Example: Proving an index entry that references actual data elsewhere
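The size and disclosure trade-offs listed above can be summarized in a small Rust sketch. The enum mirrors the variants and field order as described in this section, which may differ from the actual crate definition; `payload_size` and `reveals_value` are hypothetical helpers added for illustration, and the sizes ignore any opcode or length-prefix overhead in the encoded proof.

```rust
/// Proof node variants as described in the text (field order follows the text).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Hash([u8; 32]),
    KVHash([u8; 32]),
    KV(Vec<u8>, Vec<u8>),
    KVValueHash(Vec<u8>, Vec<u8>, [u8; 32]),
    KVDigest(Vec<u8>, [u8; 32]),
    KVRefValueHash(Vec<u8>, [u8; 32], Vec<u8>),
}

impl Node {
    /// Payload bytes carried by the node, excluding encoding overhead.
    fn payload_size(&self) -> usize {
        match self {
            Node::Hash(_) | Node::KVHash(_) => 32,
            Node::KV(k, v) => k.len() + v.len(),
            Node::KVValueHash(k, v, _) => k.len() + v.len() + 32,
            Node::KVDigest(k, _) => k.len() + 32,
            Node::KVRefValueHash(k, _, r) => k.len() + 32 + r.len(),
        }
    }

    /// Whether the verifier learns value bytes from this node.
    fn reveals_value(&self) -> bool {
        matches!(
            self,
            Node::KV(..) | Node::KVValueHash(..) | Node::KVRefValueHash(..)
        )
    }
}
```

For instance, proving that "alice" exists with `KVDigest` costs the key length plus 32 bytes and discloses no value bytes, whereas `KV` discloses the full value.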
In a count tree, it's important to understand the difference between what's stored in the TreeFeatureType and what's computed:
TreeFeatureType Storage:
- CountedMerkNode(1): Regular items store just their own contribution of 1
- CountedMerkNode(n): CountTree elements store their specific count value
Aggregate Computation:
The total count is computed dynamically and stored in the Link's aggregate_data field:
// In TreeFeatureType - stores only own contribution
CountedMerkNode(1) // Just this node's count
// In Link - stores computed aggregate
aggregate_data: AggregateData::Count(3) // This node + all descendants
Example count tree structure:
root
TreeFeatureType: CountedMerkNode(1)
Link.aggregate_data: Count(7)
/ \
alice charlie
CountedMerkNode(1) CountedMerkNode(1)
aggregate_data: Count(3) aggregate_data: Count(3)
/ \ \
bob carol dave
CountedMerkNode(1) CountedMerkNode(1)
aggregate_data: Count(1) aggregate_data: Count(1)
The aggregation works as follows:
- Each node's TreeFeatureType stores only its own count (usually 1)
- The aggregate_data in the Link stores: own count + left subtree aggregate + right subtree aggregate
- This aggregate data is persisted to disk but is NOT part of the authenticated state
- This allows O(1) retrieval of the total count at any node level without recomputation
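The aggregation rule (own count + left aggregate + right aggregate) can be illustrated with a short Rust sketch. `CountNode`, `leaf`, `parent`, and `aggregate_count` are hypothetical names for this example only. Unlike the cached aggregate_data described above, the sketch recomputes the total recursively, purely to show the formula.

```rust
/// Simplified node: `own_count` plays the role of CountedMerkNode(n).
struct CountNode {
    own_count: u64,
    left: Option<Box<CountNode>>,
    right: Option<Box<CountNode>>,
}

/// A childless node contributing a count of 1.
fn leaf() -> CountNode {
    CountNode { own_count: 1, left: None, right: None }
}

/// An inner node contributing a count of 1, with optional children.
fn parent(left: Option<CountNode>, right: Option<CountNode>) -> CountNode {
    CountNode {
        own_count: 1,
        left: left.map(Box::new),
        right: right.map(Box::new),
    }
}

/// own count + left subtree aggregate + right subtree aggregate
fn aggregate_count(node: &CountNode) -> u64 {
    let left = node.left.as_deref().map_or(0, aggregate_count);
    let right = node.right.as_deref().map_or(0, aggregate_count);
    node.own_count + left + right
}
```

A node with two leaf children aggregates to 3, and a node with a single leaf child aggregates to 2. Caching these totals in each link is what turns the recursive walk into a constant-time lookup.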
Important: Aggregate Data is NOT in the State
While aggregate data is persisted to disk for performance, it is NOT part of the cryptographic state:
- The node hash is computed from: Hash(kv_hash, left_child_hash, right_child_hash)
- Aggregate data is NOT included in the hash computation
- Therefore, aggregate data cannot be proven with a GroveDB proof
- It's a derived value that can be recomputed from the tree structure
Storage Layout: When a Link is persisted, it includes:
- Key (with length prefix)
- Hash (32 bytes) - computed WITHOUT aggregate data
- Child heights (2 bytes)
- Aggregate data type (1 byte) + value(s) - cached but not authenticated
This design separates:
- Authenticated State: The actual tree structure and values (provable)
- Cached Derivatives: Aggregate counts/sums for performance (not provable)
The precomputed storage strategy trades a small amount of extra storage per link for O(1) aggregate lookups, while keeping the authenticated state minimal.
Proof generation:
- Path Selection: Identify the path from root to target key(s)
- Node Selection: Choose the minimal set of nodes needed for verification
- Node Type Selection: Pick the most efficient node representation
- Encoding: Serialize nodes with operation instructions
Proof verification:
- Decode: Parse the proof into nodes and operations
- Execute: Run the stack-based virtual machine:
  - Push: Add node to stack
  - Parent: Combine top two nodes as parent-child
  - Child: Make top node a child of the next
- Hash Verification: Recompute hashes and verify they match
- Root Validation: Ensure the final hash matches the expected root
The proof uses a stack machine with operations:
- Push/PushInverted: Add nodes to the verification stack
- Parent/ParentInverted: Build parent-child relationships
- Child/ChildInverted: Attach children to parents
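A minimal Rust sketch of such a stack machine is shown below, covering only Push, Parent, and Child with three node types. Several details are assumptions rather than confirmed Merk behavior: Parent attaches the lower stack item as the left child, Child attaches the top item as the right child, missing children hash to zeros, the key length prefix is a single byte, and `toy_hash` is a non-cryptographic stand-in. All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Hash = [u8; 32];
const NULL_HASH: Hash = [0u8; 32]; // assumed placeholder for a missing child

// Stand-in hash, NOT cryptographic; keeps the sketch dependency-free.
fn toy_hash(data: &[u8]) -> Hash {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u8(i as u8);
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}

fn kv_hash(key: &[u8], value: &[u8]) -> Hash {
    let mut buf = vec![key.len() as u8]; // one-byte prefix: keys under 128 bytes only
    buf.extend_from_slice(key);
    buf.extend_from_slice(&toy_hash(value));
    toy_hash(&buf)
}

fn node_hash(kv: &Hash, left: &Hash, right: &Hash) -> Hash {
    toy_hash(&[&kv[..], &left[..], &right[..]].concat())
}

enum Node {
    Hash(Hash),
    KVHash(Hash),
    KV(Vec<u8>, Vec<u8>),
}

enum Op {
    Push(Node),
    Parent,
    Child,
}

// A node on the stack together with the child hashes attached so far.
struct Partial {
    node: Node,
    left: Hash,
    right: Hash,
}

impl Partial {
    fn hash(&self) -> Hash {
        match &self.node {
            Node::Hash(h) => *h,
            Node::KVHash(kv) => node_hash(kv, &self.left, &self.right),
            Node::KV(k, v) => node_hash(&kv_hash(k, v), &self.left, &self.right),
        }
    }
}

fn attach(parent: &mut Partial, child: &Partial, left: bool) -> Result<(), &'static str> {
    if matches!(parent.node, Node::Hash(_)) {
        return Err("cannot attach a child to an opaque Hash node");
    }
    if left {
        parent.left = child.hash();
    } else {
        parent.right = child.hash();
    }
    Ok(())
}

/// Runs the operations and returns the reconstructed root hash.
fn execute(ops: Vec<Op>) -> Result<Hash, &'static str> {
    let mut stack: Vec<Partial> = Vec::new();
    for op in ops {
        match op {
            Op::Push(node) => stack.push(Partial { node, left: NULL_HASH, right: NULL_HASH }),
            Op::Parent => {
                let mut parent = stack.pop().ok_or("stack underflow")?;
                let child = stack.pop().ok_or("stack underflow")?;
                attach(&mut parent, &child, true)?;
                stack.push(parent);
            }
            Op::Child => {
                let child = stack.pop().ok_or("stack underflow")?;
                let mut parent = stack.pop().ok_or("stack underflow")?;
                attach(&mut parent, &child, false)?;
                stack.push(parent);
            }
        }
    }
    match stack.as_slice() {
        [root] => Ok(root.hash()),
        _ => Err("proof must leave exactly one node on the stack"),
    }
}
```

The verifier accepts the proof only if the hash returned by `execute` equals the root hash it already trusts. Under this sketch's conventions, a node with a left child is rebuilt as Push(child), Push(node), Parent, and a right sibling is then added with Push(sibling), Child.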
Proving key "alice" has value "100" in this tree:
root
/ \
alice charlie
/
bob
Proof contains:
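- KV("alice", "100"): The target node
- Hash(charlie_subtree_hash): Sibling subtree as hash only
- Operations to reconstruct: Push(alice), Push(charlie_hash), Parent

This listing is simplified. Because each node hash covers the node's own kv_hash and both child hashes, a complete proof also carries the root's key-value hash and the hash of alice's child bob.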
Proving all keys from "a" to "d" (the upper bound must sort after "charlie" for that key to fall inside the range):
Proof nodes:
1. KV("alice", "100")
2. KV("bob", "200")
3. KV("charlie", "300")
4. Hash(left_boundary) // Proves nothing before "alice"
5. Hash(right_boundary) // Proves nothing after "charlie"
Proving "barbara" doesn't exist (between "alice" and "charlie"):
Proof contains:
1. KV("alice", "100") // Left neighbor
2. KV("charlie", "300") // Right neighbor
3. Proof that alice's right child leads to charlie
4. No "barbara" in between
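The ordering half of this check can be sketched in Rust. `neighbors_exclude` is a hypothetical helper that assumes keys are ordered bytewise (lexicographically). It confirms only that the target falls strictly between the two revealed keys. That the two keys are actually adjacent in the tree has to come from the proof structure, verified against the root hash.

```rust
/// Returns true when `target` sorts strictly between the two proven neighbor keys.
/// Byte slices compare lexicographically in Rust, matching the assumed key order.
fn neighbors_exclude(left: &[u8], target: &[u8], right: &[u8]) -> bool {
    left < target && target < right
}
```

With the keys from this example, "barbara" sorts after "alice" and before "charlie", so the check passes. A key equal to either neighbor, or outside the pair, fails it.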
Proving sum of all values is 600:
Proof contains:
1. KVValueHash("alice", "100", hash_of_100)
2. KVValueHash("bob", "200", hash_of_200)
3. KVValueHash("charlie", "300", hash_of_300)
4. Sum aggregation data showing 100 + 200 + 300 = 600, which the verifier checks against the individual values
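Checking a claimed sum against the individually proven values can be sketched as below. This assumes, for illustration only, that values are decimal strings as written in the example; a real sum tree may encode them differently. `recompute_sum` and `sum_matches` are hypothetical names.

```rust
/// Adds up the proven values, returning None on a non-numeric value or overflow.
fn recompute_sum(values: &[&str]) -> Option<i64> {
    values
        .iter()
        .try_fold(0i64, |acc, v| acc.checked_add(v.parse::<i64>().ok()?))
}

/// True only when every value parses and the total equals the claimed sum.
fn sum_matches(values: &[&str], claimed: i64) -> bool {
    recompute_sum(values) == Some(claimed)
}
```

For the three values above, `recompute_sum(&["100", "200", "300"])` yields `Some(600)`, so a claimed sum of 600 is accepted and any other claim is rejected.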
- Use most compact node type that satisfies requirements
- Batch related proofs to share common nodes
- Consider proof size vs. verification cost trade-offs
- Always verify against known root hash
- Validate all hash computations
- Check for malformed proofs (wrong operation sequences)
- Verify aggregate values match individual components
- Use Hash nodes to hide irrelevant data
- Use KVDigest to prove existence without revealing values
- Structure trees to minimize data exposure in proofs
Merk's node structure and proof system provide a flexible, efficient way to prove statements about tree contents. By understanding the different node types and their purposes, developers can generate optimal proofs that balance size, privacy, and verification requirements. The inability to prove certain properties (like historical state) is a fundamental limitation of Merkle trees, but GroveDB's hierarchical structure helps overcome many limitations through careful tree organization and indexing.