A Verkle tree is a commitment scheme that works similar to a Merkle tree, but has much smaller witnesses. It works by replacing the hashes in a Merkle tree with a vector commit, making broader branching factors more efficient.
Thanks to Kevaundray Wedderburn for his comments on the post.
General description
For details on how verkle trees work, see:
The objective of this post is to explain the specific disposition of the EIP tree verkle project. It is intended for client developers who want to implement verkle trees and are looking for an introduction before delving into the EIP.
Verkle trees introduce a number of changes to the structure of the tree. The most significant changes are:
- a change from 20-byte keys to 32-byte keys (not to be confused with 32-byte addresses, which is a separate change);
- account merger and storage attempts; and finally
- The introduction of the verkle trie itself, which uses vector commits instead of hashes.
As a vector commitment scheme for the verkle tree, we use Pedersen Commitments. Pedersen’s commitments are based on elliptic curves. For an introduction to Pedersen commitments and how to use them as polynomial or vector commitments using inner product arguments, see here.
The curve we are using is bandersnatch. This curve was chosen because it is efficient and also because it will allow efficient SNARKs in BLS12_381 to reason about the verkle tree in the future. This can be useful for rollups, as well as allowing an update where all tokens can be compressed into a SNARK once practical, without the need for an additional commit update.
The curve order/size of the bandersnatch scalar field is p = 13108968793781547619861935127046491459309155893440570251786403306729687672801, which is a 253-bit prime number. As a result of this, we can only safely commit to bit strings that are 252 bits maximum; otherwise, the field overflows. We choose a branching factor (width) of 256 for the verkle tree, which means that each commit can be committed to up to 256 values of 252 bits each (or to be precise, integers up to page-1). We write this as Confirm(v₀, v₁, …, v₂₅₅) commit to list v of length 256.
verkle tree design
One of the design goals with the EIP verkle tree is to make accesses to neighboring locations (for example, storage with nearly the same address or neighboring code snippets) cheap to access. For this, a key consists of a stem of 31 bytes and a suffix one byte for a total of 32 bytes. The key scheme is designed so that “close” storage locations map to the same root and a different suffix. For more information, see the draft EIP.
The verkle tree itself is made up of two types of nodes:
- extension nodesrepresenting 256 values with the same root but different suffixes
- internal nodesThey have up to 256 children, which can be other internal nodes or extension nodes.
A commitment to an extension node is a commitment to a 4-element vector; the remaining positions will be 0. It is:
C₁ and C₂ are two additional commits that commit to all values with stem equal to stem. The reason we need commits is that the values are 32 bytes long, but we can only store 252 bits per field element. So a single commit would not be enough to store 256 values. So instead, C₁ stores the values for the suffix 0 to 127, and C₂ stores 128 to 255, where the values are halved to fit the size of the field (more on that later).
The extension together with the commitments C₁ and C₂ are called an “extension and suffix tree” (EaS for short).
Figure 1 Representation of a walk through a verkle tree for the key 0xfe0002abcd..ff04: the path goes through 3 internal nodes with 256 children each (254, 0, 2), an extension node representing abcd..ff and the two suffix tree commits, including the value of 04, v₄. Note that stem they are actually the first 31 bytes of the key, including the path through the internal nodes.
Commitment to values leaf nodes
Each node of the extensions and suffixes tree contains 256 values. Because a value is 256 bits wide, and we can only safely store 252 bits in a field element, four bits would be lost if we simply tried to store a value in a field element.
To get around this problem, we chose to split the pool of 256 values into two groups of 128 values each. Each 32-byte value in a group is split into two 16-byte values. Then, a Vᵢ∈ 𝔹₃₂ value becomes v⁽ˡᵒʷᵉʳ⁾ᵢ ∈ 𝔹₁₆ and v⁽ᵘᵖᵖᵉʳ⁾ᵢ∈ 𝔹₁₆ such that v⁽ˡᵒʷᵉʳ⁾ᵢ ++ v⁽ᵘᵖᵖᵉʳ⁾ᵢ = vᵢ.
A “sheet marker” is added to the v⁽ˡᵒʷᵉʳ⁾ᵢ, to differentiate between a sheet that has never been accessed and a sheet that has been overwritten with 0s. No value is ever removed from a verkle tree. This is required for upcoming state expiration schemes. That marker is established in Bit 129, that is, v⁽ˡᵒʷᵉʳ ᵐᵒᵈⁱᶠⁱᵉᵈ⁾ᵢ = v⁽ˡᵒʷᵉʳ⁾ᵢ + 2¹²⁸ if vᵢ before, and v⁽ˡᵒʷᵉʳ ᵐᵒᵈⁱᶠⁱᵉᵈ⁾ᵢ = 0 if vᵢ has never been accessed .
The two commitments C₁ and C₂ are then defined as
Compromise of extension nodes
The commit to an extension node is made up of an “extension marker”, which is just the number 1, the two subtree commits C₁ and C₂, and the stem of the key leading to this extension node.
Unlike the extension nodes in the Merkle-Patricia tree, which only contain the section of the key that joins the internal parent node to the internal child node, the root covers the entire key up to that point. This is because verkle trees are designed with stateless testing in mind: if a new key is inserted that “splits” the extension in two, the older sibling does not need to be updated, allowing for a smaller test.
Compromise of internal nodes
Internal nodes have the simplest calculation method for their commits: the node is seen as a vector of 256 values, which are the (field representation of the) root commit of each of its 256 subtrees. The commit for an empty subtree is 0. If the subtree is not empty, the commit for the internal node is
where Cᵢ are the children of the inner node and 0 if a child is empty.
Insertion in the tree
Figure 2 is an illustration of the process of inserting a new value into the tree, which becomes interesting when the stems collide on several leading bytes.
Figure 2 The value v₁₉₂ is inserted at location 0000010000…0000 in a verkle tree containing only the value v₁₂₇ at location 0000000000…0000. Because the stems differ by the third byte, two internal nodes are added up to the different byte. Another “extension and suffix” tree is then inserted, with a full 31-byte trunk. The initial node is not touched and C²₀ has the same value as C⁰₀ before insertion.
Shallower Trees, Smaller Tests
The verkle tree structure makes trees shallower, which reduces the amount of data stored. However, its real power comes from the ability to produce smaller pieces of evidence, i.e. witnesses. This will be explained in the next article.