One of the problems with Ethereum, or any blockchain, is that it grows over time. That growth increases both the complexity of its code and its storage requirements.
A blockchain must preserve all data throughout its history, which must be stored by all clients and downloaded by new clients. This leads to a constant increase in client load and synchronization time.
Additionally, code complexity increases over time because it is “easier to add a new feature than to remove an old one,” Vitalik Buterin wrote in his blog (eth.limo/general/2024/10/26/futures5.html).
Buterin therefore believes that developers must actively work to stop these growing trends while preserving the permanence of Ethereum. To that end, he has presented The Purge, a plan with three parts that aim to simplify the blockchain and reduce its data load.
Part 1: History Expiration
A fully synchronized Ethereum node currently requires around 1.1 TB of storage space for the execution client, plus a few hundred gigabytes more for the consensus client. According to Buterin, most of this data is historical, such as data on historical blocks, transactions, and receipts, many of which are several years old. To store all this history, the disk space required continues to increase by hundreds of gigabytes each year.
Buterin believes the problem can be solved by something called History Expiration.
Each block in a blockchain points to the previous one through a hash link. This means that the consensus on the current block indicates a consensus on history.
According to Buterin, as long as the network has consensus on the current block, any related historical data can be provided by a single actor through a Merkle proof, which allows anyone to verify its integrity. This means that instead of each node storing all the data, each node could store a small percentage of the data, reducing storage requirements.
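The idea that a single actor can supply old data and anyone can check it can be illustrated with a minimal sketch of a Merkle proof. It assumes a plain binary Merkle tree over SHA-256 and a non-empty list of leaves; Ethereum's actual data structures and hash function differ, and the function names here are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).digest()


def next_level(level):
    """Pair up nodes and hash each pair; an odd node is paired with itself."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root commitment over a non-empty list of byte strings."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes needed to rebuild the root from leaves[index]."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # the neighbour in the same pair
        proof.append((level[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Check a leaf against a trusted root using only the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

In this sketch a node that only knows `root` can accept an old item from any untrusted peer, as long as `verify` succeeds. The proof holds one hash per tree level, so it stays small even when the history is large.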
Buterin basically suggests adopting the operating model of torrent networks, where each participant stores and distributes only a small part of the data stored and distributed by the network.
Ethereum has already taken steps to reduce storage requirements: some information now has an expiration date. For example, consensus blocks are stored for six months and blobs for 18 days.
EIP-4444 (ethereum.org/EIPS/eip-4444) is another step in that direction: it aims to limit the storage period of historical blocks and receipts to one year. However, the long-term goal is to have a fixed period, such as 18 days, during which each node has to store everything, after which the oldest data is stored in a distributed way on a peer-to-peer network.
Part 2: State Expiration
According to Buterin, eliminating the need for clients to store all history does not completely solve the problem of excessive storage requirements. This is because a client has to increase its storage capacity by around 50 GB each year due to “continuous state growth: account balances and nonces, contract code, and contract storage.”
A new state object can be created in three ways: creating a new account, sending ETH to a new account, and setting a previously untouched storage slot. Once a state object is created, it remains in the state forever.
Buterin believes that the solution to automatically expire state objects over time should be efficient, easy to use, and developer-friendly. This means that the solution should not require large amounts of computation, that users should not lose access to their tokens if they leave them untouched for years, and that developers will not suffer major inconvenience in the process.
Buterin suggests two types of “known least bad solutions”:
- Partial state expiration proposals
- State expiration proposals based on address periods
Partial expiration of the state
Proposals for partial state expiration work on the principle of dividing the state into “fragments.” Everyone would have to store, forever, the “top level map” of which fragments are empty or non-empty. Data within a fragment is stored only if it has been accessed recently. A “resurrection” mechanism allows anyone to restore the data in a fragment that is no longer stored by providing proof of what the data was.
State expiration based on address periods
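The fragment-and-resurrection idea can be sketched roughly as follows. This is an illustration under loud assumptions, not any concrete proposal: the class and method names are hypothetical, the permanent “top level map” holds a SHA-256 commitment per fragment, and the “proof” used for resurrection is simply the full fragment contents checked against that commitment (a real design would use a compact proof).

```python
import hashlib
import json


def commitment(data: dict) -> str:
    """Stand-in commitment: hash of the fragment's canonical JSON form."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


class PartialExpiryState:
    """Hypothetical model: top-level map kept forever, fragment data expires."""

    def __init__(self, expiry_window: int):
        self.expiry_window = expiry_window
        self.top_level = {}    # fragment id -> commitment, never dropped
        self.fragments = {}    # fragment id -> data, only if recently accessed
        self.last_access = {}  # fragment id -> time of last read or write

    def _require_stored(self, frag_id):
        if frag_id in self.top_level and frag_id not in self.fragments:
            raise KeyError("fragment expired; resurrect it first")

    def write(self, frag_id, key, value, now):
        self._require_stored(frag_id)
        data = self.fragments.setdefault(frag_id, {})
        data[key] = value
        self.top_level[frag_id] = commitment(data)
        self.last_access[frag_id] = now

    def read(self, frag_id, key, now):
        if frag_id not in self.top_level:
            return None  # the top-level map shows the fragment is empty
        self._require_stored(frag_id)
        self.last_access[frag_id] = now
        return self.fragments[frag_id].get(key)

    def expire(self, now):
        """Drop fragment data that has not been touched within the window."""
        for frag_id in list(self.fragments):
            if now - self.last_access[frag_id] > self.expiry_window:
                del self.fragments[frag_id]

    def resurrect(self, frag_id, data, now):
        """Restore a dropped fragment if the supplied data matches the map."""
        if self.top_level.get(frag_id) != commitment(data):
            return False
        self.fragments[frag_id] = dict(data)
        self.last_access[frag_id] = now
        return True
```

The point of the sketch is the asymmetry: the small `top_level` map never shrinks, while the bulky `fragments` data can be dropped and later restored by anyone who still holds it.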
Address period-based state expiration proposes having a growing list of state trees instead of just one that stores the entire state. Any state that is read or written is updated in the most recent state tree. A new empty state tree is added once per period, which could be a year.
In this scenario, the oldest state trees are frozen and full nodes need to store only the two most recent trees. If a state object becomes part of an expired tree, it can still be read or written, but the transaction would have to include a Merkle proof to do so. After the transaction, the object is added back to the most recent tree.
Part 3: Feature Cleanup
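A rough sketch of this scheme, under stated assumptions: each tree is modelled as a plain dictionary, a frozen tree is reduced to a SHA-256 commitment, and the proof is the full frozen tree checked against that commitment rather than a real Merkle branch. All names are hypothetical, and the sketch ignores the question of proving that a key does not appear in a later frozen tree.

```python
import hashlib
import json


def root_of(tree: dict) -> str:
    """Stand-in for a state root: hash of the tree's canonical JSON form."""
    return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()


class PeriodState:
    """Hypothetical model of a growing list of per-period state trees."""

    KEEP = 2  # full nodes store only the two most recent trees

    def __init__(self):
        self.live = [{}]        # most recent trees, newest last
        self.frozen_roots = []  # commitments to expired trees, oldest first

    def new_period(self):
        """Add an empty tree; freeze any tree beyond the two most recent."""
        self.live.append({})
        while len(self.live) > self.KEEP:
            self.frozen_roots.append(root_of(self.live.pop(0)))

    def write(self, key, value):
        self.live[-1][key] = value

    def read(self, key, proof=None):
        # Look in the stored trees first, newest to oldest.
        for tree in reversed(self.live):
            if key in tree:
                self.live[-1][key] = tree[key]  # touched state moves forward
                return tree[key]
        if proof is None:
            raise KeyError("expired state: a proof against a frozen tree is needed")
        period, witness = proof
        if period >= len(self.frozen_roots) or key not in witness:
            raise ValueError("invalid proof")
        if root_of(witness) != self.frozen_roots[period]:
            raise ValueError("invalid proof")
        self.live[-1][key] = witness[key]  # revived into the newest tree
        return witness[key]
```

State that keeps being touched stays in the two stored trees at no extra cost, while untouched state drifts into frozen trees and needs a proof to come back.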
Over time, all protocols become complex, no matter how simple they may be at the beginning.
Buterin wrote:
“If we don't want Ethereum to enter a black hole of increasing complexity, we must do one of two things: (i) stop making changes and ossify the protocol, (ii) be able to really eliminate features and reduce complexity.”
According to Buterin, cleaning up Ethereum's complexity requires several small fixes, such as removing the SELFDESTRUCT opcode, removing old transaction types and beacon chain committees, overhauling LOG, and more. Buterin also suggested simplifying gas mechanics, eliminating gas observability, and improving static analysis.