How to Raise the Gas Limit, Part 2: History Growth

05.07.2024 | Storm Slivkoff, Georgios Konstantopoulos

History growth is currently the biggest bottleneck for scaling Ethereum. Somewhat unexpectedly, history growth has become a much larger problem than state growth. Within a couple years, history data will exceed the storage capacity of many Ethereum nodes.

The good news is that:

  1. History growth is an easier problem to solve than state growth.
  2. Solutions are already under active development.
  3. Solving history growth will ease the state growth problem.

In this post we continue our investigation of Ethereum scaling from Part 1, now turning our attention from state growth to history growth. Using high resolution datasets, our goal is to 1) build a technical understanding of Ethereum’s scaling bottlenecks, and 2) help frame the discussion around what Ethereum gas limit is optimal.

This article is part 2 in a blogpost series about Ethereum scaling. Part 1 is about state growth, part 2 is about history growth, part 3 is about state access, and part 4 is about the gas limit.

What is history growth?

History is the set of all blocks and transactions that Ethereum has executed throughout its lifetime. This is the data needed to sync the chain from the Genesis block to the current tip of the chain. History growth is the accumulation of new blocks and new transactions over time.

Figure 1 shows how history growth relates to various protocol metrics and Ethereum node hardware constraints. History growth is limited by a different set of hardware constraints than state growth. History growth puts stress on Network IO, because new blocks and transactions must be transmitted throughout the network. History growth also puts stress on a node’s Storage Space because every Ethereum node stores a complete copy of the history. If history grows quickly enough to exceed these hardware constraints, a node will no longer be able to achieve stable consensus with its peers. Refer to Part 1 of this article series for an overview of state growth and other scaling bottlenecks.

Figure 1: Ethereum Scaling Bottlenecks

Until recently, the majority of each node’s network throughput was used for transmitting history (e.g. new blocks and transactions). This situation has changed with the introduction of blobs in the Dencun hard fork. Blobs now occupy a significant portion of a node’s network activity. However, blobs are not considered part of history because 1) they are only stored by a node for 2 weeks before being discarded and 2) they are not needed for replaying the chain from Genesis. Thanks to (1), blobs do not significantly contribute to the storage burden of each Ethereum node. We will discuss blobs in a later section of this post.

In this article we will focus on history growth and also touch on the relationship between history and state. Since state growth and history growth share some overlapping hardware constraints, they are related problems, and addressing one problem can help address the other.

How fast is history growing?

Figure 2 shows the history growth rate over time since Ethereum’s Genesis. Each vertical bar represents one month of growth. The y-axis represents the number of gigabytes that history grew during that month. Transactions are categorized by their “to address” and sized using their RLP byte representation. Contracts that could not be easily identified are categorized as “Unknown”. The “Other” category includes a long tail of small categories such as infrastructure and gaming.
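The aggregation described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used for the figure: the `LABELS` mapping, the placeholder addresses, and the sample records are all hypothetical, and each record is assumed to already carry its RLP-encoded size in bytes.

```python
from collections import defaultdict

# Hypothetical label lookup: "to address" -> contract category.
# Real labels would come from a curated dataset; these are placeholders.
LABELS = {
    "0xaaa": "ERC-20",
    "0xbbb": "DEX / DeFi",
    "0xccc": "Rollups",
}

def monthly_growth_by_category(transactions, labels):
    """Sum RLP-encoded transaction bytes per (month, category)."""
    totals = defaultdict(int)
    for tx in transactions:
        # Addresses without a known label fall into "Unknown".
        category = labels.get(tx["to"], "Unknown")
        totals[(tx["month"], category)] += tx["rlp_bytes"]
    return dict(totals)

# Made-up sample records, for illustration only.
sample = [
    {"month": "2024-04", "to": "0xaaa", "rlp_bytes": 120},
    {"month": "2024-04", "to": "0xccc", "rlp_bytes": 900},
    {"month": "2024-04", "to": "0xccc", "rlp_bytes": 1100},
    {"month": "2024-04", "to": "0xddd", "rlp_bytes": 300},
]
growth = monthly_growth_by_category(sample, LABELS)
```

Dividing each total by 1024^3 would give the GiB-per-month values plotted on the y-axis.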

Figure 2: Ethereum history growth rate over time

A few key takeaways from this chart:

  • History grows about 6x - 8x faster than state: History growth recently peaked at 36.0 GiB/month and currently sits at 19.3 GiB/month. State growth peaked around 6.0 GiB/month and currently sits at 2.5 GiB/month. A comparison of history vs state, both in growth and cumulative size, can be found later in this post.
  • Until Dencun, the history growth rate was rapidly accelerating: While state has been growing roughly linearly for many years (see Part 1), history has been growing superlinearly. Consider that a linearly increasing growth rate leads to a quadratic overall size, and so a superlinearly increasing growth rate leads to a faster-than-quadratic overall size. This acceleration hit an abrupt stop after Dencun. This was the first time Ethereum experienced a large decline in the history growth rate.
  • The majority of recent history growth has come from rollups: Each L2 posts copies of its transactions back to mainnet. This generated a large amount of history and it has led to rollups being the most significant contributor to history over the last year. However, Dencun enabled L2’s to post their transaction data using blobs instead of history, and so rollups no longer generate the majority of Ethereum history. We examine rollups in more detail later on in this post.
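Two of the claims above are easy to check numerically. The sketch below first reproduces the "6x - 8x" ratio from the quoted growth rates, then shows with a toy series (the numbers are illustrative, not chain data) that a growth rate rising by a constant amount each month produces a cumulative size that is quadratic in time.

```python
from itertools import accumulate

# Growth rates quoted in the post, in GiB/month.
peak_ratio = 36.0 / 6.0       # history peak vs. state peak
current_ratio = 19.3 / 2.5    # history now vs. state now

# Toy series: the monthly growth rate rises by a constant amount each month.
months = range(1, 13)
linear_rate = [2 * m for m in months]
size = list(accumulate(linear_rate))   # cumulative size after each month

# A quadratic sequence has constant, non-zero second differences.
first_diff = [b - a for a, b in zip(size, size[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]
```

Here `size` after month m equals m * (m + 1), and every entry of `second_diff` is the same constant, which is the signature of quadratic growth.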

What are the biggest contributors to Ethereum history?

The amount of history generated by each contract category reveals how Ethereum usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This is the same data as Figure 2, normalized to 100%.

Figure 3: Contributions to history growth

This data reveals four distinct epochs of Ethereum usage patterns:

  1. The Early Era (purple): In Ethereum’s first few years there was little onchain activity. Of these early contracts, most are difficult to identify now, and they are marked Unknown in the chart.
  2. The ERC-20 Era (green): The ERC-20 standard was finalized at the end of 2015 but it did not gain significant traction until 2017 and 2018. ERC-20 contracts became the largest history category in 2019.
  3. The DEX / DeFi Era (brown): DEX and DeFi contracts were present onchain as early as 2016, and they began to gain traction in 2017. But it was not until DeFi Summer in 2020 that they became the largest history category. DeFi and DEX contracts peaked at >50% of history growth in parts of 2021 and 2022.
  4. The Rollup Era (grey): At the beginning of 2023, L2 rollups began to consistently execute more transactions than mainnet. This coincided with their contracts generating large amounts of history, and they generated about 2/3 of all Ethereum history in the months before Dencun.

Each era represents a more complex Ethereum usage pattern than the one before it. Complexification over time can be seen as a form of Ethereum scaling that is not captured by simple metrics like transactions per second.

In the most recent month of data, April 2024, rollups are no longer generating the majority of history. It is unclear whether future history will originate from DEX’s and DeFi, or some new pattern of usage will emerge.

What about blobs?

The introduction of blobs in the Dencun hard fork significantly altered history growth dynamics by allowing rollups to post their data using cheap blobs instead of history. Figure 4 zooms into the history growth rate around the date of the Dencun upgrade. The chart is similar to Figure 2, except each vertical bar represents one day instead of one month.

Figure 4: Effect of Dencun on history growth

A couple key takeaways from this graph:

  • History growth from rollups has fallen by ~2/3 since Dencun: Most rollups have transitioned from call data to blobs, which substantially decreases the amount of history they generate. However, as of April 2024, there remain some rollups that have not switched from call data to blobs.
  • Total history growth has fallen by ~1/3 since Dencun: Dencun only reduced the history growth of rollups. Other contract categories have slightly increased their history growth. Even after Dencun, history growth remains 8x larger than state growth (see next section for details).

Although blobs have reduced history growth, they are still a recent addition to Ethereum. It’s unclear where history growth will stabilize in the presence of blobs.

How much history growth is acceptable?

Raising the gas limit will increase the history growth rate. Proposals to raise the gas limit (e.g. Pump the Gas) must therefore account for the relationship between history growth and each node’s hardware bottlenecks.

To figure out an acceptable rate of history growth, it is helpful to start by examining how long the current status quo can be maintained by modern node hardware for networking and storage. Networking hardware can probably sustain the status quo indefinitely, because the history growth rate is unlikely to return to its pre-Dencun peak until the gas limit is increased. However, the storage burden of history continually increases over time. Under current storage policies it is inevitable that each node’s storage drives eventually become filled by history.

Figure 5 shows an Ethereum node’s storage burden over time and projects how this burden may grow over the next 3 years. Projections were made using the April 2024 growth rate. It is possible that this rate may rise or fall with future changes to usage patterns or the gas limit.

Figure 5: Size of history, state, and total full node storage burden

A few key takeaways from this figure:

  • History occupies about 3x as much storage space as state. This difference will also increase over time because history is growing about 8x as fast as state.
  • There is a critical threshold around 1.8 TiB where many nodes will be forced to upgrade their storage drives. 2 TB is a common storage drive size, which gives only about 1.8 TiB of usable space. Note that TB (1 trillion bytes) is a different unit than TiB (1024^4 bytes). The “true” critical threshold is even lower for many node operators because post-Merge validators must run a consensus client alongside the execution client.
  • The critical threshold will be reached within 2 or 3 years. Raising the gas limit by any amount will accelerate this timeline proportionately. Reaching this threshold will create a non-trivial maintenance burden for node operators and necessitate the purchase of additional hardware (e.g. a $300 NVMe drive).
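The arithmetic behind these takeaways can be sketched as follows. This is a back-of-the-envelope estimate, not the projection used for Figure 5: it assumes the post's 1.2 TiB current full node footprint, the 1.8 TiB threshold, a constant combined growth rate of 19.3 + 2.5 GiB/month, and growth that scales linearly with the gas limit. It ignores the consensus client's storage. The function and parameter names are illustrative.

```python
TB = 10 ** 12      # bytes in a terabyte (decimal)
TIB = 1024 ** 4    # bytes in a tebibyte (binary)

def usable_tib(drive_tb):
    """Convert a drive's advertised decimal capacity (TB) into TiB."""
    return drive_tb * TB / TIB

def months_until_threshold(current_gib, threshold_gib, growth_gib_per_month,
                           gas_limit_multiplier=1.0):
    """Months until storage reaches the threshold at a constant growth rate.

    Assumes growth scales linearly with the gas limit.
    """
    remaining = threshold_gib - current_gib
    return remaining / (growth_gib_per_month * gas_limit_multiplier)

two_tb_in_tib = usable_tib(2)            # roughly 1.82 TiB

current_gib = 1.2 * 1024                 # current full node footprint
threshold_gib = 1.8 * 1024               # critical threshold from the post
growth = 19.3 + 2.5                      # history + state, GiB/month

months_now = months_until_threshold(current_gib, threshold_gib, growth)
months_2x = months_until_threshold(current_gib, threshold_gib, growth,
                                   gas_limit_multiplier=2.0)
```

Under these assumptions `months_now` comes out a little over 28 months, consistent with "within 2 or 3 years", and doubling the gas limit halves it.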

Unlike state data, history data is append-only and is accessed much less aggressively. Thus it is theoretically possible to store history data separately from state data on cheaper storage media. This can be done with some clients like geth.

Beyond storage capacity, Network IO is the other main hardware constraint on history growth. Unlike storage capacity, network IO limitations will not cause problems for nodes in the short term, but these limitations will become important for future increases to the gas limit.

To know how much history growth can be supported by a typical Ethereum node’s network capacity, it is necessary to characterize the relationship between history growth and various network health metrics such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block submission delays. Analysis of these metrics is beyond the scope of this post, but more information can be found in previous investigations of consensus layer health [1] [2] [3] [4]. Additionally, the Ethereum Foundation’s Xatu project has been building public datasets that should expedite these types of analyses.

How can history growth be solved?

History growth is an easier problem than state growth. It is solved almost entirely by the candidate proposal EIP-4444. This EIP changes each node from preserving the entire Ethereum history to just preserving one year of history. After EIP-4444 is implemented, data storage will no longer be a bottleneck on Ethereum scaling, even in the long term with substantial gas limit increases. EIP-4444 is necessary for the long term sustainability of the network, because otherwise the history will grow fast enough to require regular hardware updates in network nodes.

Figure 6 shows how EIP-4444 affects each node’s storage burden over the next 3 years. This is the same as Figure 5, with the added lighter lines representing storage burdens post-EIP-4444.

Figure 6: Effect of EIP-4444 on Ethereum node storage burden

Some key takeaways from this figure:

  • EIP-4444 will cut the current storage burden in half. The storage burden will drop from 1.2 TiB to 633 GiB.
  • EIP-4444 will stabilize the history storage burden. Assuming a constant rate of history growth, history will be discarded at the same rate that it is generated.
  • After EIP-4444, it will take many years for the post-4444 storage burden to reach the storage burden of today. This is because state growth will be the only factor growing the storage burden, and state grows more slowly than history.
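The stabilizing effect can be illustrated with a toy model of a rolling retention window. This is a sketch under simplifying assumptions: a chain that starts empty, a constant growth rate equal to the April 2024 figure of 19.3 GiB/month, and expiry in whole-month steps. It does not attempt to reproduce the numbers in Figure 6, since the real trailing year includes the faster pre-Dencun months. The function name is hypothetical.

```python
from collections import deque

def retained_history(monthly_growth_gib, retention_months=None):
    """History size (GiB) held by a node at the end of each month.

    retention_months=None keeps everything, as nodes do today.
    """
    # A deque with maxlen drops its oldest entry automatically once full.
    window = deque(maxlen=retention_months)
    sizes = []
    for growth in monthly_growth_gib:
        window.append(growth)
        sizes.append(sum(window))
    return sizes

rate = 19.3                  # GiB/month, assumed constant
growth = [rate] * 36         # three years of months

keep_all = retained_history(growth)                        # grows without bound
one_year = retained_history(growth, retention_months=12)   # plateaus after month 12
```

In this model `keep_all` keeps climbing by 19.3 GiB every month, while `one_year` flattens at 12 months of growth because each new month displaces the oldest one.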

After EIP-4444 has been implemented, history growth will still impose some amount of storage burden because nodes will store a year’s worth of history. However, this burden will not be difficult to address, even as Ethereum reaches global scale. The year-long expiration time of EIP-4444 can likely be reduced to months, weeks, or even shorter once the history preservation approaches are shown to be reliable.

How to preserve Ethereum's history?

EIP-4444 raises the question of how history should be preserved if not by the Ethereum nodes themselves. History plays a central role in the validation, accounting, and analysis of Ethereum, and so it is vital that it be preserved. Luckily, history preservation is an easy problem that requires only 1/n honest data providers. This is in contrast to state consensus problems that require between 1/3 and 2/3 of data participants to be honest. A node operator can validate the authenticity of any history dataset by 1) replaying all of its transactions from Genesis and 2) checking whether those transactions reproduce the same state root as the current chain tip.
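The 1/n property can be illustrated with a toy model. The sketch below is not Ethereum's state transition: it stands in a simple SHA-256 hash chain for execution so that the shape of the check is visible. The provider names, the `apply_tx` function, and the all-zero genesis root are hypothetical.

```python
import hashlib

GENESIS_ROOT = b"\x00" * 32  # placeholder genesis state root

def apply_tx(state_root, tx):
    """Toy state transition: a hash chain, NOT real EVM execution."""
    return hashlib.sha256(state_root + tx).digest()

def replay(history, genesis_root=GENESIS_ROOT):
    """Replay every transaction from genesis and return the final root."""
    root = genesis_root
    for tx in history:
        root = apply_tx(root, tx)
    return root

def first_valid_provider(providers, trusted_tip_root):
    """Return the first provider whose history reproduces the trusted root."""
    for name, history in providers.items():
        if replay(history) == trusted_tip_root:
            return name
    return None

honest = [b"tx1", b"tx2", b"tx3"]
tip_root = replay(honest)  # in practice, taken from the node's own chain tip

providers = {
    "provider_a": [b"tx1", b"txX", b"tx3"],   # tampered
    "provider_b": [b"tx1", b"tx2"],           # truncated
    "provider_c": honest,                     # honest
}
chosen = first_valid_provider(providers, tip_root)
```

A single honest provider is enough: tampered or truncated datasets fail the root check and are skipped, no matter how many of them there are.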

There are multiple approaches for preserving history. Each of these should probably be deployed in parallel to maximize the likelihood of preservation.

  1. Torrents / P2P: Torrents are the simplest and most robust approach. Ethereum nodes can periodically package portions of history and share as public torrent files. For example, a node might create a new history torrent file every 100,000 blocks. Node clients like erigon already perform this process to some extent in an unstandardized way. To standardize this process, all node clients must use the same data format, same parameters, and same P2P networks. Nodes would be able to choose whether or not to participate in this network depending on their storage and bandwidth capabilities. The advantage of torrents is using high-lindy open standards that are already supported by a large ecosystem of data tools.
  2. Portal Network: The Portal Network is a new network specifically designed for hosting Ethereum data. This is a similar approach to torrents, while also providing some extra functionality to make data validation easier. The advantage of the Portal Network is that these extra validation layers provide utilities for light clients to efficiently validate and query the shared datasets.
  3. Cloud hosts: Cloud storage services like AWS’s S3 or Cloudflare’s R2 provide a cheap and high performance option for preserving history. However, this approach carries more legal risk and business operational risk, as it is not guaranteed that these cloud services will always remain willing and able to host cryptocurrency data.
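As a rough illustration of the packaging step in the torrent approach, the sketch below splits the chain into fixed-size block ranges that could each become one archive file. The 100,000-block interval comes from the example above; the function name and range format are hypothetical and do not reflect any client's actual layout.

```python
def completed_archive_ranges(tip_block, blocks_per_archive=100_000):
    """Return inclusive (start, end) block ranges for every full archive.

    Blocks are numbered from 0 (Genesis). A trailing partial range is
    skipped until enough new blocks exist to fill it.
    """
    full_archives = (tip_block + 1) // blocks_per_archive
    return [
        (i * blocks_per_archive, (i + 1) * blocks_per_archive - 1)
        for i in range(full_archives)
    ]

# With the tip at block 250,000, two archives are complete.
ranges = completed_archive_ranges(250_000)
```

Because the boundaries depend only on the block number and the interval, every node that agrees on those two parameters would produce the same set of ranges, which is the kind of agreement standardization would need.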

The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate around specific implementation details so that they can be directly integrated into each node client. In particular, performing a full sync from Genesis (not a snap sync) will then require retrieving the history from history providers instead of Ethereum nodes. These changes do not technically require a hard fork, and so they could be implemented sooner than Ethereum’s next hard fork, Pectra.

All of these history preservation approaches could also be used by L2’s to preserve the blob data they post to mainnet. Compared to history preservation, blob preservation is 1) more difficult due to the total data size being much larger and 2) less important because blobs are not necessary for replaying mainnet history. However, blob preservation is still necessary for each L2 to replay their own history. Thus, some form of blob preservation will be important to the Ethereum ecosystem as a whole. Additionally, if L2’s develop robust blob storage infrastructure, they may also be able to easily store L1 history data.

It is helpful to directly compare the datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burden across Ethereum node types. State Data is accounts and contracts, History Data is blocks and transactions, and Archive Data is a set of optional data indices. The byte counts in this table are based on a recent reth snapshot, but numbers for other node clients should be roughly comparable.

Figure 7: Storage burden across Ethereum node types

To put this into words,

  • An Archive Node stores State data and History data, along with Archive data. Archive nodes are used when someone wants to be able to easily query historical chain states.
  • A Full Node stores only the History data and State data. Most nodes today are full nodes. Full nodes have roughly half the storage burden of Archive nodes.
  • A Full Node After EIP-4444 stores only the State data and the most recent year of History data. This reduces the storage burden of a node from 1.2 TiB to 633 GiB and brings the storage footprint of history data to a steady state value.
  • A Stateless Node, aka a “Light node”, does not store any of these datasets and is able to instantly validate at the chain’s tip. This node type becomes possible once Verkle tries or other state commitment schemes are added to Ethereum.

Finally, there are some additional EIP’s that would limit the history growth rate rather than merely accommodating the current rate. This would be helpful both in the short term for staying within network IO constraints and in the long term for staying within storage constraints. Although EIP-4444 is still necessary for the long term sustainability of the network, these other EIP’s would help Ethereum scale more efficiently in the future:

  • EIP-7623: Reprices call data so that certain transactions with excessive call data are more expensive. Making these usage patterns more expensive will push some of them to convert from call data to blobs. This will decrease the history growth rate.
  • EIP-4488: Imposes a limit on the total amount of call data that can be included in each block. This would place a tighter bound on the rate at which history can grow.
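To make the EIP-7623 repricing concrete, here is a minimal sketch of a floor-price rule of the kind it describes. The constants are assumptions taken from one version of the EIP text (4 gas per calldata token as the standard cost, 10 gas per token as the floor, with a non-zero byte counting as 4 tokens) and should be checked against the current specification. Contract creation costs and access lists are omitted.

```python
TX_BASE_GAS = 21_000          # intrinsic cost of any transaction
STANDARD_TOKEN_COST = 4       # assumed: gas per calldata token today
FLOOR_COST_PER_TOKEN = 10     # assumed: floor gas per calldata token

def calldata_tokens(data):
    """Zero bytes count as 1 token, non-zero bytes as 4 tokens."""
    zero_bytes = data.count(0)
    return zero_bytes + 4 * (len(data) - zero_bytes)

def tx_gas(data, execution_gas):
    """Gas charged under a calldata floor: the larger of the two prices."""
    tokens = calldata_tokens(data)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = FLOOR_COST_PER_TOKEN * tokens
    return TX_BASE_GAS + max(standard, floor)

payload = b"\x01" * 1000      # 1000 non-zero bytes of calldata

data_heavy = tx_gas(payload, execution_gas=0)          # floor applies
compute_heavy = tx_gas(payload, execution_gas=100_000) # standard price applies
```

Transactions that mostly post data pay the higher floor, while transactions that also do substantial execution pay the same as before. That asymmetry is what pushes data-heavy usage toward blobs.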

These EIP’s are easier to implement than EIP-4444, so they may be useful as a short-term stopgap until EIP-4444 is ready for production.

Closing

The goal of this article is to develop a data-driven understanding of 1) how history growth works and 2) what can be done to solve it. Much of the data in this article has traditionally been difficult to access, and so we hope that making it available will offer some novel insight into the history growth problem.

History growth has not received enough attention as a bottleneck on Ethereum scaling. Even without gas limit increases, Ethereum’s current conventions for preserving history will force many nodes to upgrade their hardware within a few years. Luckily, this is not a difficult problem to solve. There is already a clear solution in EIP-4444. We believe the implementation of this EIP should be expedited in order to make room for future gas limit increases.

If you are excited about research in Ethereum scaling, reach out to storm@paradigm.xyz and georgios@paradigm.xyz. We’d love to hear about how you are thinking about the problem and potentially collaborate. The data and code used for this article can be found on GitHub here.

Acknowledgments

Thank you to Thomas Thiery, Tim Beiko, Toni Wahrstaetter, Oliver Nordbjerg, and Roman Krasiuk for review and feedback. Thank you to Achal Srinivasan for the Figure 1 and Figure 7 graphics.

Written by

Biography

Storm is a data associate at Paradigm. He uses data science and data engineering to analyze crypto systems and build crypto data infrastructure. Storm is passionate about open source software and open data standards. He has published open-source tools for collecting and analyzing data from many crypto protocols. Prior to Paradigm, Storm was a data specialist at Fei Labs. He earned his PhD from UC Berkeley doing neuroscience research.

Biography

Georgios Konstantopoulos is the Chief Technology Officer and a General Partner focused on Paradigm’s portfolio companies and research into open-source protocols. Previously, Georgios was an independent consultant and researcher focused on cryptography, information security and mechanism design. He earned his M.Eng. in Electrical & Computer Engineering from Aristotle University of Thessaloniki.

Disclaimer: This post is for general information purposes only. It does not constitute investment advice or a recommendation or solicitation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision. It should not be relied upon for accounting, legal or tax advice or investment recommendations. This post reflects the current opinions of the authors and is not made on behalf of Paradigm or its affiliates and does not necessarily reflect the opinions of Paradigm, its affiliates or individuals associated with Paradigm. The opinions reflected herein are subject to change without being updated.

Copyright © 2024 Paradigm Operations LP All rights reserved. “Paradigm” is a trademark, and the triangular mobius symbol is a registered trademark of Paradigm Operations LP