Breaking Down the Ethereum State Problem: From a Little-Known Vulnerability to a Serious Threat

This little-known vulnerability once posed a serious threat; today that threat has been greatly reduced.

Today the Ethereum Foundation published the disclosure of a security vulnerability first discovered in 2019. Until last month’s Berlin upgrade, it was severe enough that an attack could potentially have brought down the mainnet. The vulnerability works by triggering random trie lookups. Ethereum developers tried to defend against it with EIP-1884, EIP-2583, EIP-2929 and the snapshot feature, and after the Berlin upgrade it is far less dangerous.
The goal of this blog post is to formally disclose a serious threat to the Ethereum platform, one that was very real until the Berlin hard fork.

State
Let’s start with some background on Ethereum and its state.

Ethereum’s state consists of a Patricia-Merkle trie (a kind of prefix tree). Without going into too much detail here: the more the state grows, the denser the branches of this tree become. Each account added is a leaf, and between the root of the tree and the leaf itself there are a number of “intermediate” nodes.

To find a given account, or “leaf”, in this huge tree, somewhere on the order of 6-9 hashes have to be resolved, starting from the root and going through the intermediate nodes, until the last hash finally leads to the data we are looking for.

In plain terms: every trie lookup performed to find an account involves 8-9 resolution operations. Each resolution is one database lookup, and each database lookup may translate into any number of actual disk operations. The number of disk operations is hard to estimate, but since trie keys are cryptographic hashes (collision resistant), the keys are effectively “random”, which is the worst case for any database.
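To get a feel for why a single lookup costs several node resolutions, here is a small Python sketch (not client code) that hashes a number of keys and measures how many hex-nibble levels a lookup must descend before a key becomes unique. It assumes SHA-256 as a stand-in for Keccak-256 and ignores extension-node compression, so the result only approximates the path length in the real Patricia-Merkle trie.

```python
import hashlib
import math


def hashed_keys(n: int) -> list:
    # Stand-in for trie keys: SHA-256 of an index (Ethereum uses Keccak-256
    # of the address, but any uniform hash gives the same statistics).
    return sorted(hashlib.sha256(i.to_bytes(8, "big")).hexdigest() for i in range(n))


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def average_depth(sorted_keys: list) -> float:
    # A key becomes unique one nibble after the longest prefix it shares with
    # any other key; in sorted order that other key is always a neighbour.
    total = 0
    for i, key in enumerate(sorted_keys):
        shared = 0
        for j in (i - 1, i + 1):
            if 0 <= j < len(sorted_keys):
                shared = max(shared, common_prefix_len(key, sorted_keys[j]))
        total += shared + 1
    return total / len(sorted_keys)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        depth = average_depth(hashed_keys(n))
        print(f"{n:>7} keys: ~{depth:.1f} levels, log16(n) = {math.log(n, 16):.1f}")
```

The average grows roughly like log16 of the number of keys, so every new account makes the tree slightly deeper for everyone, and because the keys are uniformly random, consecutive lookups land on unrelated parts of the database.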

As Ethereum has grown, it has been necessary to raise the gas cost of operations that access the trie. This was done in the Tangerine Whistle upgrade at block 2,463,000 in October 2016, which included EIP 150. In the wake of the so-called “Shanghai attacks”, EIP 150 sharply raised the gas cost of certain operations and introduced a number of changes to protect against DoS attacks.

Another such increase came with the Istanbul upgrade at block 9,069,000 in December 2019, which activated EIP 1884.

EIP 1884 introduced the following changes to opcode costs:

SLOAD from 200 to 800 gas,

BALANCE from 400 to 700 gas (while a cheaper SELFBALANCE opcode was added), and

EXTCODEHASH from 400 to 700 gas.
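As a rough illustration of what these repricings mean, the sketch below computes how many such operations fit into a 10M gas budget (the figure used in the report discussed below) before and after EIP 1884. It is a hypothetical upper bound that ignores intrinsic transaction gas and any loop overhead.

```python
# Gas costs before and after EIP 1884 (Istanbul), as listed above.
EIP1884_COSTS = {
    "SLOAD": (200, 800),
    "BALANCE": (400, 700),
    "EXTCODEHASH": (400, 700),
}


def max_lookups(gas_budget: int, cost_per_lookup: int) -> int:
    # Upper bound only: ignores intrinsic transaction gas and loop overhead.
    return gas_budget // cost_per_lookup


if __name__ == "__main__":
    budget = 10_000_000  # hypothetical 10M gas budget
    for op, (before, after) in EIP1884_COSTS.items():
        b, a = max_lookups(budget, before), max_lookups(budget, after)
        print(f"{op:12} {b:>6} -> {a:>6} lookups ({b / a:.2f}x fewer)")
```

Under these assumptions SLOAD lookups per budget drop fourfold, while BALANCE and EXTCODEHASH drop only by a factor of about 1.75.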

The Problem
In March 2019, Martin Swende performed some measurements of EVM opcode performance. This investigation later led to the creation of EIP-1884. A few months before EIP-1884 went live, the Broken Metre paper was published (September 2019).

Two Ethereum security researchers, Hubert Ritzdorf and Matthias Egli, teamed up with one of the paper’s authors, Daniel Perez, to “weaponize” an exploit, which they submitted to the Ethereum bug bounty program on October 4, 2019.

We recommend that you read that submission in its entirety, as it is a well-written report.

On a channel dedicated to cross-client security, developers from Geth, Parity and Aleth were informed about the submission that same day.

The essence of the vulnerability is to trigger random trie lookups. A very simple variant is a loop that repeatedly takes the remaining gas (GAS) as a pseudo-random address, queries that address’s code size with EXTCODESIZE, discards the result and jumps back to the start.
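Such a loop can be written as a handful of raw EVM opcodes. The Python below is a minimal sketch assuming the simplest possible loop (JUMPDEST, GAS, EXTCODESIZE, POP, PUSH1 0, JUMP); it is not the researchers’ actual payload. It assembles the bytes and sums the Istanbul-era gas cost of one iteration.

```python
# Opcode byte values and Istanbul-era gas costs for the opcodes in the loop.
OPCODES = {
    "JUMPDEST": (0x5B, 1),
    "GAS": (0x5A, 2),
    "EXTCODESIZE": (0x3B, 700),
    "POP": (0x50, 2),
    "PUSH1": (0x60, 3),
    "JUMP": (0x56, 8),
}

# Each entry: (mnemonic, immediate bytes). The loop starts at offset 0.
LOOP = [
    ("JUMPDEST", b""),     # loop start (offset 0)
    ("GAS", b""),          # remaining gas: a cheap, ever-changing "random" value
    ("EXTCODESIZE", b""),  # treat it as an address, forcing a trie lookup
    ("POP", b""),          # discard the result
    ("PUSH1", b"\x00"),    # jump target: offset 0
    ("JUMP", b""),         # back to the start
]


def assemble(program) -> bytes:
    code = bytearray()
    for name, immediate in program:
        code.append(OPCODES[name][0])
        code += immediate
    return bytes(code)


def gas_per_iteration(program) -> int:
    return sum(OPCODES[name][1] for name, _ in program)


if __name__ == "__main__":
    per_iter = gas_per_iteration(LOOP)
    print(assemble(LOOP).hex())                            # 5b5a3b50600056
    print(per_iter, "gas per iteration")                   # 716 gas per iteration
    print(10_000_000 // per_iter, "lookups per 10M gas")   # 13966 lookups per 10M gas
```

Because the remaining gas decreases on every iteration, each EXTCODESIZE targets a different address, so node caches are of little help.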

In their report, the researchers executed this payload via eth_call against nodes synced to mainnet, and these were the execution times with 10M gas:

10M gas attack using EXTCODEHASH (400 gas)

Parity: ~90s

Geth: ~70s

10M gas attack using EXTCODESIZE (700 gas)

Parity: ~50s

Geth: ~38s
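Dividing the reported times by the number of loop iterations gives a rough per-lookup latency. This sketch assumes the minimal loop (16 gas of overhead per iteration on top of the opcode itself) and ignores intrinsic transaction gas, so the figures are approximate.

```python
# Execution times (seconds) reported for a 10M gas payload, per opcode variant.
REPORTED = {
    ("EXTCODEHASH", 400): {"Parity": 90, "Geth": 70},
    ("EXTCODESIZE", 700): {"Parity": 50, "Geth": 38},
}
BUDGET = 10_000_000
# Assumed minimal loop: JUMPDEST(1) + GAS(2) + POP(2) + PUSH1(3) + JUMP(8).
LOOP_OVERHEAD = 16


def iterations(op_cost: int) -> int:
    """Number of loop iterations (= trie lookups) that fit in the budget."""
    return BUDGET // (op_cost + LOOP_OVERHEAD)


if __name__ == "__main__":
    for (op, cost), timings in REPORTED.items():
        n = iterations(cost)
        for client, seconds in timings.items():
            print(f"{op:11} {client:6} {n:>6} lookups, ~{seconds / n * 1000:.1f} ms each")
```

Both variants come out at roughly 3 ms per lookup (about 2.7-2.9 ms for Geth and 3.6-3.7 ms for Parity), which suggests the attack is bounded by disk access rather than by gas: a higher price mainly reduces how many lookups fit in the budget.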

Clearly, the changes introduced by EIP 1884 did reduce the impact of the attack, but nowhere near enough.

This was right before Devcon in Osaka. During Devcon, knowledge of the issue was shared among mainnet client developers. We also met with Hubert and Matthias, as well as Greg Markou (from ChainSafe, who were working on ETC). ETC developers had also received the report.

As 2019 drew to a close, we knew that we had bigger problems than previously anticipated: malicious transactions could push block times into the minute range. To make matters worse, the developer community was already unhappy with EIP-1884, which had broken certain contract flows, while users and miners alike were eager to see the block gas limit raised.

Furthermore, just two months later, in December 2019, the Parity Ethereum team announced its departure, and OpenEthereum took over maintenance of the codebase.

A new client coordination channel was then created, in which Geth, Nethermind, OpenEthereum and Besu developers continued to coordinate.

The Solution
We realized that we had to take a two-pronged approach to these problems. The first was to work on the Ethereum protocol and somehow solve the issue at the protocol layer, preferably without breaking contracts and without penalizing “good” behavior, while still preventing attacks.

The second was software engineering: changing the data models and structures within the clients.

Protocol layer work
The first iteration of a scheme for handling these types of attacks was published in February 2020 as EIP 2583. The idea behind it was simply to add a penalty every time a trie lookup resulted in a miss.

However, Peter found a workaround for this idea, the “shielded relay” attack, which places an upper bound (around 800 gas) on how large such a penalty can effectively be.

The problem with penalizing misses is that the lookup must happen first, to determine that a penalty applies. If there is then not enough gas left to pay the penalty, work has already been performed without being paid for. Even though this does result in an exception, these state reads can be wrapped in nested calls, allowing the outer caller to keep repeating the attack without paying the (full) penalty.
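A toy model makes the ordering problem concrete. The function and its gas numbers below are hypothetical and illustrative only (the model ignores CALL overhead and the 63/64 gas-forwarding rule); the point is that when the penalty is charged after the lookup, a caller that forwards just enough gas for the lookup loses only that gas, yet the disk work is still done.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    gas_paid: int            # gas the outer caller actually loses
    lookup_performed: bool   # whether the (expensive) trie lookup happened


def nested_lookup_with_miss_penalty(gas_forwarded: int, lookup_cost: int,
                                    miss_penalty: int) -> Outcome:
    """Hypothetical model of a miss penalty charged after the lookup."""
    if gas_forwarded < lookup_cost:
        # Out of gas before the lookup: no disk work, forwarded gas is lost.
        return Outcome(gas_forwarded, False)
    # At this point the lookup has already been executed, and it was a miss.
    if gas_forwarded - lookup_cost < miss_penalty:
        # The penalty cannot be paid: the nested call throws, but the caller
        # only loses what it forwarded, and the disk work is already done.
        return Outcome(gas_forwarded, True)
    return Outcome(lookup_cost + miss_penalty, True)


if __name__ == "__main__":
    lookup_cost, penalty = 700, 10_000  # illustrative numbers only
    well_funded = nested_lookup_with_miss_penalty(50_000, lookup_cost, penalty)
    wrapped = nested_lookup_with_miss_penalty(lookup_cost, lookup_cost, penalty)
    print(well_funded)  # Outcome(gas_paid=10700, lookup_performed=True)
    print(wrapped)      # Outcome(gas_paid=700, lookup_performed=True)
```

In this model, however large the penalty is set, the wrapped caller’s cost per lookup stays at the gas it chose to forward.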

As a result, the EIP was abandoned while we searched for a better alternative.

Alexey Akhunov explored the concept of Oil, a second source of “gas” that would differ fundamentally from gas in that it would be invisible to the execution layer and could cause transaction-global reverts.

Martin wrote up a similar proposal, called Karma, in May 2020.

While these schemes were being iterated on, Vitalik Buterin proposed simply raising gas costs and maintaining access lists. In August 2020, Martin and Vitalik started iterating on what would become EIP-2929 and its companion, EIP-2930.

EIP-2929 effectively solves many of the previous problems.

Unlike EIP-1884, which raised costs unconditionally, it raises costs only for state that has not already been accessed within the transaction. This results in a net cost increase of less than one percent.

In addition, together with EIP-2930, it does not break any contract flows.

And it can be tuned further by raising gas costs (without breaking things).

Both went live with the Berlin upgrade on April 15, 2021.
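A minimal sketch of the EIP-2929 idea: per transaction, keep a set of accessed addresses, charge COLD_ACCOUNT_ACCESS_COST (2600) the first time an address is touched and WARM_STORAGE_READ_COST (100) afterwards. Only the two constants come from EIP-2929; the class and method names are hypothetical, and storage slots (cold cost 2100) are omitted.

```python
# EIP-2929 gas constants for account access.
COLD_ACCOUNT_ACCESS_COST = 2600
WARM_STORAGE_READ_COST = 100


class AccessedAddresses:
    """Hypothetical per-transaction tracker of already-accessed addresses."""

    def __init__(self, initially_warm=()):
        # EIP-2929 starts with the sender, the recipient and the precompiles
        # warm; addresses from an EIP-2930 access list would also go here.
        self._warm = set(initially_warm)

    def access_cost(self, address: bytes) -> int:
        """Gas charged for touching `address` (e.g. via EXTCODESIZE or BALANCE)."""
        if address in self._warm:
            return WARM_STORAGE_READ_COST
        self._warm.add(address)
        return COLD_ACCOUNT_ACCESS_COST


if __name__ == "__main__":
    tx = AccessedAddresses()
    token = bytes.fromhex("11" * 20)  # hypothetical contract address
    print([tx.access_cost(token) for _ in range(3)])  # [2600, 100, 100]
    # The attack loop touches a fresh address every iteration (16 gas of loop overhead).
    print(10_000_000 // (COLD_ACCOUNT_ACCESS_COST + 16), "cold lookups per 10M gas")
```

A contract that repeatedly touches the same accounts pays the cold price once, whereas the attack loop pays it on every iteration, so under these assumptions a 10M gas budget buys only about 3,800 uncached lookups instead of roughly 14,000.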

Development work
In October 2019, Peter’s approach to solving this problem was dynamic state snapshots.

The snapshot is a secondary data structure that stores the Ethereum state in a flat format and can be built fully online, while a Geth node is running.

The benefit of the snapshot is that it acts as an acceleration structure for state access.

Instead of doing O(log N) disk reads (x LevelDB overhead) to access an account or storage slot, the snapshot can provide direct O(1) access (x LevelDB overhead).

The snapshot supports account and storage iteration at O(1) complexity per entry, which allows remote nodes to retrieve sequential state data much more cheaply than before.

The existence of the snapshot also enables more exotic use cases, such as offline pruning of the state trie or migration to alternative data formats.
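The difference between walking the trie and reading the flat snapshot can be shown with a toy model. In the sketch below (a dict with a read counter standing in for LevelDB, an uncompressed hexary trie, all names hypothetical), a trie lookup reads one node per level while the snapshot answers with a single read, and sorted snapshot keys allow in-order iteration. Real Geth snapshots are layered and stored differently; this only illustrates the O(log N) versus O(1) access pattern.

```python
import bisect
import hashlib


class CountingStore:
    """Dict-backed key-value store that counts reads (a stand-in for LevelDB)."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build(accounts: dict):
    """Store the same accounts as a toy hexary trie and as a flat snapshot."""
    trie, flat = CountingStore(), CountingStore()
    keys = sorted(accounts)
    for i, key in enumerate(keys):
        shared = 0
        for j in (i - 1, i + 1):
            if 0 <= j < len(keys):
                shared = max(shared, common_prefix_len(key, keys[j]))
        for depth in range(shared + 1):
            trie.data[key[:depth]] = ("branch", None)  # intermediate node
        trie.data[key[:shared + 1]] = ("leaf", (key, accounts[key]))
        flat.data[key] = accounts[key]  # snapshot: one entry per account
    return trie, flat, keys


def trie_get(trie: CountingStore, key: str):
    # One database read per level, from the root down to the leaf.
    for depth in range(len(key) + 1):
        node = trie.get(key[:depth])
        if node is None:
            return None
        kind, payload = node
        if kind == "leaf":
            stored_key, value = payload
            return value if stored_key == key else None
    return None


def snapshot_iterate(flat: CountingStore, keys: list, start: str, count: int):
    # Sorted keys: one seek, then each further entry costs a single read.
    i = bisect.bisect_left(keys, start)
    return [(k, flat.get(k)) for k in keys[i:i + count]]


if __name__ == "__main__":
    accounts = {hashlib.sha256(i.to_bytes(8, "big")).hexdigest(): i.to_bytes(8, "big")
                for i in range(10_000)}
    trie, flat, keys = build(accounts)
    probe = keys[1234]
    assert trie_get(trie, probe) == flat.get(probe) == accounts[probe]
    print("trie reads:", trie.reads, "| snapshot reads:", flat.reads)
```

The price of that single read is the one described next: every account is stored twice, once in the trie and once in the flat layout.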

The downside of the snapshot is that the raw account and storage data is effectively duplicated. On mainnet, this means an extra 25GB of SSD space.

Work on dynamic snapshots had already started in mid-2019, with the primary aim of enabling snap sync. At the time, the Geth team was working on a number of “big projects”:

Offline state pruning

Dynamic snapshot + snap sync

LES state distribution via sharded state

In the end, however, it was decided to give snapshots full priority and postpone the other projects for the time being. This work laid the foundation for what later became the snap/1 sync algorithm. The snapshot feature was merged into Geth in March 2020.

With the “dynamic snapshot” feature released into the wild, we had some breathing room. If the Ethereum network were attacked, it would be painful, yes, but at least it would be possible to tell users to enable snapshots. Generating the full snapshot would take a long time, and there was no way yet to sync snapshots, but the network could at least keep operating.

In March-April 2021, the snap/1 protocol was rolled out in Geth, making it possible to sync using the new snapshot-based algorithm. Although it is not yet the default sync mode, it is an important step toward making snapshots useful not only as attack protection but also as a major improvement for users.

On the protocol side, the Berlin upgrade went live in April 2021.

Here are some benchmarks from our AWS monitoring environment:

Before Berlin upgrade, without snapshot, 25M gas: 14.3s

Before Berlin upgrade, with snapshot, 25M gas: 1.5s

After Berlin upgrade, without snapshot, 25M gas: ~3.1s

After Berlin upgrade, with snapshot, 25M gas: ~0.3s

The (rough) numbers indicate that the Berlin upgrade reduces the efficiency of the attack by about 5x and snapshots reduce it by about 10x, for a combined reduction in impact of about 50x.
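The factors quoted above follow directly from the benchmark figures; a quick check in Python, using only the four timings listed:

```python
# 25M gas benchmark times (seconds) from the AWS monitoring environment.
TIMES = {
    ("pre-Berlin", "no snapshot"): 14.3,
    ("pre-Berlin", "snapshot"): 1.5,
    ("post-Berlin", "no snapshot"): 3.1,
    ("post-Berlin", "snapshot"): 0.3,
}

baseline = TIMES[("pre-Berlin", "no snapshot")]
for setup, seconds in TIMES.items():
    # Speedup relative to the unprotected pre-Berlin node.
    print(f"{' / '.join(setup):26} {seconds:5.1f}s  {baseline / seconds:5.1f}x faster")
```

This gives about 4.6x for Berlin alone, 9.5x for snapshots alone and 47.7x for both together, in line with the rounded 5x, 10x and 50x.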

We estimate that currently, on mainnet (15M gas), it is possible to create blocks that take 2.5-3s to execute on a Geth node without snapshots. This number will continue to deteriorate (for non-snapshot nodes) as the state grows.

If gas refunds are used to increase the effective gas usage within a block, this can be worsened by a factor of (at most) 2x. With EIP 1559, the block gas limit will be more elastic, allowing a further 2x (the ELASTICITY_MULTIPLIER) in temporary bursts.

As for the feasibility of such an attack: the cost for an attacker to buy a full block is on the order of a few ETH (15M gas at 100 gwei is 1.5 ETH).
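The cost and amplification figures are simple arithmetic, sketched below. The refund rule assumed here is the pre-London one (refunds capped at half of the gas used), and the elasticity constant is EIP-1559’s ELASTICITY_MULTIPLIER of 2; the function names are hypothetical.

```python
from fractions import Fraction

GWEI = 10**9                # wei per gwei
ETHER = 10**18              # wei per ETH
ELASTICITY_MULTIPLIER = 2   # EIP-1559: block gas limit = gas target * 2


def block_cost_eth(gas: int, gas_price_gwei: int) -> Fraction:
    """Exact cost in ETH of buying `gas` at `gas_price_gwei`."""
    return Fraction(gas * gas_price_gwei * GWEI, ETHER)


def max_executed_gas(charged_gas: int) -> int:
    # Assumed pre-London refund rule: refund <= gas_used // 2, so the gas
    # charged is at least half of what was actually executed.
    return 2 * charged_gas


def burst_limit(gas_target: int) -> int:
    return gas_target * ELASTICITY_MULTIPLIER


if __name__ == "__main__":
    print(float(block_cost_eth(15_000_000, 100)), "ETH")  # 1.5 ETH
    print(max_executed_gas(15_000_000))                    # 30000000
    print(burst_limit(15_000_000))                         # 30000000
```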

Why disclose this threat now?
The threat has been an “open secret” for a long time: it has in fact been publicly disclosed by mistake at least once, and it has been mentioned several times on AllCoreDevs (ACD) calls, though without explicit details.

Since the Berlin upgrade is now behind us, and Geth nodes use snapshots by default, we estimate that the threat is low enough that transparency should win out, and that it is time for a full disclosure of the work done behind the scenes.

It is critical that the community has the opportunity to understand the reasons behind changes that negatively impact the user experience, such as raising gas costs and limiting refunds.

This post was written by Martin Holst Swende and Peter Szilagyi on 2021-04-23. It was shared with other Ethereum-based projects on 2021-04-26 and publicly disclosed on 2021-05-18.

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/breaking-down-the-ether-state-problem-a-little-known-vulnerability-to-a-serious-threat/