Installing “bombs” on ethereum

The bug to be covered in this article is located within the Geth client’s state downloader, and it can be used to trick the downloader into not syncing correctly with the main network.

This article is the second in a series explaining a bug I found in the go-ethereum (Geth) client.

The bug in this article is located in the Geth client’s state downloader, and it can be used to spoof the downloader so that it does not sync correctly with the main network. An attacker could use this bug to set traps for the ethereum blockchain and arbitrarily trigger a hard fork.

Synchronization
When you want to run an Ethernet node, you must first synchronize on the entire network, i.e., download and calculate all the data needed to build the state of the blockchain at the moment of the latest block. Depending on the user’s own needs, there can be a trade-off between security and speed in the synchronization method, so (at the time of writing) Geth supports two synchronization modes: full synchronization and fast synchronization.

As the name implies, full sync means that the entire sync process to the ethereum blockchain is performed independently. This means that your Geth node downloads and verifies the proof-of-work (PoW) for each block, in addition to counting every transaction within the block; thus, the node can generate the latest state of the blockchain locally without having to trust other nodes. This model is more secure, but at the expense of speed; full synchronization of Geth can take anywhere from a few days to a few weeks.

However, some users may not want to wait for weeks. Maybe they are pressed for time, or maybe they don’t feel that the sacrifice is worth it. Therefore, Geth offers a model where all chain data before a recent block (called a “pivot block”) is synchronized using a faster method, and only the block chains after the pivot block use a slower, fully synchronized algorithm. In fast-sync mode, Geth downloads blocks, but only randomly selects blocks to verify the proof of workload instead of verifying every block; also, instead of performing transactions itself, it downloads the state tree directly from other nodes in the network to get the final blockchain state.

Of course, Geth will not blindly trust the state tree data sent back by other nodes either, as a malicious node could also claim that an account has only a little bit of money (but actually has a lot). To understand how Geth can tell if the data it receives is correct or not, we first need to understand the Merkle-Patricia trie.

Merkle-Patricia tree
The Merkle-Patricia tree (MPT) is a key data structure in the Geth client that is a combination of both the Merkle tree and the Patricia tree.

In short, a Patricia tree stores data in a tree structure based on the prefix of the data. Compared to other techniques (e.g., hash tables, hashmaps), Patricia trees are inherently well suited for storing similar data, albeit at the expense of speed. Here’s how multiple words starting with r can be stored in a Patricia tree.

Installing "bombs" on ethereum
  • Image source: https://commons.wikimedia.org/w/index.php?curid=2118795
    Next, let’s talk about Merkle trees. In a Merkle tree, each leaf node is a hash of the data, and each non-leaf node is a hash of its two children. If a user knows the Merkle root (i.e., the top hash) of a Merkle tree and wants to confirm whether some data is stored in the tree, he only needs to use one path in the tree, and the number of nodes involved in this path is only proportional to the logarithm of the number of leaf nodes (note that it is not the number of leaf nodes). As shown in the figure below, suppose a user wants to prove that L2 is stored in this tree, he only needs to provide Hash 0-0 and Hash 1. Then, the verifier generates Hash 0-1, Hash 0 and Top Hash, and compares Top Hash with its expected Merkle root.
Installing "bombs" on ethereum
  • Figure source: https://commons.wikimedia.org/w/index.php?curid=18157888
    Merkle Patricia trees combine the prefix-based storage structure of Patricia trees with Merkle trees to create a new data structure that not only supports cryptographic authentication methods, but also maintains good runtime performance.

In a Merkle Patricia tree, keys and values are arbitrary strings of bytes. To get the value of a key, we first have to convert the key into a sequence of hexadecimal characters, i.e., turn each byte into two hexadecimal characters. Then, we have to first query the root node what the next node is based on the first character in the sequence; after getting this child node, then query the node down based on the second character, and so on, until we find the last node and get the last character.

In this example below, we can see that the Merkle Patricia tree actually contains three different kinds of nodes. The extension node (also known as the short node in the Geth codebase) is optimized and is responsible for storing a sequence of characters. In the case shown below, the root node stores all the keys starting with a7 without using two different nodes to represent a and 7. The branch nodes (or “full nodes”) contain a pointer to each possible character and an extra null to store the value of the current node. Finally, the key-end field of a leaf node (also known as a value node) necessarily matches the suffix of the key it stores1 .

Installing "bombs" on ethereum

State Trees
Now that we know how the Merkle Patricia tree works, we can start exploring what the global state tree is.

The vast majority of the blockchain’s data is stored in the global state tree. While the idea of including the state tree as a unique entity within each block may seem convenient, it is actually extremely inefficient to replicate the full state tree for each block, since the state tree differs only slightly from block to block. geth uses a different scheme. You can imagine that Geth maintains a pool of MPT nodes. The state tree of each block is just a subset of the entire pool. Whenever a new block is mined or imported, a new MPT node is added to the pool.

To identify the root node in the node pool, we must query the block header. Each block contains a field pointing to stateRoot, which points to the root node of the MPT (and that MPT stores account data keyed by account address 2). In this way, Geth can use the algorithm we described above to query for account information such as nonce or balance for any address.

Installing "bombs" on ethereum

Note that in the case of a contract, the account state will contain a non-empty storageRoot field and a codeHash field. storageRoot field points to another root node. However, the purpose of the MPT in question is to store the contract’s storage item data; the MPT will use the storage slot as the key and the raw data as the value.

Installing "bombs" on ethereum

To store MPT on disk, Geth chose to use LevelDB as the database. However, LevelDB is a key-value database that only supports string-to-string mappings, and MPT is not a string-to-string mapping. To solve this problem, Geth writes each node as a key-value pair, thus flattening the global state tree: the hash of the node as the key and the serialized node as the value. In this way, Geth can query the state tree of any block because the stateRoot field in the block header is the key that can be used to find the serialized MPT node in LevelDB (translation: i.e., the root node of an MPT).

Tree confusion
So, suppose you start a Geth node and connect to the network using fast sync mode. geth will download all the block data fast, but not perform any transactions. Soon after, you will get a blockchain with no state information. At this point, Geth starts synchronizing from the stateRoot of the pivot block via the state downloader.

// NewSync creates a new trie data download scheduler.func NewSync(root common.Hash, database ethdb.KeyValueReader, callback LeafCallback, bloom *SyncBloom) *Sync { ts := &Sync{ database: database, membatch: newSyncMemBatch(), requests: make(map[common. Hash]*request), queue: prque.New(nil), bloom: bloom, } ts.AddSubTrie(root, 0, common.Hash{}, callback) return ts}

  • Source: https://github.com/ethereum/go-ethereum/blob/87c463c47aa6d10a55a4b860bab3d53199814d01/trie/sync.go#L246-L255
    The state downloader requests the MPT node data corresponding to the MPT node key from the peer node. After receiving the result, the state downloader performs a hash calculation on the node data and verifies that the resulting hash is the same as the node key. If it is, the state downloader knows that the MPT node is correct, and then sends more requests for each child node of the MPT node. // Create and schedule a request for all the children nodes requests, err := s.children(request, node) if err ! = nil { return committed, i, err } if len(requests) == 0 && request.deps == 0 { s.commit(request) committed = true continue } request.deps += len(requests) for _, child := range requests { s.schedule(child) }
  • Source: https://github.com/ethereum/go-ethereum/blob/87c463c47aa6d10a55a4b860bab3d53199814d01/trie/sync.go#L205-L218
    If a leaf node (containing the serialized account state) is encountered, the account’s code and state tree will be queued to be fetched.

callback:=func(leaf[]byte, parentcommon.Hash) error{varobjAccountiferr:=rlp.Decode(bytes.NewReader(leaf), &obj); err!=nil{ returnerr}syncer.AddSubTrie(obj.Root, 64, parent, nil)syncer.AddRawEntry(common.BytesToHash(obj.CodeHash), 64, parent)returnnil}

// If the item was not requested, bail outrequest:=s.requests[item.Hash]ifrequest==nil{returncommitted, i, ErrNotRequested} ifrequest.data!=nil{returncommitted, i, ErrAlreadyProcessed}// If the item is a raw entry request, commit directlyifrequest.raw{ request.data=item.Datas.commit(request)committed=truecontinue}

func(sSync) schedule(reqrequest) {// If we’re already requesting this node, add a new reference and stopifold, ok:=s.requests[ req.hash]; ok{old.paints=append(old.paints, req.paints…) return}// Schedule the request for future retrievals.queue.Push(req.hash, int64(req.depth))s.requests[req.hash] =req}

Remember that the original entry request does not process the node in order to synchronize the child node (unlike a subtree request). This means that if we can somehow raise a conflict between the original entry and the subtree node, we can get Geth to synchronize an incomplete state tree. Also, Geth will not exit (report an error) when it encounters a tree node that does not exist locally, which is tantamount to assuming that this (local missing state data) situation will not happen if the downloader reports a successful synchronization. This means that a Geth node that is missing a tree node behaves very differently from other nodes that are fully synchronized with the tree.

So, how do we trigger a conflict? It turns out to be very simple: we simply deploy another contract with the serialized state root of one of the contracts we control, so the hash of that contract code must be the same as the state root hash (of the contract we control), which means that if the contract code is synchronized first, the state tree of the other contract will not be downloaded in its entirety.

To sum up
If someone were to exploit this vulnerability, they would have to complete multiple steps and wait a long time.

First, we deploy a contract (we would use the vulnerability to overwrite the state tree of this contract). This contract should have a unique stateRoot, so that MPT nodes associated with this stateRoot are not synchronized earlier.

contract Discrepancy { uint public magic = 1; // make sure we have a unique stateRoot uint[] private filler = [1,2,3,4,5,6,7,8,9,0 ];}
Next, we deploy a second contract in order to trigger a hardfork using a contradiction.

contract Hardfork { function hardfork(Discrepancy target) public { require(target.magic() == 0, “no discrepancy found”); selfdestruct(msg.sender); }}
Finally, we deploy a third contract using the state root of the Discrepancy contract as code. One thing we must note here is that the address of the third contract is “smaller” than the address of the Discrepancy contract, thus ensuring that it is always synchronized before the Discrepancy contract.

contract Exploit { constructor(bytes memory stateRoot) public { assembly { return(add(stateRoot, 0x20), mload(stateRoot)) } }}
Now we’re done. When new Geth nodes join the network using fast synchronization, they first request the Exploit contract to synchronize their state subtree and code. When the code for the Exploit contract is synchronized, it creates a request for the original entry that looks identical to Discrepancy’s state root request, but it is not processed as a subtree request. This means that the node will never download Discrepancy’s state trie, so future requests to read magic will return 0 instead of 1.

After a long enough time, all we have to do is call Hardfork.hardfork(discrepancy). Each node that properly synchronizes the entire network will see a rollback transaction, and each Geth node that joins the network using fast synchronization will see a successful transaction. This causes a block to produce two different state roots, meaning that we can trigger chain splits at will.

The Geth team quickly resolved the attack by addressing the tree read error in PR #21039, and then completely fixed the vulnerability by distinguishing between the code part and the tree part in PR #21080.

Conclusion
This is a very interesting vulnerability that allows an attacker to set up a “bomb” on the ethernet and detonate it at any time, causing all Geth nodes using fast sync to fork from the main network. This trap exploits the extremely complex logic in Geth’s synchronization and data storage code, which is probably why it has gone unnoticed for a long time.

Stay tuned for the third and final article in this series. In this post, we will explore the latest bug in the Geth client, the details of which cannot be disclosed.

Footnotes
Technically, the value node in Geth does not contain a suffix. You can think of it as an extension node followed by a value node (containing only raw data).

In fact, Geth uses a “secure trie”, i.e., a hash of all keys by the SHA-3 algorithm, thus ensuring that all keys are of fixed length.

(End)

Posted by:CoinYuppie,Reprinted with attribution to:https://coinyuppie.com/installing-bombs-on-ethereum/
Coinyuppie is an open information publishing platform, all information provided is not related to the views and positions of coinyuppie, and does not constitute any investment and financial advice. Users are expected to carefully screen and prevent risks.

Like (0)
Donate Buy me a coffee Buy me a coffee
Previous 2021-06-09 08:59
Next 2021-06-09 09:05

Related articles