Parsing Celestia and Data Availability

What is Data Availability?

One well-known property of blockchains is that on-chain data is secure, reliable, and tamper-proof. So what does data availability mean? Doesn't the blockchain's consensus already guarantee data security? Data security is indeed widely recognized, and it is one of the driving forces behind the continued development of blockchains. To understand what the DA (data availability) layer is, consider the following situations.

If a node wants to verify a transaction or a block, it must download all blocks and transaction data. As the chain keeps running, that data keeps growing, and so does the cost of running such a node. As a result, more and more nodes (especially individual users) can only run light nodes. Light nodes do not download all transaction data, so they cannot verify transactions and blocks themselves; they can only trust the full (consensus) nodes they connect to. In effect, a light node cannot tell whether the data it receives is actually available.

At the same time, blockchain networks have been trying to scale in order to improve efficiency. Ethereum's L2 is a scaling scheme that improves Ethereum's throughput. However, L1 and L2 are essentially two separate networks: L1 does not participate in L2's consensus or verify and execute L2 transactions, and L2 likewise does not participate in L1's consensus or verify and execute L1 transactions. This creates a trust problem between the two. For example, a rollup requires all of its transaction data to be recorded in Ethereum transactions; must a rollup user then run a full Ethereum node just to verify that their transactions were actually posted to Ethereum?

From the current working mechanism of blockchains, we can see that a node that does not participate in consensus, and in particular does not store all transaction data, cannot verify whether the data it obtains is valid. It can only trust that the consensus nodes it connects to will not deceive it, or connect to a few more consensus nodes for a small amount of fault tolerance.

The problem the DA layer solves, then, is allowing transactions to be verified without participating in consensus and without storing all transaction data, thereby proving whether the transaction data is available.


Having introduced what DA is, let's look at how the Celestia project intends to solve this problem.

The Celestia project is built around two-dimensional Reed-Solomon erasure coding, together with a random-sampling scheme for verifying and recovering data, to ensure data availability.

When a full node finds that a light node has received problematic data, it constructs a fraud proof and sends it to the light node. Upon receiving the fraud proof, the light node obtains the data needed to check it from the network by random sampling, and can thus determine whether the data it previously obtained is available. Light nodes need not trust the nodes that send them data, nor the nodes that send them fraud proofs, because the data needed for verification is obtained by random sampling; security is therefore provided by the entire network. This allows the security of the DA layer to approach that of the consensus layer.

Next, let's look at how Celestia works in detail. Since the project is still in development and testing, the description here follows the white paper and may differ from the final implementation.


Fraud-proof verification must be efficient, and must not require all transaction data or the execution of specific transactions. Celestia therefore extends its block data as follows.

1. stateRoot

stateRoot is the root of a sparse Merkle tree over the state; each leaf of this tree is a key-value pair.

A state witness w is defined: it is a set of key-value pairs together with their proofs in the state Merkle tree.


A function rootTransition is defined: given a state root, a batch of transactions, and the state witness for those transactions, it derives the state root after the transactions are executed. That is, the post-execution state root stateRoot' can be obtained as rootTransition(stateRoot, t, w).
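To make the idea concrete, here is a minimal sketch of rootTransition in Python. It is purely illustrative: the state is a plain dict, the "root" is a hash of the sorted key-value pairs rather than a real sparse-Merkle-tree root, a transaction is just a (key, new_value) pair, and we assume the witness's Merkle proofs have already been verified and cover the full state touched by the transactions.

```python
import hashlib
import json

def state_root(state: dict) -> str:
    """Toy stand-in for a sparse-Merkle-tree root: hash of the
    sorted key/value pairs (Celestia uses a real Merkle tree)."""
    return hashlib.sha256(json.dumps(sorted(state.items())).encode()).hexdigest()

def root_transition(state_root_before: str, txs, witness: dict) -> str:
    """Hypothetical rootTransition(stateRoot, t, w): replay txs
    against the witnessed key/value pairs and return the
    post-execution state root. A tx here is (key, new_value)."""
    # In the real protocol the witness also carries Merkle proofs that
    # each pair is included under state_root_before; assumed verified.
    state = dict(witness)
    for key, value in txs:
        state[key] = value
    return state_root(state)

state = {"alice": 10, "bob": 5}
before = state_root(state)
txs = [("alice", 7), ("bob", 8)]
after = root_transition(before, txs, witness={"alice": 10, "bob": 5})
```

With a full witness, `after` equals the root of the fully updated state, which is exactly what a fraud-proof verifier compares against the root claimed in the block.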


2. dataRoot

dataRoot combines transactions, and the intermediate state roots produced while executing them, into fixed-size, fixed-format shares. All of these shares are extended with the two-dimensional RS erasure code, finally yielding the root of a Merkle tree, i.e. dataRoot.

  • Specific steps
    • Encapsulate the initial transaction data into shares of the required size and format.
    • Arrange the shares into a k×k matrix, padding it if there are not enough shares.
    • Apply the RS erasure code along the rows and columns, extending the matrix three times to produce the three additional quadrants and finally obtain a 2k×2k matrix.
    • For each row and each column of this matrix, build a Merkle tree, producing 2k row roots and 2k column roots.
    • Finally, build a Merkle tree over these 4k roots to obtain the root dataRoot.
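The steps above can be sketched in Python. As a deliberate simplification, the Reed-Solomon extension is replaced by a trivial duplication code (each row and column is repeated), since implementing RS over a finite field would obscure the structure; everything else (row roots, column roots, and the final dataRoot over all 4k roots) follows the listed steps. The Merkle tree here is a minimal binary one, not Celestia's namespaced tree.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    """Minimal binary Merkle tree with duplicate-last padding."""
    nodes = [h(leaf) for leaf in leaves]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def extend_matrix(matrix):
    """Stand-in for the 2D Reed-Solomon extension: repeat the original
    rows and columns (a trivial erasure code), turning a k x k matrix
    of shares into a 2k x 2k one. Real Celestia uses RS coding."""
    wide = [row + row for row in matrix]   # extend columns: k x 2k
    return wide + wide                     # extend rows:    2k x 2k

def data_root(matrix):
    """2k row roots + 2k column roots of the extended matrix,
    then one Merkle root over all 4k of them."""
    ext = extend_matrix(matrix)
    n = len(ext)
    row_roots = [merkle_root(row) for row in ext]
    col_roots = [merkle_root([ext[r][c] for r in range(n)]) for c in range(n)]
    return merkle_root(row_roots + col_roots)

shares = [[b"tx1", b"tx2"], [b"isr1", b"pad"]]   # k = 2, padded
root = data_root(shares)
```

Because every share contributes to one row root and one column root, changing any single share changes dataRoot, which is what makes the per-row/per-column fraud proofs described later possible.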


  • shares

    shares is a fixed-size, fixed-format data structure defined by the Celestia project. Its contents are mainly transactions and the intermediate state roots produced by executing those transactions.

    Since there is no fixed rule for how many transactions must be followed by an intermediate state root, the project defines a Period variable as a maximum limit: it can be the maximum number of transactions after which an intermediate state root must be generated, or a number of bytes, or an amount of gas.

    Two functions are also defined to help with validation:

    parseShares: given shares, returns messages m, each of which is either an intermediate state root or a transaction.

    parsePeriod: given messages, returns the pre-execution state root, the post-execution state root, and the transaction list.

    • Example share format (fixed 256 bytes)
      • 0–80: start of a batch of transactions
      • 81–170: the transactions included
      • 171–190: intermediate state roots
      • 191–256: start of the next batch of transactions
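The two parsing functions can be sketched as follows. The message encoding is hypothetical: we assume parseShares has already yielded a list of tagged messages, where ("isr", root) is an intermediate state root and ("tx", data) is a transaction; these tags are illustrative, not Celestia's wire format.

```python
def parse_period(messages):
    """Sketch of parsePeriod: split one period's messages into
    (pre-execution state root, post-execution state root, tx list).
    A missing leading/trailing ISR is returned as None, which the
    fraud-proof rules interpret as "the period starts/ends at the
    block boundary"."""
    pre_root, post_root, txs = None, None, []
    for kind, payload in messages:
        if kind == "isr" and not txs:
            pre_root = payload      # state root before this period's txs
        elif kind == "isr":
            post_root = payload     # state root after executing them
        else:
            txs.append(payload)
    return pre_root, post_root, txs

msgs = [("isr", "r0"), ("tx", "t1"), ("tx", "t2"), ("isr", "r1")]
pre, post, txs = parse_period(msgs)
```

A verifier then feeds `pre`, `txs`, and the state witness into rootTransition and compares the result with `post`.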

The white paper introduces two kinds of fraud proofs, described separately below:

3. Fraud Proofs of Invalid State Transitions

This kind of proof targets an invalid stateRoot. Full nodes use the shares committed to by dataRoot to help light nodes verify whether the stateRoot in a received block header is valid.

The composition of an invalid-state-transition fraud proof:

  • the blockhash of the corresponding block
  • the relevant shares
  • the Merkle proofs of these shares in the Merkle tree committed to by dataRoot
  • the state witnesses for the transactions contained in these shares

Verification of proof:

  • Verify the blockhash to determine which block the fraud proof targets.
  • Verify that the Merkle proofs of the shares are valid.
  • Using the two parsing functions on the shares, correctly obtain the corresponding transaction list along with the pre-execution and post-execution state roots of this batch of transactions. If the pre-execution state root is empty, the first transaction must be the first transaction of the block; likewise, if the post-execution state root is empty, the last transaction must be the last transaction of the block.
  • Use the rootTransition function to check the two state roots thus obtained.
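The last verification step can be sketched as follows, reusing the toy state model from earlier (a dict state, a hash of sorted pairs as the "root", and (key, new_value) transactions; all of these are illustrative simplifications). We assume the blockhash and Merkle-proof checks have already passed.

```python
import hashlib
import json

def toy_root(state: dict) -> str:
    """Toy stand-in for a sparse-Merkle-tree state root."""
    return hashlib.sha256(json.dumps(sorted(state.items())).encode()).hexdigest()

def shows_invalid_transition(pre_witness: dict, txs, claimed_post_root: str) -> bool:
    """Sketch of the final fraud-proof check: re-execute the period's
    transactions against the witnessed state and compare the result
    with the post-execution root taken from the block. Returns True
    when the proof demonstrates the block's root is wrong. Assumes the
    witness covers all state touched by txs."""
    state = dict(pre_witness)
    for key, value in txs:
        state[key] = value
    return toy_root(state) != claimed_post_root

honest_post = toy_root({"alice": 7, "bob": 5})
fraudulent = shows_invalid_transition({"alice": 10, "bob": 5},
                                      [("alice", 7)], honest_post)
```

When the block's claimed root matches the re-executed one, `fraudulent` is False and the proof is rejected; any mismatch convinces the light node that the header is invalid.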

4. Fraud Proofs of Incorrectly Generated Extended Data

This kind of proof is produced when, during network propagation, a full node receives recovered shares from the network that do not match the shares it holds; the full node then responds by broadcasting a fraud proof to the network.

Composition of a fraud proof of incorrectly generated extended data:

  • the Merkle root of the incorrect row or column
  • the Merkle proof of this row or column root in the Merkle tree committed to by dataRoot
  • enough shares to recover this row or column (at least k of them)
  • a Merkle proof for each of these shares in the Merkle tree committed to by dataRoot

Verification of proof:

  • Verify the blockhash to determine which block the fraud proof targets.
  • Verify that the Merkle proof of the row or column root is valid. Note: in VerifyMerkleProof(row/column Merkle root, Merkle proof of that root, dataRoot, length, position index), the first two arguments are carried by the proof, and the last three are local (previously received) data.
  • Verify that the Merkle proofs of the shares are valid. Note: in VerifyShareMerkleProof(shares, Merkle proofs of the shares, dataRoot, length, position index), dataRoot is local data, and the other arguments come from the proof.
  • Using the received shares, recover all the data of this row or column, and verify whether its Merkle root equals the Merkle root of the corresponding row or column received earlier.
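The recovery-and-compare step can be sketched as follows. As before, a trivial duplication code (positions i and i+k hold the same share) stands in for Reed-Solomon, under which any k shares would suffice; here we need one copy of each distinct value. The Merkle tree is the same minimal binary one used earlier.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    """Minimal binary Merkle tree with duplicate-last padding."""
    nodes = [h(leaf) for leaf in leaves]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def recover_row(received: dict, k: int):
    """Recover an extended row of length 2k from a partial
    {position: share} map, using the toy duplication code."""
    row = [None] * (2 * k)
    for pos, share in received.items():
        row[pos] = row[(pos + k) % (2 * k)] = share
    if any(s is None for s in row):
        raise ValueError("not enough shares to recover the row")
    return row

def row_is_fraudulent(received: dict, k: int, claimed_row_root: bytes) -> bool:
    """Final check: the recovered row's Merkle root must match the row
    root received earlier; a mismatch proves a bad extension."""
    return merkle_root(recover_row(received, k)) != claimed_row_root

good_row = [b"a", b"b", b"a", b"b"]        # k = 2, duplicated extension
good_root = merkle_root(good_row)
```

A proof carrying shares consistent with the committed row root fails this check (no fraud), while shares that recover to a row with a different root demonstrate that the extended data was generated incorrectly.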

Data Availability

Using the two-dimensional RS erasure code, Celestia's light nodes obtain block data by random sampling and verify the data referenced by fraud proofs. At the same time, the randomly sampled data spreads through the network; once enough of it has accumulated, it can also help the network recover the block data. The specific workflow is as follows:

  1. A light node receives the block header of a new block, plus the Merkle roots of its 2k rows and 2k columns, from any of the full nodes it is connected to. It performs a preliminary check of these Merkle roots against the dataRoot in the block header, rejecting the header if the check fails.
  2. In this 2k × 2k matrix, the light node randomly selects a set of distinct coordinates and sends them to the full nodes it is connected to.
  3. If a full node has all the shares corresponding to these coordinates, it responds to the light node with each share and the Merkle proof of that share in the corresponding row or column.
  4. For each received share, the light node verifies that its Merkle proof is valid. Note: in VerifyMerkleProof(share, Merkle proof of the share, Merkle root of the corresponding row or column, length, coordinate position index), the first two arguments are carried by the response, and the last three are local (previously received) data.
  5. If a full node does not respond with the share for some coordinate, the light node sends the corresponding shares it did receive, together with their Merkle proofs, to that full node, and the full node forwards the received data to the other full nodes it is connected to.
  6. If the verification in step 4 succeeds, responses have been received for all coordinates sampled in step 2, and no fraud proof about this block arrives within a set period of time, the light node considers the block data available.
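Steps 2-4 of one sampling round can be sketched as follows. The 4×4 matrix (2k × 2k with k = 2), the single simulated full node, and the minimal binary Merkle tree with per-leaf inclusion proofs are all illustrative assumptions; a real light node would query multiple peers and use Celestia's namespaced Merkle trees.

```python
import hashlib
import random

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_levels(leaves):
    """All levels of a simple binary Merkle tree (duplicate-last padding)."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1][:]
        if len(lvl) % 2:
            lvl.append(lvl[-1])
        levels.append([h(lvl[i], lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def merkle_proof(leaves, idx):
    """Sibling hashes from leaf idx up to (but excluding) the root."""
    proof = []
    for lvl in merkle_levels(leaves)[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        proof.append(lvl[idx ^ 1])
        idx //= 2
    return proof

def verify_merkle_proof(share, proof, root, idx):
    node = h(share)
    for sibling in proof:
        node = h(node, sibling) if idx % 2 == 0 else h(sibling, node)
        idx //= 2
    return node == root

# Simulated full node state: the extended matrix of shares. The light
# node holds only the row roots it got with the block header (step 1).
matrix = [[f"s{r}{c}".encode() for c in range(4)] for r in range(4)]
row_roots = [merkle_levels(row)[-1][0] for row in matrix]

# Steps 2-4: sample distinct coordinates, request share + proof,
# and check each proof against the locally held row root.
random.seed(7)
coords = random.sample([(r, c) for r in range(4) for c in range(4)], 4)
all_valid = all(
    verify_merkle_proof(matrix[r][c], merkle_proof(matrix[r], c),
                        row_roots[r], c)
    for r, c in coords
)
```

If every sampled share verifies and no fraud proof arrives within the timeout (step 6), the light node accepts the block data as available.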

Posted by:CoinYuppie,Reprinted with attribution to: