Scroll Research: Design Challenges and Solutions for zkEVM

Overview

zk-Rollup is a very cheap and secure layer 2 scaling solution for Ethereum. However, existing zk-Rollups are limited to specific applications, which makes it difficult for developers to build general composable DApps in zk-Rollups and migrate existing applications. We build a fully EVM compliant zk-Rollup by introducing zkEVM to generate zk proofs for EVM verification, to which any Ethereum application can easily migrate.

This paper introduces the design challenges and feasibility of zkEVM, and proposes a detailed scheme for building zkEVM from scratch.

background

zk-Rollup is recognized as the best scaling solution for Ethereum. Not only has the security of Ethereum Layer 1, but also has the fastest transaction speed compared to all other Layer 2 solutions.

In the medium to long term, as ZK-SNARK technology improves, ZK rollups will win in all use cases. — Vitalik Buterin

The basic idea of ​​zk-Rollup is to aggregate a large number of transactions into a Rollup block and generate concise proofs for off-chain blocks. Then the smart contract on Layer 1 only needs to verify the proof of zk-Rollup and update the state directly without re-executing those transactions. Proving the verification state is much cheaper than the gas cost of re-executing the calculation. In addition, data compression (that is, only keeping the minimum data on the chain for verification) is beneficial to reduce the gas cost. Such a transaction process saves an order of magnitude in gas costs.

Although zk-Rollup is safe and efficient, it is difficult to build a general DApp, and its application is still limited to payment and swap (swap), mainly for the following two reasons:

First, developing DApps in zk-Rollup requires the use of a special programming language (ie R1CS) to write the logic of smart contracts. The programming language has a complex syntax and requires developers to be proficient in zero-knowledge proofs.

Second, the current zk-Rollup does not support composability [1]. This means that multiple zk-Rollup applications cannot interact with each other within Layer 2, greatly reducing the composability of DeFi applications.

In short, currently zk-Rollup is not developer friendly and has limited functionality. We hope to address these issues, provide developers with a rapid development experience by directly supporting native EVM verification, and support the composability of applications within Layer 2 so that existing Ethereum applications can be easily migrated to zk-Rollup.

Building Universal DApps in zk-Rollup

There are two ways to build a universal DApp in zk-Rollup.

  • Build dedicated circuits (“ASICs”) for different DApps
  • Building a generic “EVM” encoding for smart contract execution

“Circuit” refers to the representation of programs used in zero-knowledge proofs. For example, to prove that hash(x) = y, the hash function would need to be rewritten using an ASIC circuit. The circuit supports only a very limited number of computational expressions (eg R1CS only supports addition (add) and multiplication (mul)). Therefore, the process of writing a program in a circuit language is very difficult for developers, and all program logic (including if else, loops, etc.) must be built using add and mul.

The first method requires developers to design specialized “ASIC” circuits for different DApps, which requires the use of zero-knowledge proofs in the most primitive way. Reduce the cost per DApp by designing custom circuits. However, since the circuits are “static”, do not provide composability for the application, and require specialized circuit design knowledge, the development experience is poor [2].

The second method does not require any special design or circuit expertise. This advanced idea based on machine proofs allows any program to run on the CPU, so developers only need to build a general-purpose CPU circuit to verify the low-level steps. This CPU circuit is then used to verify the execution of the program. In this scenario, the program refers to the smart contract, and the CPU refers to the EVM. However, this approach has not been widely adopted in the past few years due to prohibitive costs. For example, even if the developer only wants to add a verification step, the cost of the entire EVM circuit needs to be covered. If there are thousands of steps in the execution trace, then the EVM circuit cost will be 1000 times. [3]

Recently, there has been a lot of research on optimizing zk proofs along these two approaches, including:

(i) Propose a new zero-knowledge proof-friendly language, Poseidon hash, which is 100 times more efficient than SHA256 in circuits

(ii) Generic Verifiable Virtual Machines such as TinyRAM

(iii)) More and more general optimization tricks like Plookup, and cryptographic libraries that run faster.

We previously suggested designing an “ASIC” circuit for each DApp to communicate by verifying a password. However, based on feedback from the community, we have changed our priorities to focus on the second approach, prioritizing building general-purpose EVM circuits (so-called “zkEVM”). zkEVM will support the exact same development experience as Layer 1. Rather than leaving the complexity of the underlying design to the developer, we address the efficiency issue with a custom EVM circuit design.

Design Challenges of zkEVM

zkEVM is difficult to build, and unlike TinyRAM, zkEVM is more challenging to design and implement for the following reasons:

First, the EVM has limited support for elliptic curves. Currently, EVM only supports BN254 pairing and does not directly support cyclic elliptic curves, so recursive proofs are difficult. Other proprietary protocols are also difficult to use under this restriction, unless they are EVM-compliant verification algorithms.

Second, the EVM word length is 256 bits. EVM runs on 256-bit integers (most regular VMs run on 32-64-bit integers), while zk proofs work on prime fields. Doing “mismatch field arithmetic” inside the circuit requires range proofs, which would add roughly 100 constraints per EVM step, resulting in two orders of magnitude larger EVM circuit size.

Third, the EVM has many special opcodes. EVMs differ from traditional VMs in that there are many special opcodes, such as CALL, as well as execution context and gas related error types. This will bring new challenges to circuit design.

Fourth, the EVM is a stack-based virtual machine. SyncVM ( zksync ) and Cario (starkware) are architected in a register-based model and define specific IR/AIR. A specialized compiler is required to compile the smart contract code into the new zk-friendly IR. This approach is language compliant rather than native EVM compliant, a stack-based model and direct support for native toolchains are harder to implement.

Fifth, the cost of Ethereum storage layout is too large. Ethereum storage layout is highly dependent on Keccak and MPT [4], neither of which are zk friendly types and incur high costs. For example, the circuit size of Keccak hash is 1000 times that of Poseidon hash. However, replacing Keccak with another hash would cause some compatibility issues with the existing Ethereum infrastructure.

Sixth, machine-based proofs have high costs. Even if all the above problems can be handled properly, there is still a need to find an efficient way to combine them together to obtain a complete EVM circuit. As I mentioned in the previous section, a simple opcode add can cost the entire EVM circuit.

Why is zkEVM possible?

Thanks to the tremendous progress made by researchers in this area, more and more efficiency problems have been solved in the past two years, proving the feasibility of zkEVM! The main technological advances come from the following aspects:

Use of polynomial commitments. For the past few years, most zero-knowledge proof protocols have been using R1CS, with PCP queries encoded into an application-specific trusted setup. . This situation typically overloads the circuit and makes custom optimization impossible, since the degree of each constraint needs to be 2 (bilinear pairing allows only one multiplication in the exponent). Developers using the polynomial commitment scheme can raise the constraints to any degree through a universal setup or a transparent setup, greatly increasing the flexibility of the choice of backends.

Data table query parameters and custom widgets. Optimizing data table query parameters was first proposed in Arya and then optimized in Plookup. This can omit a lot of bitwise operations for zk-unfriendly programming statements like AND, XOR, etc. Custom widgets allow developers to efficiently implement height constraints. TurboPlonk and UltraPlonk define elegant syntax to make it easier for developers to use query tables and custom widgets. This is very helpful in reducing the cost of the EVM circuit.

Recursive proofs are increasingly feasible. Recursive proofs in the past have relied on special cyclic elliptic curves (ie, MNT-curve-based constructions), which required a lot of cost. More technologies are now available to change this dependency without sacrificing efficiency. For example, Halo can pair friendly elliptic curves and use a special inner product parameter to amortize the recursion cost. Aztec shows that it is possible to perform aggregate proofs directly on existing protocols (querying the data table can reduce the cost of non-local field operations and thus the cost of the verification circuit). This approach can greatly improve the scalability of the circuit load.

Hardware acceleration is more efficient. We manufacture GPUs and ASIC/FPGA accelerators, and papers on ASIC provers have been accepted by the largest computer conference (ISCA). The GPU prover is about 5 to 10 times faster than the Filecoin implementation. This will greatly improve the computational efficiency of the prover.

How does zkEVM work and how to build it?

In addition to strong intuition and technical improvements, we need to know more clearly what we need to prove and work out a more concrete architecture. We will cover more technical details and comparisons in subsequent articles. Here, we describe the overall workflow and some key ideas.

Workflow for developers and users

Developers can use any EVM-compatible language to run smart contracts and deploy the compiled bytecode on Scroll. Users can then send transactions to interact with the smart contract. The user and developer experience will be identical to Layer 1. However, gas fees are significantly lower, and transaction orders on Scroll are instantly pre-confirmed (withdrawals take only a few minutes).

Workflow of zkEVM

Even though the apparent workflow of Layer 1 and Layer 2 are not much different, the underlying processing is completely different: Layer 1 relies on smart contract re-execution; Layer 2 relies on zkEVM circuit validity proofs.

Let’s explain in more detail how Layer 1 and Layer 2 transactions differ.

In Layer 1, the bytecode of the smart contract is stored in Ethereum storage, and the transaction will be broadcast in the P2P network. For each transaction, each full node needs to load the corresponding bytecode and execute it on the EVM to reach the same state (the transaction will be the input data).

Layer 2’s smart contract bytecode operates in the same way, but the next step is that the transaction will be sent off-chain to a centralized zkEVM node. The zkEVM then executes the bytecode and generates a succinct proof that the state has been updated correctly after applying the transaction. Finally, the Layer 1 contract will verify the proof and update the state without re-executing the transaction.

Let’s dig a little deeper into the execution process and see what zkEVM ultimately needs to prove. In native execution, the EVM first loads the bytecode, then executes the opcodes in the bytecode one by one from scratch, performing the following three sub-steps for each opcode:

(i) read elements from stack, memory or storage

(ii) perform some computation on these elements

(iii) Write the result back to the stack, memory, or storage. [5] For example, the add opcode requires reading two elements from the stack, adding them and writing the result back to the stack.

So the proof of zkEVM needs to include the following aspects corresponding to the execution process:

  • The bytecode is loaded correctly from storage (so that the virtual machine correctly runs the opcode loaded from the given address)
  • The opcodes in the bytecode are executed one by one (bytecodes are executed sequentially, no opcodes are lost or skipped)
  • Each opcode executes correctly (the three substeps in each opcode execute reads, writes and computations correctly)

zkEVM Design Highlights

When designing the architecture of zkEVM, we need to address/solve the above three issues.

First, design a circuit for the password validator. This part is like a “verifiable storage”, we use the technical means of password authenticator to ensure the correctness of the verification result. [6] Taking the Merkle tree as an example, the deployed bytecode will be stored as a leaf node in the Merkle tree. The verifier can then use the compact proof to verify the bytecode loaded from the given address (i.e. verify the Merkle path in the circuit). For Ethereum storage, circuits are required to be compatible with the Merkle Patricia Trie and Keccak hash functions.

Second, design a circuit to correlate the bytecode with the actual execution. Moving bytecodes into static circuits presents a problem: conditional opcodes like jump (corresponding to loop, if else statements in smart contracts) can jump anywhere. The jump destination is indeterminate until someone runs that bytecode with a specific input. That’s why we need to verify the actual execution trace. An execution trace can be thought of as “unrolled bytecode”, containing the opcodes in the actual execution order (i.e. if you jump to another location, the target opcode and location will be included in the trace).

The prover will directly provide the execution trace as witness data for the circuit. We need to prove that the execution trace is the work of “unrolling” through a specific bytecode with a specific input, in order to enforce a consistent value of the program counter. For the problem of uncertain destination, the solution is to let the prover provide all the data. Consistency is then efficiently checked by looking up the parameters (ie, proving that the opcode with an accurate global counter is contained in the “bus”).

Third, design the circuit for each opcode (proving that the reads, writes and computations in each opcode are correct). This is the most important part, proving that every opcode in the execution trace is correct and consistent. Putting everything directly together would come at a high cost. The optimization scheme here is:

1) We separate read and write and computation into two proofs. One proof will put all the elements used by the opcode on the “bus”, and the other proof will prove that the computation on the elements on the “bus” is performed correctly. This significantly reduces the cost of each part (the entire EVM storage need not be considered when generating proofs of computation). The former is called “state proof” and the latter is called “EVM proof”. Another finding is that lookup declarations can effectively handle “bus mapping”.

2) We can design higher degree custom constraints for each opcode (ie, we can slice an EVM word into multiple data blocks for more efficient processing). We can choose whether to “turn on” a constraint on demand via a selector polynomial. This avoids the cost of consuming the entire EVM circuit for each operation.

Originally proposed by the Ethereum Foundation, this architecture is still in its early stages and under active development. We are working closely with the Ethereum Foundation to find the best way to implement this EVM circuit. So far, we have defined the most important features of the EVM circuit and implemented some opcodes (using the UltraPlonk syntax from the Halo2 library). More details will be introduced in subsequent articles. We recommend that interested readers read this document. The development process will be transparent. This will be a fully open source design that brings together the entire community. Hope more people will join in and contribute.

What else can zkEVM bring?

zkEVM is much more than just Layer 2 scaling. We can understand it as a straightforward way to extend Ethereum Layer 1 through Layer 1 Validity Proofs. This means that existing Layer 1 can be extended without any special Layer 2.

For example, developers can use zkEVM as a full node, and this proof can be used to directly prove transitions between existing states. All Layer 1 transactions don’t require migrating anything to Layer 2, you can justify it straight away! More broadly, you can use zkEVM to generate compact proofs for the whole of Ethereum, just like Mina. The only thing that needs to be added is proof recursion (embedding the block’s verification circuit in zkEVM) [7].

in conclusion

zkEVM can provide the same experience for developers and users. It’s orders of magnitude cheaper without sacrificing security. An architecture to build it in a modular fashion has been proposed. It leverages recent breakthroughs in zero-knowledge proofs to reduce costs (including custom constraints, lookup parameters, proof recursion, and hardware acceleration). We look forward to seeing more people join the zkEVM community and brainstorm with us!

Remark:

[1]: Starkware stated in an announcement on September 1, 2021 that composability has been achieved.

[2]: The circuit is fixed and static. For example, when implementing a program as a circuit, you cannot use variable-capped loops. The upper limit must be fixed to the maximum value. The circuit cannot handle dynamic logic.

[3]: To facilitate the reader’s understanding, we detail the cost of the EVM circuit here. As mentioned earlier, circuits are fixed and static. Therefore, the EVM circuit needs to contain all possible logic (10000 times the size of the circuit containing only add). This means that even if you only want to prove add, you still need to pay the cost of all the logic that may be contained in the EVM circuit. That is, the cost is magnified by a factor of 10,000. In execution trace, you need to prove a chain of opcodes, and each opcode incurs a high cost.

[4]: The EVM itself is not tightly bound to the Merkel-Patricia tree (MPT). Currently, MPT is only used to store Ethereum state. It’s easy to change one (someone has proposed replacing MPT with a Verkle tree).

[5]: This is a highly simplified abstraction. Technically, the list of “EVM states” is longer and includes the program counter, gas headroom, call stack (all of the above plus addresses and statics for each call in the stack), a set of log and transaction scope variables (hot storage slot, refund, self-destruct). We can additionally introduce identifiers for different calling environments to directly support composability.
[6]: Due to the large amount of storage, we use an accumulator for storage. Memory and stack can use editable Plookup (we can effectively implement “RAM” this way).

[7]: Adding a complete recursive proof to a zkEVM circuit is not trivial. The best way to implement recursion is to use cyclic elliptic curves (ie, Pasta curves). We need to introduce some kind of “wrapping” process to make recursion verifiable on Ethereum Layer 1.

Posted by:CoinYuppie,Reprinted with attribution to:https://coinyuppie.com/scroll-research-design-challenges-and-solutions-for-zkevm/
Coinyuppie is an open information publishing platform, all information provided is not related to the views and positions of coinyuppie, and does not constitute any investment and financial advice. Users are expected to carefully screen and prevent risks.

Like (0)
Donate Buy me a coffee Buy me a coffee
Previous 2022-05-03 09:47
Next 2022-05-03 09:48

Related articles