Zero-knowledge proofs (ZKPs) allow one party to prove to another that a statement is true without revealing any additional information. They can therefore be used to protect privacy, verifying the validity of a transaction while hiding all of its details. Efficient verification is important for certain zero-knowledge protocols, such as STARKs and SNARKs, which generate small proofs that can be verified quickly. This is a good fit for resource-constrained blockchains and is especially important in addressing the crypto industry's scalability issues. Beyond scaling, other use cases for zero-knowledge technology include:
- Cross-chain bridges – use ZKPs to verify state transitions or transactions, e.g. Algorand ASP, Mystiko
- DID (Decentralized ID) – Prove that an account or entity has certain “characteristics” without revealing details, e.g. Sismo, First Batch
- Community governance – anonymous voting; once proven and widely adopted, this use case could extend to real-world governance
- Financial Statements – Entities can demonstrate compliance with certain criteria without disclosing exact financial data
- Cloud service integrity – allowing cloud service providers to prove that outsourced computations were executed correctly
A typical zero-knowledge system works as follows: an engineer first writes the statement to be verified in a domain-specific language (DSL), which is then compiled into a format suitable for zero-knowledge proving, such as an arithmetic circuit. After parameters are generated from this format, the proving system takes these parameters and the secret witness as input and runs the proving computation. With a relatively cheap computation over the parameters and the proof, the verifier can then decide whether verification passes. In the case of a zero-knowledge rollup, the program or contract itself is deployed on layer 2; compilation, parameter generation, and proof generation are performed off-chain by layer-2 nodes, and the proof is then published and verified on the Ethereum mainnet.
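To make the prove/verify split above concrete, here is a minimal, self-contained sketch of a classic zero-knowledge proof: a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir transform. The tiny group parameters are toy values chosen for readability; this is an illustration of the pattern, not a production proof system, and real SNARKs/STARKs are far more involved:

```python
import hashlib
import secrets

# Toy group: the order-q subgroup of Z_p^* with p = 2q + 1 (both prime).
p, q, g = 1019, 509, 4

def H(*vals):
    """Fiat-Shamir: derive the challenge by hashing the public transcript."""
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x):
    """Prove knowledge of x such that y = g^x mod p, without revealing x."""
    k = secrets.randbelow(q)
    R = pow(g, k, p)                 # commitment
    c = H(g, pow(g, x, p), R)        # challenge (hash replaces the verifier)
    s = (k + c * x) % q              # response
    return R, s

def verify(y, proof):
    """Check g^s == R * y^c mod p using only public values."""
    R, s = proof
    c = H(g, y, R)
    return pow(g, s, p) == (R * pow(y, c, p)) % p

x = 123                       # prover's secret witness
y = pow(g, x, p)              # public statement: "I know log_g(y)"
assert verify(y, prove(x))    # completeness: an honest proof verifies
```

The verifier learns that the prover knows x, but the proof (R, s) reveals nothing else about it; the real systems discussed in this report follow the same statement/witness/proof shape at much larger scale.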
A typical zero-knowledge system
Source: ZK Whiteboard Sessions – Module One, written by Prof. Dan Boneh
There are several excellent proof systems available, such as Marlin, Plonky2, and Halo2. Different proof systems place different emphasis on characteristics such as the size of the generated proof, the time required for verification, and whether a trusted setup is required. After several years of exploration, it is now possible to achieve a constant proof size (hundreds of bytes) and a short verification time (a few milliseconds) no matter how complex the statement is.
However, the cost of proof generation scales almost linearly with the size of the arithmetic circuit, and can be hundreds of times that of the original task. Since the prover must at minimum read and evaluate the whole circuit, proving can take seconds to minutes, or even hours. The high cost of computing power and long proving times have been the main obstacles to the advancement and large-scale application of zero-knowledge technology.
Hardware acceleration can help break this bottleneck: distributing tasks to the hardware best suited for them, combined with algorithmic and software optimizations, compounds the gains.
This report aims to help readers understand the market landscape, the impact of zero-knowledge technology on the mining market, and potential opportunities. The report consists of three parts:
- Real-world use cases and the latest trends across different projects.
- Acceleration solutions based on GPUs, FPGAs, and ASICs.
- Potential market players and business models.
2. Use Cases
Enumerating zero-knowledge use cases helps illustrate how the market is evolving. Because different categories have different needs, they also place different demands on hardware supply. At the end of this section, we briefly compare ZKP generation with PoW mining (especially Bitcoin's).
2.1 Emerging blockchain and its differentiated needs
The emerging blockchains that use zero-knowledge technology today are the main source of demand for hardware acceleration, and they roughly divide into scaling solutions and privacy-preserving blockchains. A zero-knowledge rollup or Volition executes transactions off-chain and submits succinct validity proofs, publishing the associated data on-chain via calldata. Privacy-preserving blockchains use ZKPs to let users prove the validity of the transactions they initiate without disclosing transaction details.
These blockchains trade off features such as proof size, verification time, and trusted setup by using different proof systems. For example, Plonk generates proofs with constant proof size (~400 bytes) and verification time (~6 ms), but still requires a universal trusted setup. In contrast, Stark does not require a trusted setup, but its proof size (~80 KB) and verification time (~10 ms) are larger and grow with circuit size. Other systems have their own pros and cons. The trade-offs between these proof systems shift the "center of gravity" of the computation.
Specifically, a current proof system can usually be described as a PIOP (polynomial interactive oracle proof) plus a PCS (polynomial commitment scheme). The former can be thought of as an agreed procedure the prover uses to convince the verifier, while the latter uses mathematical methods to ensure that the procedure cannot be cheated. In this analogy, the PCS is the gun and the PIOP is the bullet. A project party can modify the PIOP as needed and choose among different PCSs.
In his report on hardware acceleration, Paradigm's Georgios Konstantopoulos explained that the time required to generate a proof depends mainly on two types of computational tasks: MSM (multi-scalar multiplication) and FFT (fast Fourier transform). However, the parameters are not fixed: building different PIOPs and choosing among different PCSs leads to different computational costs for FFT and MSM. Taking Stark as an example, its PCS is FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity), which is based on Reed-Solomon codes rather than the elliptic curves used by KZG or IPA, so its entire proof-generation process involves no MSM at all. We roughly rank the computational load of different proof systems in the table below. Note that 1) it is difficult to estimate the exact computational load of an entire system, and 2) project parties usually modify the system as needed during implementation.
Computational amount of different proof systems
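To pin down what the MSM term above refers to, here is a naive sketch in Python. As a stand-in for elliptic-curve points we use a toy additive group (integers mod n), which keeps the code short while preserving the structure of the computation; the cost of the real operation grows the same way, linearly in the number of terms:

```python
n = 2**61 - 1  # order of a toy additive group (stand-in for a curve group)

def msm_naive(scalars, points):
    """Multi-scalar multiplication: sum(a_i * P_i) over the group.

    In a real prover, each term is an elliptic-curve scalar multiplication,
    which is why MSM dominates proving time in KZG/IPA-based systems.
    """
    acc = 0
    for a, P in zip(scalars, points):
        acc = (acc + a * P) % n
    return acc

print(msm_naive([1, 2, 3], [10, 20, 30]))  # → 140
```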
These differences give project parties their own hardware preferences. GPUs are currently the most widely used thanks to their large supply and ease of development. In addition, the multi-core structure of GPUs is well suited to parallel MSM computation. However, FPGAs (field-programmable gate arrays) may be better at handling FFTs, which we detail in Part II. Projects that use Stark, such as Starknet and Hermez, may therefore need more FPGAs.
Another conclusion from the above is that the technology is still in its early stages and lacks a standardized or dominant solution. It may also be premature to build ASICs (application-specific integrated circuits) fully dedicated to specific algorithms. Developers are therefore exploring a middle ground, which we explain further later.
2.2 Trends and new paradigms
2.2.1 More complex statements
Drawing on the use cases listed at the beginning, we expect zero-knowledge to have many more uses in the crypto industry and the real world, enabling more complex proofs, some of which may not even adhere to current proof systems. Instead of adopting a PIOP plus PCS, project parties can develop new primitives that suit them best. In other fields such as MPC (secure multi-party computation), using zero-knowledge protocols for part of the work greatly improves utility. Ethereum also recently planned to hold a KZG trusted-setup ceremony in order to realize Proto-Danksharding, and plans to further implement the full version of Danksharding to handle data availability sampling. Even optimistic rollups may adopt ZKPs in the future to improve security and shorten dispute resolution times.
While many may see zero-knowledge as a separate sector within the broader crypto industry, we believe it should be viewed as a technology that addresses multiple pain points across the industry. Correspondingly, in order to serve different systems and customers, hardware acceleration will need to become more flexible and versatile.
2.2.2 Local Proof Generation
ZKPs used to protect privacy and ZKPs used to compress information differ significantly in structure. To hide transaction details, random numbers are involved in the proving process, so users need to generate proofs locally, but most users do not have advanced hardware. To make matters worse, if most dapps remain web apps, proofs must be generated in the browser, which takes even longer. For example, when Manta tried to build a high-performance prover for WASM, they quickly realized that "WASM inflicts a 10-15x performance penalty on users compared to native processing speed". To address this, Manta became a sponsor and architect of ZPrize, one of the largest ZKP acceleration competitions, and set up a dedicated WASM acceleration track. Providing a downloadable client is an easy solution for this type of dapp, but the required download may churn some potential users, and a native client does not work with current browser-extension wallets or other tools.
Another solution is to partially outsource proof generation. Pratyush Mishra presented this approach at the 7th Zero Knowledge Summit. The user first performs some light computation, then sends the public statement and an encrypted witness to several third parties, who complete the rest of the proof. This way, as long as one of the parties is honest, the user's privacy is not leaked. The approach combines zero-knowledge protocols with some of the tools used in MPC. Alternatively, users can trade bandwidth for computation: first generate a large proof, then send it to a third party, who compresses the proof and publishes it on-chain.
Outsourced proof generation
Source: 7th Zero Knowledge Summit, presented by Pratyush Mishra of Aleo
2.3 Comparison with PoW mining
While it would be natural to think of ZKP as a novel form of PoW and to see accelerated hardware as a new type of mining machine, ZKP generation is fundamentally different from PoW mining in terms of purpose and market structure.
2.3.1 Power Competition and Utility Computing
To earn block rewards and transaction fees, Bitcoin miners keep iterating through nonces to find a sufficiently small hash, which is really only relevant for reaching consensus. In contrast, ZKP generation is a necessary step toward practical utility such as information compression or privacy protection, and is not responsible for consensus. This distinction shapes ZKP's potential breadth of participation and its reward-distribution models. Below we list three existing designs to illustrate how miners may coordinate ZKP generation.
- Rates-are-Odds (Aleo): Aleo's economic model is the closest to Bitcoin and other PoW protocols. Its consensus mechanism, PoSW (Proof of Succinct Work), still requires miners to find a valid nonce, but the work consists mainly of repeatedly generating SNARK proofs, which take the nonce and the hash of the state root as input, until the hash of a generated proof is small enough. We call this PoW-like mechanism the Rates-are-Odds model because the number of proofs that can be generated per unit time roughly determines the probability of earning a reward. In this model, miners increase their chances of earning rewards by hoarding large numbers of machines.
- Winner-Dominates (Polygon Hermez): Polygon Hermez adopts a simpler model. According to its public documentation, the two main actors are the sequencer and the aggregator: the sequencer collects transactions and preprocesses them into new L2 batches, while aggregators declare their intent to prove and compete to generate proofs. For a given batch, the first aggregator to submit a proof earns the fee paid by the sequencer. Aggregators with state-of-the-art configurations and hardware are likely to dominate, regardless of geographic distribution, network conditions, or proving strategy.
- Party-Thresholds (Scroll): Scroll describes its design as "Layer 2 Proof Outsourcing", in which miners who stake a certain amount of cryptocurrency are randomly chosen to generate proofs. A selected miner must submit its proof within a specified time, otherwise its probability of being selected in the next epoch is lowered, and generating an incorrect proof results in a penalty. At first, Scroll may work with a dozen or so miners to improve stability, and may even run its own GPUs; over time, they plan to decentralize the process. The timing of that decentralization can be read as a measure of how Scroll balances efficiency against decentralization. Starkware may also fall into this category. In the long run, only machines capable of completing proofs in time can participate in proof generation.
Each of these coordination designs has a different focus. We expect Aleo to have the highest decentralization, Hermez the highest efficiency, and Scroll the lowest threshold for participation. Given these designs, a zero-knowledge hardware arms race is unlikely to happen right away.
2.3.2 Static Algorithms and Evolutionary Algorithms
Another difference is that Bitcoin is based on a single, relatively static algorithm. The core developers of Bitcoin have always tried to follow the original design and spirit in order to keep the network stable and avoid serious forks. Emerging blockchains or projects do not have such legacy constraints, which allows them to tune their systems and algorithms more flexibly.
We believe the differentiation of ZKPs leads to a more fragmented and dynamic market structure compared to the simple, static PoW market. We propose thinking of ZKP generation as a service (some startups have named it ZK-as-a-Service): ZKP generation is a means to an end, not an end in itself. This new paradigm will eventually lead to new business and revenue models, which we detail in the final section. Before that, let's take a look at the hardware solutions.
3. Acceleration Solutions

The CPU (central processing unit) is the main chip in a general-purpose computer, responsible for dispatching instructions to the various components on the motherboard. However, because CPUs are designed to handle a wide variety of tasks, their throughput on highly parallel workloads is limited, so GPUs, FPGAs, and ASICs are often used to assist with concurrent or specialized tasks. In this section, we focus on their characteristics, optimization processes, current status, and markets.
3.1 GPU: Currently the most commonly used hardware
The GPU was originally designed to manipulate computer graphics and process images, but its parallel structure makes it a good choice in areas such as computer vision, natural language processing, supercomputing, and PoW mining. GPUs can accelerate both MSM and FFT; for MSM in particular they exploit Pippenger's algorithm. GPU development is also much simpler than FPGA or ASIC development.
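A sketch of the bucket method at the heart of Pippenger's algorithm, again over a toy additive group mod n rather than a real curve. Each c-bit window groups points into buckets, a running sum combines the buckets with few additions, and the windows and buckets are largely independent, which is what maps so well onto a GPU's many cores:

```python
n = 2**61 - 1   # toy group order (stand-in for curve points)
c = 4           # window width in bits

def msm_pippenger(scalars, points, bits=64):
    """sum(a_i * P_i) via Pippenger's bucket method (toy group version)."""
    total = 0
    for w in reversed(range(0, bits, c)):
        total = (total << c) % n          # c doublings of the accumulator
        # Bucket each point by the current c-bit digit of its scalar.
        buckets = [0] * (1 << c)
        for a, P in zip(scalars, points):
            digit = (a >> w) & ((1 << c) - 1)
            if digit:
                buckets[digit] = (buckets[digit] + P) % n
        # Running-sum trick: sum of j * buckets[j] in ~2 * 2^c additions.
        run = acc = 0
        for b in reversed(buckets[1:]):
            run = (run + b) % n
            acc = (acc + run) % n
        total = (total + acc) % n
    return total
```

On a GPU, separate thread blocks can handle separate windows or bucket accumulations, leaving only a cheap final reduction to run sequentially.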
The idea of GPU acceleration is very simple: move the computationally demanding tasks from the CPU to the GPU. Engineers rewrite these parts in CUDA or OpenCL. CUDA is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on NVIDIA GPUs; its competitor OpenCL is an open standard developed by Apple and the Khronos Group for heterogeneous computing, which frees users from being limited to NVIDIA GPUs. The code is then compiled and run directly on the GPU. For further acceleration, aside from improving the algorithm itself, developers can also:
(1) Reduce data-transfer costs (especially between CPU and GPU) and optimize memory usage by relying on fast storage as much as possible and slow storage as little as possible.
(2) Improve hardware utilization by optimizing the execution configuration: balancing work among multiple processors, building multi-core concurrency, and allocating resources to tasks rationally so the hardware stays as busy as possible.
In short, everything possible is done to parallelize the work, while sequential execution, in which one step depends on the result of the previous one, is avoided as much as possible.
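The decomposition can be sketched with Python's standard library: the MSM workload splits into independent chunks with one cheap sequential combine at the end. (Because of Python's GIL, the threads here only illustrate the structure; real speedups come from CUDA/OpenCL kernels performing the same split across GPU cores.)

```python
from concurrent.futures import ThreadPoolExecutor

n = 2**61 - 1  # toy group order

def partial_msm(chunk):
    """Each worker reduces an independent slice; no step waits on another."""
    acc = 0
    for a, P in chunk:
        acc = (acc + a * P) % n
    return acc

def msm_parallel(scalars, points, workers=4):
    pairs = list(zip(scalars, points))
    step = -(-len(pairs) // workers)                 # ceiling division
    chunks = [pairs[i:i + step] for i in range(0, len(pairs), step)]
    with ThreadPoolExecutor(workers) as ex:
        partials = list(ex.map(partial_msm, chunks))
    return sum(partials) % n                         # cheap sequential combine
```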
Save time by parallelizing
GPU-accelerated design process
3.1.1 Huge developer group and convenience of development
Unlike FPGA and ASIC development, GPU development does not involve hardware design, and CUDA and OpenCL have huge developer bases. Developers can quickly build their own modified versions on top of open-source code. For example, Filecoin shipped GPU proving in its network back in 2020. Supranational also recently open-sourced their general acceleration solution, currently probably the best open-source solution of its kind.
This advantage is even more pronounced for work other than MSM and FFT. Proof generation is indeed dominated by those two, but the remaining parts still account for about 20% (source: Sin7Y's white paper), so accelerating only MSM and FFT has a limited effect on total proving time: even if their computation were compressed to an instant, the total time would still be one-fifth of the original. Furthermore, since this is an emerging and evolving field, it is hard to predict how this ratio will change. Given that FPGAs need to be reconfigured and ASICs may even need to be redesigned and re-fabricated, GPUs are more convenient for accelerating the heterogeneous remainder of the computation.
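This is just Amdahl's law; a one-liner makes the ceiling explicit (the 80% figure is the Sin7Y estimate quoted above):

```python
def overall_speedup(f, s=float("inf")):
    """Amdahl's law: f = fraction of work accelerated, s = its speedup factor."""
    return 1 / ((1 - f) + f / s)

print(overall_speedup(0.8))      # MSM+FFT made instant: only ~5x overall
print(overall_speedup(0.8, 10))  # a 10x MSM+FFT kernel: ~3.6x overall
```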
3.1.2 Excess GPU
Nvidia dominates the GPU market. According to Jon Peddie Research, Nvidia's discrete GPU shipments accounted for 78% of the market in the first quarter of 2022. While many graphics cards are priced significantly above MSRP (manufacturer's suggested retail price), the availability of graphics cards continues to increase: in 2021, more than 50 million GPUs (valued at $52 billion) were shipped, almost 8.5 times FPGA sales over the same period.
GPU chip market share
Source: Jon Peddie Research
For mining in particular, we conservatively estimate that after the Ethereum Merge, about 6.26 million GPUs will be freed from Ethereum PoW mining. Assuming the vast majority of Ethereum's hash rate comes from GPUs, we multiply Ethereum's current hash rate (890 Th/s) by 90% and divide the result (801 Th/s) by the mining capacity of a state-of-the-art graphics card, the RTX 3090 Ti (128 Mh/s), which yields our conservative estimate of 6.26 million GPUs. Since ASICs dominate Bitcoin mining and no other PoW project can absorb such a large amount of idle hash power, it is worth exploring whether these soon-to-be-idle GPUs can turn to zero-knowledge proving services, rather than mining Ethereum forks or providing cloud services.
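The arithmetic behind that estimate, reproduced for transparency (the 90% GPU share is the article's assumption):

```python
eth_hashrate = 890e12   # Ethereum hash rate: 890 Th/s
gpu_share = 0.90        # assumed share of hash rate coming from GPUs
rtx_3090_ti = 128e6     # 128 Mh/s per top-end card

gpus_freed = eth_hashrate * gpu_share / rtx_3090_ti
print(f"{gpus_freed / 1e6:.2f} million GPUs")  # → 6.26 million GPUs
```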
Ethereum hash rate
3.2 FPGAs: Balancing Cost and Efficiency
An FPGA is an integrated circuit with a programmable structure. Because the circuitry inside an FPGA chip is not hard-etched, designers can reprogram it as many times as needed for specific purposes. On one hand, this avoids the high manufacturing cost of ASICs; on the other, its use of hardware resources is more flexible than a GPU's, giving FPGAs the potential to further accelerate computation and save power. For example, while it is possible to optimize FFTs on a GPU, the frequent data shuffling causes a large amount of data transfer between the GPU and the CPU. However, the shuffling is not random, and by writing its logic directly into the circuit design, FPGAs promise to perform the task faster.
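The FFT in question is usually a number-theoretic transform, i.e. an FFT over a prime field. A minimal recursive sketch shows the butterfly structure; the "shuffling" mentioned above is the even/odd split (bit-reversal in iterative, in-place versions), whose fixed access pattern is exactly what FPGA designers can hard-wire. The tiny modulus and root of unity are toy values for illustration:

```python
p, w = 337, 85  # toy prime field; 85 is a primitive 8th root of unity mod 337

def ntt(a, root=w):
    """Radix-2 Cooley-Tukey FFT over a prime field (a number-theoretic transform).

    len(a) must be a power of two no larger than the order of `root`.
    """
    n = len(a)
    if n == 1:
        return a
    # The even/odd split is the data shuffle that costs so much CPU<->GPU traffic.
    even = ntt(a[0::2], root * root % p)
    odd = ntt(a[1::2], root * root % p)
    out, tw = [0] * n, 1
    for i in range(n // 2):
        t = tw * odd[i] % p            # butterfly
        out[i] = (even[i] + t) % p
        out[i + n // 2] = (even[i] - t) % p
        tw = tw * root % p
    return out

print(ntt([1, 1, 1, 1, 1, 1, 1, 1]))  # → [8, 0, 0, 0, 0, 0, 0, 0]
```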
Implementing ZKP acceleration on an FPGA still requires several steps. First, a reference implementation of the specific proof system, written in C/C++, is needed. Then, to describe the digital logic circuit at a higher level, this implementation is rewritten in an HDL (hardware description language).
Next comes simulation and debugging, displaying the input and output waveforms to check that the code behaves as expected. This step involves the most iteration; engineers do not need to run the whole flow, since many minor errors can be caught simply by comparing the simulated output against the reference implementation's output. The synthesizer then converts the HDL into an actual circuit design with elements such as gates and flip-flops, maps the design onto the device architecture, and performs further analysis. Once the circuit is confirmed to function properly, a programming file is created and loaded onto the FPGA device.
FPGA Design Flow
3.2.1 Current barriers and underdeveloped infrastructure
While some of the module-level optimization work from GPUs can be reused, FPGA development also faces new challenges:
(1) For memory safety and cross-platform compatibility, most open-source zero-knowledge implementations have long been written in Rust, but most FPGA development tools are built around C/C++, which hardware engineers are more familiar with. Teams may have to rewrite or transpile these implementations first.
(2) When writing these implementations, software engineers can only draw on a limited range of C/C++ open-source libraries that can be mapped onto hardware architectures with existing development support.
(3) Beyond what software and hardware engineers can do independently, deep optimizations require their close cooperation. For example, some modifications to an algorithm can save substantial hardware resources while preserving its behavior, but such optimization requires an understanding of both hardware and software.
In short, unlike AI or other mature fields, engineers must learn and build from scratch to achieve ZKP acceleration. Fortunately, we are seeing more progress. For example, Ingonyama proposed PipeMSM in their recent paper, a method to accelerate MSM on FPGA or ASIC.
3.2.2 Duopoly market
The FPGA market is a classic duopoly. According to Frost & Sullivan, Xilinx (acquired by AMD in February 2022) and Altera (acquired by Intel in December 2015) together accounted for about 85% of the global FPGA market shipments in 2019. Early access to state-of-the-art FPGAs may require a close relationship with Intel or AMD. In addition, zero-knowledge as an emerging field has caught the attention of industry giants. AMD is one of ZPrize’s technology providers.
The FPGA market is a classic duopoly
Source: Frost & Sullivan
Engineers have realized that a single FPGA cannot provide enough hardware resources for complex ZKP generation, so multiple cards must be used in parallel. Even with a sound design, existing standard FPGA cloud services from AWS and other vendors are not ideal. Moreover, startups offering acceleration solutions are often too small for AWS or others to host their custom hardware, yet lack the resources to run their own servers. Partnering with a large miner or a Web3-native cloud service provider might be a better option. However, such a partnership could be delicate, considering that the mining company's in-house engineers will likely be developing acceleration solutions of their own.
3.3 ASIC: The Ultimate Weapon
An ASIC is an integrated circuit (IC) chip tailored for a specific purpose. Engineers typically still use an HDL to describe an ASIC's logic, much as with an FPGA, but the circuits are eventually etched permanently into silicon, rather than formed by connecting thousands of configurable blocks as in an FPGA. Rather than sourcing hardware from Nvidia, Intel, or AMD, companies have to manage everything themselves, from circuit design to manufacturing and testing. An ASIC is limited to specific functions, but in return this gives designers the greatest freedom in resource allocation and circuit design, so ASICs have great potential in performance and energy efficiency. Designers can eliminate waste in space, power, and functionality by laying down exactly the number of gates needed and sizing each module for the intended application.
In terms of design flow, compared to an FPGA, an ASIC requires pre-silicon verification (and DFT) between HDL writing and synthesis, and requires floorplanning before implementation. The former is where engineers use sophisticated simulation tools to test the design in a virtual environment; the latter determines the size, shape, and location of the modules within the chip. After the design is implemented, all files are sent to a foundry such as TSMC or Samsung for a test tapeout. If the test succeeds, the prototype is sent for assembly and inspection.
ASIC Design Flow
3.3.1 Relatively general ASICs in the zero-knowledge field
A common criticism of ASICs is that once the algorithm changes, previous chips are completely useless, but that doesn’t have to be the case.
Coincidentally, none of the companies we spoke to that plan to develop ASICs are putting all their effort into one particular proof system or project. Instead, they prefer to build programmable modules into the ASIC so it can cope with different proof systems, assigning only the MSM and FFT tasks to fixed-function circuitry. This is not optimal compared with a specific chip for a specific project, but in the short term, sacrificing some performance for generality may be a better option than designing for a single task.
3.3.2 Expensive but Non-Recurring Costs
Not only is the ASIC design process much more complicated than the FPGA one, the manufacturing process also consumes more time and money. Startups can contact foundries directly for tapeout or go through distributors, and it may take about three months or more before execution actually starts. The main tapeout costs come from the reticle and the wafer: a reticle is used to form patterns on a wafer, a thin slice of silicon. Startups typically choose MPWs (multi-project wafers), which share reticle and wafer manufacturing costs with other project parties. Even so, depending on the chosen process node and chip volume, tapeout costs are still conservatively estimated in the millions of dollars. After tapeout, assembly and testing take several more months. If everything works, preparation for mass production can finally begin; if anything goes wrong in testing, debugging and failure analysis take an unpredictable amount of time and require another tapeout. From initial design to mass production takes tens of millions of dollars and about 18 months. The consolation is that a significant portion of these costs are non-recurring, and the high performance and savings in energy and space can make ASICs relatively economical in the long run.
Below we provide a general evaluation of different hardware solutions.
4. Business Models

For a more intuitive understanding of the available business models, we present all potential market players in the chart below. Since there may be overlaps or complex relationships between actors, we categorize them by function only.
Hardware accelerated functional layer
In addition to developing GPU or FPGA acceleration, startups can enter the zero-knowledge realm from any of the functional layers above. One option is to design and build an ASIC from scratch and package the chips into specialized equipment sold to miners; alternatively, bare chips can be sold to downstream suppliers for assembly. Startups can also build their own servers and participate in proof generation or provide cloud services, or become consulting firms that provide design solutions without taking part in operations. Full-stack solutions, from hardware resources to custom system design, can also be offered to zero-knowledge applications, provided the company has strong partnerships or sufficient resources to cover the entire value chain.
Zero-knowledge has not yet achieved large-scale applications, and building accelerated solutions will also be a long process. We look forward to the turning point in the future. The key question for builders and investors is when this tipping point will come.
Special thanks to Weikeng Chen (DZK), Ye Zhang (Scroll), Kelly (Supranational) and Omer (Ingonyama) for helping us understand all the technical details. Thanks also to Kai (ZKMatrix), Slobodan (Ponos), Elias and Chris (Inaccel), Heqing Hong (Accseal) and many others for their insights into this research.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/amber-group-a-comprehensive-interpretation-of-zero-knowledge-proofs/