How did parallel chains go from research papers, to code implementations?
How will parallel threads, nested relay chains, etc., evolve in the future?
What are the limits of parallel chain scalability?
Roundtable discussion at 2021 Polkadot Decoded “Parsing Parallel Chains: Who Made Them? What is a parallel chain? Why have parallel chains?” PolkaWorld summarizes the main points of the roundtable in this article.
Jeff: Jeff Burdges, a W3F cryptography researcher who has done a lot of research on parallel chain development
Rob: Robert Habermeier, Polka co-founder/Parity core developer, leading the implementation team to make parallel chains work in practice
Joe: Joe Petrowski, Head of Technical Integration at W3F, moderator of this roundtable
Joe: About a year and a half ago, Jeff led a team that published a paper on usability and effectiveness. By the time it was implemented, the program had changed a lot. Jeff, can you talk briefly about how the idea for this paper came about?
Jeff: We got some ideas from the ethereum ecosystem, like the idea of using corrective codes. But some of the specific things around those ideas, like how to optimize sharding, were not formalized.
In late 2019, we decided we wanted to formalize these ideas and come up with a more precise approach. In general, the way we design complex protocols is that I will write down all possible design options and then use the elimination method.
In early 2020, I wrote down the design and discussed it with the group, and we noticed some problems and I came up with a technique called “Two-phase inclusion”. That means that before it really starts, the parallel chain relay chain must know a block and the verifier must say that the block is valid.
After that we start doing the corrective coding and then start doing the real work of checking it. One of the benefits of doing this is that because someone has a lot of tangibility invested in the process, it will thus limit the number of times it can be tried. And thus do it in such a way that if you were to attack it, then you would also destroy yourself. It’s not cryptographic security, it’s distributed system security, but it’s reasonable.
Joe: For people who are not familiar with sharding what was just said might be a little abstract. It’s really, we have a thousand verifiers, and when you want to include one of these parallel chain blocks, you actually need to send the data block to all the verifiers, and there’s expense, complexity, network, storage, etc. involved in getting everybody to have to deal with that message, so you want to really make sure that those messages are valid messages, and that they’re justified.
Jeff: Yeah. Corrective codes are actually pretty old, and there are different types of corrective codes. But generally speaking when if you’re using cryptography, you’re generally using something based on Lagrangian interpolation or Reed-Solomon codes or something like that. The reason is that it has a steep threshold, so we can recover the full picture from any third of the fragment.
So how do we do that? We have parallel chain blocks, called candidate blocks, and we have 3f+1 verifiers. Then we code-correct these into 3f+1 fragments, and you can reconstruct the original block as long as you have any f+1 fragments. That is, with just a little bit more than 1/3 of a fragment, you can reconstruct the original block.
This is a very old mathematical method that actually makes us faster. Based on our current number of verifiers, we had to find some relatively new papers to optimize. That’s what we did this year — went and greatly optimized the corrective coding. We made it run 400 times faster, and asymptotically it’s actually faster. It went from an O(N²) algorithm to an O(log n) algorithm. This makes the computation less burdensome. We’ll probably do even better after that. This is one of our recent breakthroughs, but of course it would have been nice if we had solved it earlier hahaha.
Joe: Turning all this research into code has actually been a big challenge. We went live with the Rococo test network in the middle of last year, can Rob talk about some of the challenges early on in the implementation of this protocol?
Rob: I remember the first code commit related to parallel chains was in the second half of 2018, and in mid-2019 we had the first draft of what we call the V0 protocol, and in the first few years we were more invested in BABE, GRANDPA consensus, which is the block out and block confirmation, and we didn’t really dive into the parallel chain stuff at that point because the parallel The parallel chain part is more complex and requires more development time.
From mid to late 2019 to early 2020 or so, things progressed a lot, and as Jeff just mentioned, the research team started to really get the protocols down, like the availability made sure that the parallel chain blocks were still there so that other people could check them, to do additional checks for security.
I think it’s actually very difficult to implement all of this research. If you’re building any kind of system, then every little bit of extra complexity you add to the system, the time it takes to create that system increases exponentially. The same rule applies to code, because once you reach a certain amount of code, it’s actually really hard to add more stuff because the new stuff is bound to disrupt and break some of the stuff that was done well before.
So it’s important to have a good design and planning, and we definitely do some research back and forth as we iterate through the protocol. But in 2020, we focused on the Implementer’s Guide and iterated in that rather than in the code. I was able to talk to Jeff and Al (Alistair) to discuss what was in the draft paper and then write a page that said, “This is how we would write the code,” instead of just going straight to the code, and we saved a couple of weeks with that approach, and then after that I could assign that code writing work to a lot of developers.
So I think it’s important to have a good plan when you’re building a system like this. And also having a modular system so that you can add a separate section of code that you can organize into smaller packages rather than a whole system, because it’s hard for one person to handle a whole ponderous system.
Joe: Speaking of the current phase. Now that Kusama is live with Shell blank chains, there are already 12 parallel chains on Rococo, but Kusama’s block-out time is around 12 seconds, and we’re working on fixing that. What are the challenges we face in the short term to get the block out time up to 6 seconds and to bring more chains online on Kusama?
Rob: I think it all essentially comes down to the network. kusama has 900 verifiers, and those verifiers are some of the people who have KSMs and have nodes deployed around the world to synchronize the chain. It’s a cool thing, it’s probably one of the largest sets of verifiers in the world.
But when you add some complexity to this network, like adding parallel chains, it definitely adds a lot of load. We’ve actually tested this before on Rococo with the same parameters, but it works completely differently on Kusama, where the verifier nodes run all over the world, so the main challenge is to make the network code run as smoothly as possible. When we wrote the network code, we did a lot of anti-cheating mechanisms, the kind of thing where you don’t even notice it’s there if no one is messing with it, but if someone is being nefarious, you find these defense mechanisms very important.
Jeff: That’s right. As we add more and more parallel chains, there’s going to be more computational load, and we’ll see how that evolves in time and how we grow in the process. It’s actually an incremental learning process to observe how these operations will affect the network.
Joe: That’s what Kusama exists for, isn’t it?
Rob: That’s right. As parallel chains become more numerous, the load on the verifier will definitely become greater. Because the verifier has to verify a block and has to pledge their coins behind the block, some of the other verifiers will self-select to do the checks. The more parallel chains you have, the more computations you will have to do, although the computations should increase slower than the number of parallel chains, which is why this network is scalable and not non-scalable like some other blockchains. But as a verifier, you may still need to verify dozens of blocks in every second.
Joe: Let’s talk a little bit more practically, about what’s planned for the next year for Boca and Kusama. We have a plan for parallel threads, which you can actually see inside the UI now, because the chains are registered as parallel threads until they are upgraded to parallel chains. But after that we will make parallel threads more practical. Can you talk about the design of parallel threads, the implementation, and what is left to be done to implement it?
Rob: Parallel threads are similar to parallel chains, but the main difference is that they are scheduled differently. We have a scheduling process where if you’re a parallel chain, then each block is scheduled; if you’re a parallel thread, then you need to have an auction where parallel thread collectors can compete with each other for the right to write to the block. This brings about a network change on the collection side, and when you’re a parallel threaded block writer, you need to let the verifier know that you have blocks to commit.
So in general the challenges are threefold: the scheduler, the auction, and the network changes.
Jeff: Actually, there were various designs for parallel threads, but in the end we chose the one with the auction. Because this design prevents cheating better, but for parallel threads, they can lose resources if they can’t commit blocks for some reason. So we’ll have to look at some more economic aspects.
Joe: We’ve talked about releasing some of the core functionality from the relaychain and dropping it into the parallel chain to further enable scalability, and the idea of implementing a Nested relaychain. Do you guys want to talk about why you want to do this?
Jeff: I actually prefer to call it “relaychain fragmentation” rather than “nested relaychain” because nested relaychains sound like one chain is dominant.
In a sense, relay chain slicing is a bit easier than what we already do now (parallel chain slicing). But I think it will probably come at a time when there are more than 3,000 validators to do this, and I want to tell you that there is no need to rush to implement it yet. Until then, we want to make the relay chains as simple as possible first, which I think is the least amount of work for developers.
Rob: At the moment, the Staking and Election modules and some of the governance functions are actually quite heavy and can put a large load on the relay chain. Everything that happens on the relay chain, the relay chain verifier needs to perform. Whereas, by design, things that happen on the parallel chain only need to be handled by a subset of the verifiers. So that’s where scalability comes from, making the things that each verifier machine needs to perform as few as possible.
I think it’s actually pretty hard to safely distill out things like Staking and governance. Because there are failure modes in polka, like one mechanism is that the chain might be blocked out of the block when you’re doing dispute analysis. You might not be able to do a Slash transaction, the verifier set might not be able to update the transaction, etc. These are all tricky challenges.
But this one is not really that urgent. Until then, we should optimize the node side, like how to handle parallel chains and network messages, to get higher scalability, and run more parallel chains.
Jeff: I think our goal should be, although this may not be achievable, the goal should be to get to the same level, so that each parallel chain has a verifier. This may not be achievable, but when we get to this situation, we should know that this limit exists and then work towards something else.
Joe: You just said 3000 validators means 3000 parallel chains. rob how do you evaluate this goal as an achiever?
Rob: (laughs) It’s not doable yet, that’s for sure. I think I’d be happy with the code running 80-100 parallel chains after one round of optimization, and that’s more than enough for the community.
Jeff: Yeah. Eventually we’ll probably get to a point where we run out of users and we’ll have to start convincing more people to use it, so I guess there will probably be a lot of these bursts.
Rob: I think so, and I think it’s kind of like the challenge that Boca governance has — what’s the long tail effect of the auction program? Because at some point, if all the technology is going well, then we may have the ability to run more parallel chains and maybe even exceed the market demand for parallel chains. But we also don’t want the resources of the parallel chain to be filled up by some junk project that takes up a couple of years, and then of course the community catches up after that (and there will be not enough bits of parallel chain again), and there will definitely be this back and forth development process.
Welcome to learn Substrate:
Follow the progress of Substrate:
Follow the progress of Polkadot:
Posted by:CoinYuppie，Reprinted with attribution to:https://coinyuppie.com/parallel-chain-analysis-implementation-process-technical-difficulties-future-development-direction-etc/
Coinyuppie is an open information publishing platform, all information provided is not related to the views and positions of coinyuppie, and does not constitute any investment and financial advice. Users are expected to carefully screen and prevent risks.