Network Stability Report: Kusama and Parachain

After the first 5 parachain auctions, we monitored the stability of the Kusama network. At the time of writing, there are 6 parachains on the network.

Our focus is mainly on 4 key areas:

  • Candidate inclusion stability
  • Approval voting statistics
  • Network connectivity
  • Load

We sampled metrics from validators that opted in, collecting the data with Prometheus and Grafana. Details follow.

Candidate inclusion stability

In the ideal case, each parachain produces one block for every 2 relay chain blocks. We can determine the block production rate of each parachain by dividing the number of relay chain blocks produced while the parachain was online by the number of parachain blocks produced in the same period. The ideal value of this ratio is 2.
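The ratio described above can be sketched in a few lines of Python. This is only an illustration: the function names and the block counts are hypothetical and are not taken from the report's data.

```python
# Ideal ratio from the text: one parachain block per 2 relay chain blocks.
IDEAL_RATIO = 2.0


def block_ratio(relay_blocks_while_online: int, parachain_blocks: int) -> float:
    """Relay chain blocks elapsed per parachain block produced."""
    if parachain_blocks <= 0:
        raise ValueError("the parachain must have produced at least one block")
    return relay_blocks_while_online / parachain_blocks


def deviation_from_ideal(ratio: float) -> float:
    """Relative distance from the ideal ratio (0.10 means 10% slower)."""
    return ratio / IDEAL_RATIO - 1.0


# Hypothetical block counts, not measurements from the report.
ratio = block_ratio(relay_blocks_while_online=220_000, parachain_blocks=100_000)
print(f"{ratio:.2f} relay blocks per parachain block")
print(f"{deviation_from_ideal(ratio):.0%} off the ideal value")
```

A parachain that matches the ideal exactly has a deviation of 0; larger values mean more relay chain blocks pass per parachain block.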

The following table shows these values for the current 6 parachains, as of their latest block number (#).

Since all these parachains are secured by the same validator set and checked by randomly assigned validators, there should be no major differences in the service that validators provide to each parachain.

The effect of noise cannot be ruled out at this time, because over the past few days the parachains have not had enough time to be repeatedly exposed to every possible combination of backing validators. But noise cannot explain the large gap between Shiden and the other parachains, which mostly sit within 5% to 10% of the ideal value. It is worth noting that Statemine went through a period of instability in the first few weeks after its launch, during which it produced only one block per minute, so its current figure is still skewed by that initial instability.

There are two plausible explanations for this difference:

  • Heavy parachain block execution or large block data
  • Poor connectivity between collators and validators

Currently, the time window in which collators and validators must produce a parachain block is very short, which makes the system fragile in the face of even brief communication delays. For both problems, the long-term solution is to improve the parachain protocol to allow more time for producing the next parachain block. The short-term solution is to place collators geographically closer to the majority of validator nodes. However, this creates a temporary risk of regional concentration, which the long-term solution would mitigate.

Approval voting

The approval voting protocol provides most of the security of parachains. It is tightly integrated with the GRANDPA finality gadget. In general, nodes are randomly selected to check the validity of parachain blocks, and a certain number of nodes must complete their checks before a relay chain block containing those candidates can be finalized. Disputes over validity escalate to the entire validator set and eventually result in at least one validator being slashed.

To benchmark approval voting, we can observe the following:

  • Validators' view of GRANDPA finality lag
  • Average “tranche” of validator assignments (ideally 0)
  • Number of assignments and approvals by validators

Finality lag

The figure above shows, on a logarithmic scale, the maximum and average number of blocks by which finality should lag behind the head of the relay chain. Each validator has its own view, based on its perception of the approval status of every parachain block referenced by the relay chain.

In most cases the lag is between 2 and 5 blocks, but it occasionally jumps to 50. A fail-safe limits the lag to 50 blocks, and in practice this limit is hit once every few weeks.
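As a rough illustration of how such lag samples could be summarized, the sketch below counts how many samples fall in the typical 2 to 5 block range and how many reach the 50-block fail-safe. The function name and the sample values are hypothetical; only the thresholds come from the text.

```python
FAILSAFE_LAG = 50          # blocks; fail-safe limit quoted in the text
TYPICAL_RANGE = (2, 5)     # blocks; typical lag quoted in the text


def summarize_lag(samples: list[int]) -> dict:
    """Summarize finality lag samples (each sample is a lag in blocks)."""
    if not samples:
        raise ValueError("need at least one lag sample")
    low, high = TYPICAL_RANGE
    typical = sum(1 for lag in samples if low <= lag <= high)
    failsafe_hits = sum(1 for lag in samples if lag >= FAILSAFE_LAG)
    return {
        "typical_share": typical / len(samples),
        "failsafe_hits": failsafe_hits,
        "max_lag": max(samples),
    }


# Hypothetical samples, not measurements from the report.
print(summarize_lag([2, 3, 3, 4, 5, 2, 50, 3]))
```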

We will propose governance solutions aimed at solving these problems before parachains launch on Polkadot.

Average tranche

Technically, every validator is assigned to check every parachain block, each in a particular tranche (batch). Usually only the validators in tranche 0 are actually called on to check; later tranches come into play only when tranche-0 validators fail to show up.

The figure above shows that, except during finality stall events, the 50th and 95th percentile of the assigned tranche is usually 0.
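For readers unfamiliar with percentile summaries, the sketch below shows one common definition (nearest rank) applied to a list of assigned batch (tranche) numbers. The data is hypothetical, and the report's dashboards may compute percentiles differently.

```python
import math


def percentile(values: list[int], p: int) -> int:
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical tranche numbers: almost every assignment is in tranche 0.
tranches = [0] * 97 + [1, 2, 3]
print(percentile(tranches, 50), percentile(tranches, 95))
```

With data like this, both the median and the 95th percentile stay at 0 even though a few assignments landed in later tranches.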

Assignments and approvals

This figure shows how validator assignments across the network translate into approval votes. The data does not quite match the reported finality lag, because “stale” approvals are those that became irrelevant after finalization.

Most approvals should be timely, since they are required for finality. This category may be misreported by the node or by Grafana. On Rococo, the corresponding chart shows a nearly 1:1 mapping of assignments to successful approvals.

Network connectivity

On Kusama, there are 900 validators, and 200 are randomly selected to participate in parachain consensus in each session. Each current validator is designed to connect to the validator sets of the current session and the last 6 sessions.

Many validators have about 200 connections because they are part of an old validator set. Validators that are part of the current validator set should show higher connectivity peaks. We can see that, to a large extent, the validators we sampled were over-connected, with connections to most of the other 899 validators.

Some validators are under-connected and are not linked to the network as well as they should be. Nevertheless, no validator has fewer than 100 connections, so information should still be reaching these validators.

Certain requests require direct peer-to-peer communication, so all validators must be publicly reachable at their published node addresses. The node publishes its addresses automatically, but the node operator is responsible for ensuring that the node is actually reachable.

The graph shows the number of block requests issued per second and the number of failures of each type. The type of request is not important here. The key point is that “dial failed” (the yellow line in the figure) is almost exactly 10% of the number of requests. This suggests that about 10% of validators are not reachable at their published addresses.
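The reasoning above can be made concrete with a small calculation. This is a sketch under a simplifying assumption that requests are spread evenly across validators, so the share of failed dials approximates the share of unreachable validators. The request rates are hypothetical; only the 900-validator total and the 10% share come from the text.

```python
def dial_failure_share(requests_per_sec: float, dial_failures_per_sec: float) -> float:
    """Fraction of outgoing requests that fail at the dial stage."""
    if requests_per_sec <= 0:
        raise ValueError("request rate must be positive")
    return dial_failures_per_sec / requests_per_sec


def estimated_unreachable(total_validators: int, failure_share: float) -> int:
    """Estimate of unreachable validators, assuming requests are spread
    evenly across the validator set (a simplifying assumption)."""
    return round(total_validators * failure_share)


# Hypothetical rates chosen to match the roughly 10% share in the text.
share = dial_failure_share(requests_per_sec=120.0, dial_failures_per_sec=12.0)
print(f"dial failure share: {share:.0%}")
print(f"estimated unreachable validators: {estimated_unreachable(900, share)}")
```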

Load (CPU and network)

This graph shows validator CPU usage, measured in cores. Most validators use between 1.5 and 2 cores. Our current recommendation is to run a validator on a 4-core CPU, so this utilization is within the expected range.

This graph shows the breakdown of CPU usage by task. The top 3 tasks dominate CPU usage: in order, “libp2p-node”, “network-worker” and “grandpa-voter”. These tasks are mainly network-related, which suggests that optimizing network usage would substantially reduce a node's CPU utilization.

Most of a node's traffic occurs on the /polkadot/validation/1 network protocol. This protocol aggregates the messages exchanged between nodes and accounts for a large part of the network traffic. The graph shows that, overall, the average network throughput of a validator is stable at 400-500KB/s.

Most of the requests made by nodes are in the block distribution protocol. With 200 validators and a maximum PoV of 1MB, the maximum size of each distributed piece (chunk) is about 15KB. At the observed average request/response rates, this means approximately 307KB/s inbound and 138KB/s outbound. However, PoVs are currently very small because parachains have not yet approached peak transaction volume.
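The roughly 15KB figure is consistent with a simple back-of-the-envelope model, sketched below. The model is an assumption, not something the report states: it supposes that each PoV is split so that just over one third of the validators hold enough pieces to reconstruct it. The request rate in the usage line is hypothetical, since the report does not give the measured rates.

```python
def pieces_needed(n_validators: int) -> int:
    """Assumed reconstruction threshold: just over one third of validators."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    return (n_validators - 1) // 3 + 1


def chunk_size_bytes(pov_bytes: int, n_validators: int) -> float:
    """Approximate size of one piece when the PoV is spread so that
    `pieces_needed` pieces are enough to rebuild it."""
    return pov_bytes / pieces_needed(n_validators)


def throughput_kb_per_s(requests_per_sec: float, chunk_bytes: float) -> float:
    """Bandwidth implied by a request rate and a piece size."""
    return requests_per_sec * chunk_bytes / 1000


# Values from the text: 200 validators, maximum PoV of 1MB.
chunk = chunk_size_bytes(pov_bytes=1_000_000, n_validators=200)
print(f"about {chunk / 1000:.1f} KB per piece")

# Hypothetical request rate, for illustration only.
print(f"{throughput_kb_per_s(20, chunk):.0f} KB/s at 20 requests per second")
```

Under this model, bandwidth scales linearly with PoV size, which is why the current small PoVs understate the load at peak transaction volume.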

Recommendations

In general, the network is running smoothly. Although the average peer count and network throughput look healthy across the network, there are still some outlier nodes that are over-connected and have to bear a higher load.

In the current environment, with a fast internet connection, a powerful 4-core CPU and 64GB of memory are sufficient. Current network usage is approximately in the range of 8-16Mbps, so a typical 100Mbps data center connection is sufficient to sustain the final 5 sessions.
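The headroom implied by these figures can be checked with a short calculation. The function names below are hypothetical; the 8-16Mbps and 100Mbps values are the ones quoted in the text.

```python
def link_utilization(traffic_mbps: float, link_mbps: float) -> float:
    """Fraction of the link consumed by the given traffic."""
    if link_mbps <= 0:
        raise ValueError("link capacity must be positive")
    return traffic_mbps / link_mbps


def headroom_factor(traffic_mbps: float, link_mbps: float) -> float:
    """How many times the traffic could grow before saturating the link."""
    if traffic_mbps <= 0:
        raise ValueError("traffic must be positive")
    return link_mbps / traffic_mbps


# Figures quoted in the text: 8-16Mbps of traffic on a 100Mbps link.
for traffic in (8.0, 16.0):
    used = link_utilization(traffic, 100.0)
    room = headroom_factor(traffic, 100.0)
    print(f"{traffic:.0f} Mbps uses {used:.0%} of the link, {room:.2f}x headroom")
```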

The only problem is the stalls in the network. These stalls were caught by the fail-safe, so they did not cause much damage. The team is investigating the cause and will propose a solution before parachains launch on Polkadot.

 

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/network-stability-report-kusama-and-parachain/