The Pillars of Web3: An Overview of the Decentralized Storage Ecosystem

Filecoin, Arweave, Storj, Crust Network, Sia, and Swarm: which is the best decentralized storage solution?

To take the decentralized internet further, we will ultimately need three pillars: consensus, storage, and computation. If humanity succeeds in decentralizing all three domains, we will embark on the next stage of the Internet’s journey: Web3.

Figure 1: Examples of projects for each Web3 pillar

Storage, the second pillar, is maturing rapidly, and various storage solutions are already being applied to real use cases. This article explores the decentralized storage pillar in more detail.

This article is a summary of the full-length research piece, which can be downloaded from the decentralized storage networks Arweave and Crust Network.

The need for decentralized storage

Blockchain Perspective

From a blockchain perspective, we need decentralized storage because the blockchain itself is not designed to store large amounts of data. The mechanism for reaching block consensus relies on small amounts of data (transactions) that are collected into blocks and quickly shared with the network for nodes to verify.

First, storing data in blocks is very expensive. At the time of writing, it would cost over $18,000 to store the full BAYC #3368 on layer 1.

Figure 2: A project with an active mainnet. A storage period of 200 years was chosen to meet Arweave’s definition of permanence. Source: Web Documentation, Arweave Storage Calculator

Second, if we stored large amounts of arbitrary data in these blocks, network congestion would become severe, triggering gas wars and driving up prices for everyone using the network. This is a consequence of the implicit time value of block space: users who need a transaction included by a certain time must pay extra gas fees to have it processed first.

Therefore, it is recommended to store NFT metadata and image data, as well as dApp front ends, off-chain.

The Perspective of Centralized Networks

If storing data on-chain is so expensive, why not store data directly off-chain on a centralized network?

Centralized networks are vulnerable to censorship, and the data they hold is mutable. Users must therefore trust the data provider to keep the data safe. No one can be sure that the operators of a centralized network will live up to that trust: data can be erased intentionally or accidentally, for example through policy changes by the provider, hardware failures, or attacks by third parties.

NFTs

With the floor prices of some NFT collections exceeding $100,000, and some NFTs worth as much as $70,000 per KB of image data, promises alone are not enough to ensure that the data is available at all times. Stronger guarantees are required to ensure the immutability and persistence of the underlying NFT data.

Figure 3: Crypto Punk floor price based on last sale (no floor price at the time of writing); Crypto Punk image size is based on the byte length of the Crypto Punks V2 on-chain byte string. Data as of May 10, 2022. Source: OpenSea, On-Chain Data, IPFS Metadata

NFTs do not actually contain any image data. Instead, they hold pointers to metadata and image data stored off-chain. It is this metadata and image data that needs to be protected: if it disappears, the NFT is just an empty container.

Figure 4: Simplified illustration of blockchain, blocks, NFTs, and off-chain metadata

Arguably, the value of NFTs is not primarily driven by the metadata and image data they refer to, but rather by the community, movements, and ecosystems built around the collection. While this may be true, without the underlying data NFTs are meaningless, and a community cannot form around something meaningless.

Beyond profile pictures and art collections, NFTs can also represent ownership of real-world assets, such as real estate or financial instruments. Such data has external real-world value, and because that value is represented by the NFT, each byte of data behind the NFT is worth at least as much as the NFT itself on-chain.

dApps

If NFTs are commodities that live on the blockchain, then dApps can be thought of as services that live on the blockchain and facilitate interactions with it. A dApp is a combination of a front-end user interface that lives off-chain and a smart contract that lives on the network and interacts with the blockchain. Sometimes a dApp also has a simple back end that moves certain computations off-chain to reduce the gas required, and thus the cost to the end user, for certain transactions.

Figure 5: Simplified illustration of a dApp interacting with the blockchain

While the value of a dApp should be considered in the context of its category (e.g., DeFi, GameFi, Social, Metaverse, Name Services), the value that dApps bring is staggering. The top 10 dApps on DappRadar collectively facilitated over $150 billion in transfers in the 30 days before the time of writing.

Figure 6: Most popular dApps by dollar volume as reported by DappRadar as of May 11, 2022

Although the core mechanism of a dApp is executed by smart contracts, end users reach it through the front end. So, in a sense, ensuring the accessibility of the dApp front end means ensuring the availability of the underlying service.

Figure 7: Aave founder Stani Kulechov tweeted that the Aave dApp frontend went offline on January 20, 2022, but is still accessible via an IPFS-hosted copy of the website

Decentralized storage reduces the risk of server failures, DNS hacks, and centralized entities removing access to dApp front ends. Even if development of the dApp stops, the smart contract remains accessible through the front end.

Decentralized storage landscape

Blockchains like Bitcoin and Ethereum exist primarily to facilitate the transfer of value. Some decentralized storage networks take the same approach: they use native blockchains to record and track storage orders, which represent a transfer of value in exchange for storage services. However, this is just one of many potential approaches. The storage landscape is vast, and different solutions with different trade-offs and use cases have emerged over the years.

Figure 8: Overview of some arbitrarily chosen decentralized storage protocols (non-exhaustive)

Despite the many differences, all of the above projects have one thing in common: none of these networks replicates all data on all nodes, as the Bitcoin and Ethereum blockchains do. In a decentralized storage network, the immutability and availability of stored data is not achieved by a majority of the network storing and verifying successively linked data, although, as mentioned earlier, many networks do use a blockchain to track storage orders.

It is unsustainable for all nodes on a decentralized storage network to store all data, because the overhead of running the network would quickly increase storage costs for users and ultimately drive the network toward centralization among the few node operators who can afford the hardware.

Therefore, decentralized storage networks need to overcome extraordinary challenges.

Challenges of Decentralized Storage

Recalling the limitations of on-chain data storage described above, it is clear that a decentralized storage network must store data in a way that does not affect the network’s value transfer mechanism, while ensuring that data remains durable, immutable, and accessible. Essentially, a decentralized storage network must be able to store, retrieve, and maintain data, ensure that all participants are incentivized for the storage and retrieval work they do, and preserve the trustlessness of a decentralized system.

These challenges can be summarized as the following questions:

  • Data storage format: store complete files or file fragments?
  • Data replication: across how many nodes to store data (full files or fragments)?
  • Storage Tracking: How does the network know where to retrieve files from?
  • Proof of stored data: Do nodes store the data they are asked to store?
  • Data availability over time: Is the data still stored over time?
  • Storage price discovery: How is storage cost determined?
  • Persistent data redundancy: How does the network ensure data is still available if a node leaves the network?
  • Data transfer: Network bandwidth comes at a price – how do you ensure nodes retrieve data when asked?
  • Network Token Economics: Beyond ensuring data is available on the network, how does the network ensure the longevity of the network?

The networks explored in this research employ a wide range of mechanisms and achieve decentralization with different trade-offs.

Figure 9: Summary of technical design decisions for the reviewed storage networks

For an in-depth comparison of the above networks for each challenge, as well as detailed profiles of each network, read the full research article on Arweave or Crust Network.

Data storage format

Figure 10: Data replication and erasure coding

These networks use two main methods for storing data: storing full files and using erasure coding. Arweave and Crust Network store full files, while Filecoin, Sia, Storj, and Swarm all use erasure coding. In erasure coding, data is broken into fixed-size segments, and each segment is expanded and encoded with redundant data. Thanks to this redundancy, only a subset of the segments is needed to reconstruct the original file.
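The idea that a subset of fragments is enough to rebuild a file can be illustrated with a deliberately simple sketch. It uses a single XOR parity fragment, which is an assumption made for brevity and not the scheme of any network named here; production systems typically rely on stronger codes such as Reed-Solomon that tolerate several lost fragments. The function names `encode` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # pad so all fragments match in size
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def reconstruct(frags: list, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    frags = list(frags)
    if missing:
        present = [f for f in frags if f is not None]
        # XOR of all surviving fragments equals the lost fragment.
        frags[missing[0]] = reduce(xor_bytes, present)
    return b"".join(frags[:-1])[:original_len]  # drop parity and padding


data = b"decentralized storage"
fragments = encode(data, k=4)   # 4 data fragments + 1 parity fragment
fragments[2] = None             # simulate a node leaving the network
recovered = reconstruct(fragments, len(data))
```

Here any four of the five fragments suffice, so the file survives the loss of one storage node at a storage overhead of 25 percent, compared with 100 percent for keeping a second full copy.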

Data replication

In Filecoin, Sia, Storj, and Swarm, the network determines the number of erasure-coded fragments and the amount of redundant data stored in each fragment. Filecoin additionally lets users set the replication factor, which determines on how many separate physical devices an erasure-coded fragment is replicated as part of a storage transaction with a single storage miner. A user who wants to store files with different storage miners must make a separate storage transaction with each. Crust and Arweave let the network decide on replication, although on Crust the replication factor can also be set manually. On Arweave, the proof-of-storage mechanism incentivizes nodes to store as much data as possible, so Arweave’s replication cap is the total number of storage nodes on the network.

Figure 11: Data storage format will affect retrieval and reconstruction

The method used to store and replicate data will affect how the network retrieves data.

Storage tracking

Once the data is distributed among nodes in whatever form the network stores it, the network needs to be able to keep track of it. Filecoin, Crust, and Sia all use native blockchains to track storage orders, while storage nodes also maintain local lists of network locations. Arweave uses a blockchain-like structure. Unlike on blockchains such as Bitcoin and Ethereum, nodes on Arweave can decide for themselves whether to store the data from a block. So if you compare the chains held by multiple Arweave nodes, they will not be exactly the same: some blocks will be missing on some nodes and present on others.

Figure 12: Illustration of three nodes in blockweave

Finally, Storj and Swarm use two completely different approaches. In Storj, a second type of node called a satellite node acts as a coordinator for a group of storage nodes, managing and tracking where data is stored. In Swarm, the address of the data is embedded directly into the data block. When retrieving data, the network knows where to look based on the data itself.

Proof of stored data

Each network takes its own approach to proving that data has been stored. Filecoin uses Proof of Replication, a proprietary proof-of-storage mechanism that first stores data on storage nodes and then seals the data in a sector. The sealing process makes it possible to prove that two replicas of the same data are distinct from each other, ensuring that the correct number of copies is stored on the network (hence “proof of replication”).

Crust breaks a piece of data into many small pieces, which are hashed into a Merkle tree. By comparing the hash of a single piece of data stored on the physical storage device with the expected Merkle tree hash value, Crust can verify that the file has been stored correctly. This is similar to Sia’s approach, except that Crust stores the entire file on each node, while Sia stores erasure-coded fragments. Crust can store entire files on a single node and still achieve privacy by using the node’s Trusted Execution Environment (TEE), a sealed hardware component that even the hardware owner cannot access. Crust calls this proof-of-storage algorithm “meaningful proof-of-work”, where meaningful means that new hashes are calculated only when the stored data changes, reducing meaningless operations. Both Crust and Sia store the Merkle root hash on the blockchain as a source of truth for verifying data integrity.
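The Merkle-tree check described above can be sketched in a few lines. This is a generic illustration, not the actual Crust or Sia implementation: the use of SHA-256, the rule of duplicating the last node on odd levels, and the function names are all assumptions made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list) -> list:
    """Hash adjacent pairs; the caller guarantees an even number of nodes."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(chunks: list) -> bytes:
    """Root hash over all chunks (the value that would be stored on-chain)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(chunks: list, index: int) -> list:
    """Sibling hashes needed to recompute the root from chunks[index]."""
    level = [h(c) for c in chunks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_chunk(chunk: bytes, proof: list, root: bytes) -> bool:
    """Check one chunk against the trusted root without seeing the other chunks."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [b"piece-0", b"piece-1", b"piece-2", b"piece-3", b"piece-4"]
root = merkle_root(chunks)
ok = verify_chunk(chunks[3], merkle_proof(chunks, 3), root)
tampered = verify_chunk(b"corrupted", merkle_proof(chunks, 3), root)
```

A verifier that holds only the root can therefore challenge a node for a random piece and its proof, which keeps the amount of data on the blockchain constant regardless of file size.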

Storj uses data auditing to check that data has been stored correctly. Data auditing is similar to how Crust and Sia use Merkle trees to validate pieces of data. On Storj, once enough nodes have returned their audit results, the network determines which nodes are faulty based on the majority response, rather than by comparing against a blockchain’s source of truth. This design is intentional: the developers believe that reducing network-wide coordination through a blockchain improves performance in terms of speed (no waiting for consensus) and bandwidth usage (the entire network does not need to interact with the blockchain regularly).

Arweave uses a cryptographic proof-of-work puzzle to determine if a file has been stored. In this mechanism, in order for nodes to be able to mine the next block, they need to prove that they have access to the previous block and another random block in the network’s block history. Because data uploaded in Arweave is stored directly in the block, access to the previous block proves that the storage provider did save the file correctly.
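A minimal sketch of this eligibility rule follows, under loudly stated assumptions: the way the random historical block is selected here (hashing the previous block’s hash and reducing it modulo the chain height) is a stand-in chosen for illustration and is not taken from the Arweave protocol, and the names `recall_index` and `can_mine` are hypothetical.

```python
import hashlib


def recall_index(prev_block_hash: bytes, height: int) -> int:
    """Derive a pseudo-random historical block index (assumed selection rule)."""
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % height


def can_mine(stored_blocks: set, prev_block_hash: bytes, height: int) -> bool:
    """A node may attempt the next block only if it holds both required blocks.

    `height` is the number of existing blocks, indexed 0 .. height - 1,
    so the previous block has index height - 1.
    """
    has_previous = (height - 1) in stored_blocks
    has_recall = recall_index(prev_block_hash, height) in stored_blocks
    return has_previous and has_recall


height = 100
prev_hash = b"hash-of-block-99"
full_node = set(range(height))                     # stores every block
partial_node = {99} | set(range(0, height, 2))     # stores about half of them

full_node_eligible = can_mine(full_node, prev_hash, height)
partial_node_eligible = can_mine(partial_node, prev_hash, height)
```

Because the recall block changes with every new block, a node that stores only a fraction of the history is eligible for only roughly that fraction of mining rounds, which is what pushes nodes to store as much data as they can.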

Finally, Swarm also uses a Merkle tree. The difference is that the tree is not used to determine file locations; instead, the data blocks are stored directly in the Merkle tree. When data is stored on Swarm, the root hash of the tree (which is also the address where the data is stored) proves that the file was properly chunked and stored.
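The property that data is located by its own hash can be shown with a small content-addressed store. This is a sketch of the general idea only: SHA-256 is used as a stand-in hash, and the class `ContentStore` is hypothetical rather than part of any Swarm client.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address of a chunk is the hash of the chunk."""

    def __init__(self):
        self._chunks = {}

    def put(self, chunk: bytes) -> str:
        address = hashlib.sha256(chunk).hexdigest()
        self._chunks[address] = chunk
        return address

    def get(self, address: str) -> bytes:
        chunk = self._chunks[address]
        # The address doubles as an integrity check on whatever is returned.
        if hashlib.sha256(chunk).hexdigest() != address:
            raise ValueError("chunk does not match its address")
        return chunk


store = ContentStore()
address = store.put(b"front-end bundle v1")
same_address = store.put(b"front-end bundle v1")  # identical data, identical address
retrieved = store.get(address)
```

Since the address is derived from the content, no separate lookup table is needed to know what a given address should contain, and any modification of the data yields a different address.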

Data availability over time

Likewise, each network has its own approach to establishing that data has been stored throughout a specific time period. In Filecoin, to reduce network bandwidth, storage miners must continuously run the proof-of-replication algorithm for the period in which the data is to be stored. The resulting hash for each time period proves that the storage space has been occupied by the correct data during that period, hence the name “Proof of Space-Time”.

Crust, Sia, and Storj periodically verify random pieces of data and report the results to their coordination mechanisms: the blockchain for Crust and Sia, and the satellite nodes for Storj. Arweave ensures consistent availability of data through its Proof of Access mechanism, which requires miners to prove not only that they have access to the last block, but also that they have access to a random historical block. Storing older and rarer blocks is incentivized because access to the selected block is a prerequisite for winning the proof-of-work puzzle, so holding more blocks increases a miner’s chance of winning.

Swarm, on the other hand, periodically runs raffles that reward nodes for holding less popular data over time, while also running proof-of-ownership algorithms for data that nodes have promised to store for a longer period.

Filecoin, Sia, and Crust require nodes to deposit collateral to become storage nodes, while Swarm requires it only for long-term storage requests. Storj does not require upfront collateral, but it withholds a portion of storage nodes’ revenue. Finally, all networks periodically pay nodes for the periods during which they can prove that they stored the data.

Storage Price Discovery

To determine storage prices, Filecoin and Sia use storage markets, in which storage providers set their asking prices and storage users set the prices they are willing to pay, along with a few other settings. The storage market then connects users with storage providers that meet their requirements. Storj takes a similar approach, with the main difference being that there is no single network-wide marketplace connecting all nodes on the network. Instead, each satellite has its own set of storage nodes that it interacts with.

Finally, Crust, Arweave, and Swarm all let the protocol dictate storage prices. While Crust and Swarm allow certain settings based on the user’s file storage requirements, files on Arweave are always stored permanently.
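The matching step of such a storage market can be sketched as follows. The data model is an assumption made for illustration: real markets consider more settings (duration, collateral, reputation, and so on), and the names `Ask`, `Bid`, and `match` are hypothetical rather than taken from Filecoin or Sia.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ask:
    """A storage provider's offer."""
    provider: str
    price_per_gb: float
    free_gb: float


@dataclass
class Bid:
    """A storage user's request."""
    user: str
    max_price_per_gb: float
    size_gb: float


def match(bid: Bid, asks: list) -> Optional[Ask]:
    """Return the cheapest provider that meets the user's price and capacity needs."""
    eligible = [
        ask for ask in asks
        if ask.price_per_gb <= bid.max_price_per_gb and ask.free_gb >= bid.size_gb
    ]
    return min(eligible, key=lambda ask: ask.price_per_gb, default=None)


asks = [
    Ask("provider-a", price_per_gb=0.05, free_gb=10),
    Ask("provider-b", price_per_gb=0.02, free_gb=500),
    Ask("provider-c", price_per_gb=0.01, free_gb=1),   # cheapest, but too small
]
chosen = match(Bid("alice", max_price_per_gb=0.03, size_gb=50), asks)
no_deal = match(Bid("bob", max_price_per_gb=0.001, size_gb=1), asks)
```

In a Storj-like setting, the same function would simply be run per satellite over that satellite’s own list of storage nodes instead of over one network-wide list.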

Persistent Data Redundancy

Over time, nodes will leave these open public networks, and when nodes disappear, so does the data they store. Therefore, the network must actively maintain some degree of redundancy in the system. Sia and Storj recreate missing fragments by collecting a subset of the remaining fragments, reconstructing the underlying data, and then re-encoding the file, restoring redundancy by replenishing the missing erasure-coded fragments. In Sia, users must periodically log in to the Sia client to replenish fragments, because only the client can tell which fragments belong to which data and which user. On Storj, the satellite is always online and runs data audits periodically to replenish fragments.

Arweave’s Proof-of-Access algorithm ensures that data is regularly replicated across the network, whereas on Swarm, data is replicated to nodes that are close to each other. On Filecoin, if data disappears over time and the number of remaining file fragments falls below a certain threshold, the storage order is reintroduced into the storage market, allowing another storage miner to take it over. Crust’s replenishment mechanism is currently under development.

Incentivizing data transfer

Once the data is securely stored, users will eventually want to retrieve it. Since bandwidth comes at a cost, storage nodes must be incentivized to provide data when it is needed. Crust and Swarm use a debt-and-credit mechanism, in which each node tracks the inbound and outbound traffic it exchanges with the nodes it interacts with. If a node only accepts inbound traffic but provides no outbound traffic, it is de-prioritized for future communication, which may affect its ability to accept new storage orders. Crust uses the IPFS Bitswap mechanism, while Swarm uses a proprietary protocol called SWAP. Under Swarm’s SWAP protocol, nodes that are in debt (having accepted inbound traffic without providing enough outbound traffic) can pay it off with stamps, which can be exchanged for the network’s utility tokens.

Figure 13: Swarm Accounting Protocol (SWAP), Source: Swarm White Paper

This tracking of node generosity is also how Arweave ensures data is delivered when requested. In Arweave, this mechanism is called Wildfire: nodes prioritize better-ranked peers and ration bandwidth usage accordingly. Finally, on Filecoin, Storj, and Sia, users pay for bandwidth, which incentivizes nodes to deliver data when requested.
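The per-peer bookkeeping behind such debt-and-credit schemes can be sketched with a small ledger. This is an illustration of the general idea only, not Bitswap, SWAP, or Wildfire: the fixed byte-based debt limit, the ranking rule, and the class name `PeerLedger` are assumptions.

```python
class PeerLedger:
    """Tracks, per peer, how many more bytes we have sent than received."""

    def __init__(self, debt_limit: int):
        self.debt_limit = debt_limit
        self.balance = {}  # peer -> bytes the peer owes us (negative: we owe them)

    def record_sent(self, peer: str, n_bytes: int) -> None:
        self.balance[peer] = self.balance.get(peer, 0) + n_bytes

    def record_received(self, peer: str, n_bytes: int) -> None:
        self.balance[peer] = self.balance.get(peer, 0) - n_bytes

    def should_serve(self, peer: str) -> bool:
        """Keep serving a peer until its debt exceeds the limit."""
        return self.balance.get(peer, 0) <= self.debt_limit

    def ranked_peers(self) -> list:
        """Most generous peers first (lowest debt towards us)."""
        return sorted(self.balance, key=self.balance.get)


ledger = PeerLedger(debt_limit=1_000)
ledger.record_sent("freeloader", 5_000)   # took data, gave nothing back
ledger.record_sent("fair-peer", 800)
ledger.record_received("fair-peer", 700)  # roughly balanced exchange

serve_freeloader = ledger.should_serve("freeloader")
serve_fair_peer = ledger.should_serve("fair-peer")
priority_order = ledger.ranked_peers()
```

A settlement step, such as the stamp payment described for SWAP, would correspond to reducing a peer’s balance without any data flowing back.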

Token economics

Token economic design ensures the stability of the network and its long-term existence, since ultimately data is only as permanent as the network that stores it. The table below gives a brief summary of token economic design decisions, as well as the inflation and deflation mechanisms embedded in each design.

Figure 14: Token economic design decisions for the reviewed storage networks.

Which is the best network?

It cannot be said that one network is objectively better than another. There are countless trade-offs when designing a decentralized storage network. While Arweave is great for storing data permanently, Arweave is not necessarily suitable for migrating Web 2.0 industry players to Web 3.0 – not all data needs to be stored permanently. However, one strong data subfield does require permanence: NFTs and dApps.

Ultimately, design decisions are based on the purpose of the network.

Below is a summary overview of various storage networks, which are compared to each other on a set of scales defined below. The scales used reflect the comparative dimensions of these networks, but it should be noted that approaches to overcoming the challenges of decentralized storage are in many cases not good or bad, but simply reflect design decisions.

  • Storage parameter flexibility: the degree to which the user controls the file storage parameters
  • Storage Durability: The extent to which file storage can be theoretically durable over the network (i.e. without intervention)
  • Redundancy Persistence: The ability of the network to maintain data redundancy through replenishment or repair
  • Data transfer incentive: the extent to which the network ensures that nodes are generous in transferring data
  • Pervasiveness of storage tracking: the degree of consensus among nodes on where data is stored
  • Guaranteed Data Accessibility: The ability of the network to ensure that a single participant in the storage process cannot remove access to files on the network

The higher the score, the stronger the network’s capability in that dimension.

Filecoin’s token economics support increased storage space across the network for storing large amounts of data in an immutable manner. Also, their storage algorithm is more suitable for data that is unlikely to change much over time (cold storage).

Figure 15: Filecoin Summary Overview

Crust’s token economics ensure hyper-redundancy and fast retrieval, making it suitable for high-traffic dApps and for fast retrieval of data for popular NFTs.

Crust has a low score for storage durability because without persistent redundancy, its ability to provide permanent storage is severely compromised. Still, durability can be achieved by manually setting extremely high replication factors.

Figure 16: Crust summary overview

Sia is all about privacy. Users must restore file health manually because nodes do not know which data fragments they store or which data those fragments belong to. Only the data owner can reconstruct the original data from the fragments in the network.

Figure 17: Sia Summary Overview

In contrast, Arweave is all about persistence. This is also reflected in their endowment design, which makes storage more expensive, but also makes them an extremely attractive option for NFT storage.

Figure 18: Summary overview of Arweave

Storj’s business model appears to have largely influenced how they bill and pay: Amazon AWS S3 users are more familiar with monthly billing. By removing the complex payment and incentive systems commonly found in blockchain-based systems, Storj Labs sacrifices some decentralization but significantly lowers the barriers to entry for a key target group of AWS users.

Figure 19: Storj summary overview

Swarm’s bonding curve model ensures that storage costs remain relatively low as more data is stored on the network, and its proximity to the Ethereum blockchain makes it a strong contender for primary storage for more complex Ethereum-based dApps.

Figure 20: Swarm summary overview

There is no single best approach to the various challenges faced by decentralized storage networks. Depending on the purpose of the network and the problem it is trying to solve, it must make trade-offs in both the technical and the token economic aspects of its design.

Figure 21: Summary of robust use cases for reviewed storage networks

In the end, the purpose of the network and the specific use case it is trying to optimize for will dictate various design decisions.

Next chapter

Going back to the Web3 infrastructure pillars (consensus, storage, computation), we see that the decentralized storage space has a small number of strong players who have positioned themselves in the market for specific use cases. This doesn’t preclude new networks from optimizing existing solutions or capturing new niches, but it does raise the question: what’s next?

The answer is: computation. The next frontier for a truly decentralized internet is decentralized computing. Currently, there are only a handful of solutions that bring trustless, decentralized computing to market and can power complex dApps at a fraction of the cost of executing smart contracts on the blockchain. Costs are more complex to calculate.

The Internet Computer (ICP) and Holochain (HOLO) are the networks that have a strong position in the decentralized computing market at the time of writing. Still, the computing space is not as crowded as the consensus and storage space. Therefore, sooner or later strong competitors will enter the market and position themselves accordingly. Stratos (STOS) is one such competitor. Stratos offers a unique network design through its decentralized data grid technology.

We see decentralized computing, especially the network design of the Stratos network, as an area of future research.

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/the-pillars-of-web3-an-overview-of-the-decentralized-storage-ecosystem/