If we were to pave the way for the decentralization of the internet, we would end up focusing on three pillars: consensus, storage, and computation. If humans succeed in decentralizing all three, we will fully realize the next iteration of the Internet: Web3.
(Web3 fundamentals: consensus, storage, computation)
Storage is the second pillar, which is maturing rapidly, with various storage solutions emerging for specific use cases. In this post, we’ll take a closer look at the decentralized storage pillar.
The need for decentralized storage
From a blockchain perspective, we need decentralized storage because blockchains themselves are not designed to store large amounts of data. The mechanism used to achieve blockchain consensus relies on small amounts of data arranged in blocks and quickly shared across the network for verification by nodes.
First, storing data in these blocks is very expensive. Storing the entire image data for BAYC#3368 directly on the Layer 1 network could cost over $18,000 at the time of writing.
(Projects with an active mainnet. The 200-year storage period was chosen to match Arweave’s definition of permanence. Sources: Web Documentation, Arweave Storage Calculator)
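As a sanity check on that order of magnitude, here is a back-of-the-envelope sketch of on-chain storage cost. Ethereum's SSTORE opcode costs 20,000 gas per 32-byte word; the image size, gas price, and ETH price below are illustrative assumptions, not the figures behind the article's $18,000 estimate.

```python
# Rough estimate of the cost of storing raw bytes directly on Ethereum.
# SSTORE costs 20,000 gas per 32-byte word; all inputs are assumptions.

GAS_PER_WORD = 20_000   # gas to store one 32-byte word via SSTORE
WORD_SIZE = 32          # bytes per storage word

def storage_cost_usd(n_bytes: int, gas_price_gwei: float, eth_usd: float) -> float:
    words = -(-n_bytes // WORD_SIZE)               # ceiling division
    gas = words * GAS_PER_WORD
    return gas * gas_price_gwei * 1e-9 * eth_usd   # gwei -> ETH -> USD

# e.g. a ~170 KB image at an assumed 40 gwei and $2,000/ETH:
cost = storage_cost_usd(170_000, 40, 2_000)
```

Even under these conservative assumptions the cost lands in the thousands of dollars, which is why image data stays off-chain.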
Second, if large amounts of arbitrary data were stored in these blocks, network congestion would increase severely, driving up the price users pay to use the network through gas bidding. This is a consequence of the implicit time value of blocks: if users need their transaction included by a certain time, they will pay more gas to prioritize it.
Therefore, it is recommended to store the underlying metadata and image data of NFTs, as well as the dApp frontend, off-chain.
An investigation of centralized networks
If storing data on the blockchain is so expensive, why not store it off-chain on a centralized network?
The reason: centralized networks are prone to censorship and mutation, and they require users to trust the storage provider to keep their data safe. There is no guarantee that the operator of a centralized network will live up to that trust: data may be deleted intentionally or accidentally, for example due to a policy change by the storage provider, a hardware failure, or an attack by a third party.
Since some NFT collections have a floor price of over $100,000, and some are worth up to $70,000 per KB of image data, a promise is not enough to ensure the data remains available. Stronger guarantees are needed to ensure the immutability and permanence of the underlying NFT data.
(The value of precious NFTs, data as of May 10, 2022)
An NFT itself does not actually contain any image data; it holds pointers to metadata and image data stored off-chain. It is this metadata and image data that needs to be protected: if it were gone, the NFT would be just an empty container.
(blockchain, block, NFT and off-chain metadata)
It could be argued that the value of NFTs is driven not primarily by the metadata and image data they refer to, but by the community built around their collectible value and ecosystem. While this may be true, without the underlying data NFTs are meaningless, and communities cannot form around them.
In addition to profile pictures and art collections, NFTs can also represent ownership of real-world assets (such as real estate or financial instruments). Such data has extrinsic, real-world value, and because of what it represents, preserving every byte of the data underlying such an NFT is at least as important as preserving the NFT itself on-chain.
If NFTs are commodities that exist on the blockchain, then dApps can be thought of as services that exist on the blockchain and facilitate interactions with the blockchain. A dApp is a combination of a front-end user interface that lives off-chain and smart contracts that live on the network and interact with the blockchain. Sometimes they also have a simple backend that can move certain computations off-chain to reduce the gas required and thus reduce the cost to the end user for certain transactions.
(dApp interacts with blockchain)
While the value of a dApp should be considered in the context of its purpose (e.g., DeFi, GameFi, social, metaverse, name services), the amount of value facilitated by dApps is staggering. The top 10 dApps on DappRadar have collectively facilitated over $150 billion in transfers over the past 30 days.
(Data source: DappRadar, as of May 11, 2022)
Although the core mechanism of the dApp is executed through smart contracts, accessibility to the end user is ensured through its front end. So, in a sense, ensuring the availability of the front end of the dApp is ensuring the availability of the underlying services.
Decentralized storage reduces the likelihood of server failure, DNS hacking, or centralized entities removing access to the dApp frontend. The frontend and access to smart contracts through that frontend can continue to exist even if the development of the dApp ceases.
(Aave founder Stani Kulechov tweeted that the Aave dApp frontend went offline on January 20, 2022, but remains accessible via an IPFS-hosted copy of the website.)
Decentralized storage environment
Blockchains like Bitcoin and Ethereum exist primarily to facilitate the transfer of value. Some networks also take this approach when it comes to decentralized storage networks: they use native blockchains to record and track storage orders, which represent a transfer of value in exchange for storage services. However, this is just one of many potential approaches – the field of storage is vast, and different solutions with different trade-offs and use cases have emerged over the years.
(A selection of decentralized storage protocols)
Despite their many differences, all of the above projects have one thing in common: none of them replicates all data across all nodes. In a decentralized storage network, the immutability and availability of stored data are not achieved by having the bulk of the network store everything and continuously validate a linked chain of data, although, as mentioned earlier, many networks do use a blockchain to track storage orders.
Having every node in a decentralized storage network store all data is unsustainable: the overhead of running the network would quickly drive up storage costs for users and ultimately push the network toward centralization around the few node operators who could afford the hardware. Decentralized storage networks therefore need to overcome a very different set of challenges.
The challenge of data decentralization
Recalling the aforementioned limitations of on-chain data storage, it is clear that a decentralized storage network must store data in a way that does not interfere with the network’s value transfer mechanism, while keeping the data durable, immutable, and accessible. Essentially, a decentralized storage network must be able to store, retrieve, and maintain data while ensuring that all participants are incentivized for the storage and retrieval work they perform, and while keeping the system trustless.
These challenges can be summarized as the following questions:
Data storage format: are complete files stored, or file fragments?
Data replication: across how many nodes is the data (full files or fragments) stored?
Storage tracking: how does the network know where to retrieve files from?
Proof of data storage: do nodes actually store the data they are asked to store?
Data availability over time: is the data still stored as time passes?
Storage price discovery: how is the cost of storage determined?
Persistent data redundancy: how does the network keep data available when nodes leave?
Data transfer: network bandwidth comes at a price, so how does the network ensure that nodes serve data when asked?
Network token economics: beyond keeping data available, how does the network ensure its own longevity?
The networks explored as part of this research employ a wide range of mechanisms and achieve decentralization with differing trade-offs.
(Comparison of the technical designs of decentralized storage networks)
Data storage format
In these networks, there are two main methods for storing data: storing full files and erasure coding. Arweave and Crust Network store full files, while Filecoin, Sia, Storj, and Swarm all use erasure coding. In erasure coding, data is broken into constant-size segments, and each segment is expanded and encoded with redundant data. The redundancy added to each segment makes it possible to reconstruct the original file from only a subset of the segments.
(Data duplication and erasure coding of data)
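To make the idea concrete, here is a minimal toy erasure scheme using a single XOR parity fragment (RAID-5 style). It tolerates the loss of any one of the k+1 fragments; the Reed-Solomon codes these networks actually use generalize this to many lost fragments.

```python
from functools import reduce

# Toy single-parity erasure code: k data fragments plus one XOR parity
# fragment. Any single missing fragment can be rebuilt by XORing the
# survivors, because the XOR of all k+1 fragments is zero.

def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments plus one parity fragment."""
    frag_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(k * frag_len, b"\0")    # pad to equal-sized fragments
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity]

def reconstruct(frags: list, k: int, orig_len: int) -> bytes:
    """Rebuild the original data even if one fragment is missing (None)."""
    if None in frags:
        missing = frags.index(None)
        survivors = [f for f in frags if f is not None]
        frags = list(frags)
        # XOR of the survivors equals the missing fragment.
        frags[missing] = bytes(
            reduce(lambda a, b: a ^ b, col) for col in zip(*survivors)
        )
    return b"".join(frags[:k])[:orig_len]
```

Only a subset of fragments (here, any k of the k+1) is needed to recover the file, which is exactly the property that lets these networks spread fragments across unreliable nodes.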
In Filecoin, Sia, Storj, and Swarm, the network determines the number of erasure-coded fragments and the amount of redundant data stored in each fragment. However, Filecoin also lets users set the replication factor, which determines how many separate physical devices a fragment is replicated to as part of a storage deal with a single storage miner. If a user wants to store files with different storage miners, the user must make a separate storage deal with each. Crust and Arweave let the network decide replication, though the replication factor can be set manually on Crust. On Arweave, the proof-of-storage mechanism incentivizes nodes to store as much data as possible, so Arweave’s effective replication cap is the total number of storage nodes on the network.
(Data storage format will affect retrieval and reconstruction)
The method used to store and replicate data will affect how the network retrieves data.
After the data is distributed among nodes in whatever form the network stores it, the network needs to keep track of where it lives. Filecoin, Crust, and Sia all use native blockchains to track storage orders, while storage nodes also maintain local lists of network locations. Arweave uses a blockchain-like structure. Unlike blockchains such as Bitcoin and Ethereum, on Arweave nodes can decide for themselves whether to store the data in each block. So if you compare the chains of multiple Arweave nodes, they will not be exactly the same: some blocks will be missing on some nodes and present on others.
(Arweave network – illustration of three nodes in blockweave)
Finally, Storj and Swarm take two completely different approaches. In Storj, a second node type, the satellite node, acts as coordinator for a group of storage nodes, managing and tracking where data is stored. In Swarm, the address of a data chunk is derived directly from its contents, so when retrieving data, the network knows where to look based on the data itself.
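The Swarm-style approach can be illustrated with a simplified content-addressed store, where a chunk’s address is just the hash of its contents. (This is a sketch: Swarm’s real scheme uses a binary Merkle tree hash, not plain SHA-256.)

```python
import hashlib

# Simplified content-addressed chunk store: the address of a chunk is
# derived from its contents, so no separate location index is needed
# and identical content always maps to the same address.

class ChunkStore:
    def __init__(self):
        self._chunks = {}

    def put(self, chunk: bytes) -> str:
        addr = hashlib.sha256(chunk).hexdigest()
        self._chunks[addr] = chunk
        return addr

    def get(self, addr: str) -> bytes:
        return self._chunks[addr]
```

A useful side effect of content addressing is built-in integrity: if a node returns a chunk whose hash does not match the requested address, the chunk is provably corrupt.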
Proof of stored data
Each network takes its own approach to proving that data is stored. Filecoin uses Proof of Replication, a purpose-built proof-of-storage mechanism that first stores data on storage nodes and then seals it into sectors. The sealing process makes two replicas of the same data provably distinct from each other, ensuring that the correct number of copies is stored on the network (hence “Proof of Replication”).
Crust breaks a piece of data into many small pieces, which are hashed into a Merkle tree. By comparing the hash of each piece stored on the physical storage device with the expected Merkle tree hash, Crust can verify that the file has been stored correctly. This is similar to Sia’s approach, except that Crust stores the entire file on each node, while Sia stores erasure-coded fragments. Crust can store entire files on a single node and still preserve privacy by using the node’s Trusted Execution Environment (TEE), a sealed hardware component that even the hardware owner cannot inspect. Crust calls its proof-of-storage algorithm “Meaningful Proof of Work”: new hashes are computed only when the stored data changes, reducing meaningless work. Both Crust and Sia store Merkle root hashes on their blockchains as the source of truth for verifying data integrity.
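The Merkle-tree check that Crust and Sia rely on can be sketched as follows, under simplifying assumptions (plain SHA-256, in-memory fragments): the chain stores only the root hash, and a node proves it holds a fragment by returning the fragment plus a short inclusion proof against that root.

```python
import hashlib

# Minimal Merkle tree: build a root over fragments, produce an
# inclusion proof for one fragment, and verify it against the root.

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(fragments):
    level = [_h(f) for f in fragments]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(fragments, index):
    """Sibling hashes needed to recompute the root from fragment `index`."""
    level = [_h(f) for f in fragments]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling-is-left)
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(fragment, proof, root):
    node = _h(fragment)
    for sibling, is_left in proof:
        node = _h(sibling + node) if is_left else _h(node + sibling)
    return node == root
```

The proof is logarithmic in the number of fragments, which is why a blockchain can hold only the root while the bulky data stays off-chain.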
Storj checks that data has been stored correctly through data audits, which, like Crust and Sia, use Merkle trees to validate pieces of data. On Storj, once enough nodes have returned their audit results, the network determines which nodes are misbehaving from the majority response rather than by comparison against an on-chain source of truth. This is intentional: the developers believe that reducing network-wide coordination through a blockchain improves both speed (no waiting for consensus) and bandwidth usage (the entire network need not regularly interact with a blockchain).
Arweave uses a cryptographic proof-of-work puzzle to determine whether a file has been stored. For nodes to mine the next block, they must prove that they have access to the previous block and to another random block from the network’s block history. Because uploaded data on Arweave is stored directly in blocks, proving access to those blocks proves that the storage provider actually saved the files correctly.
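A heavily simplified sketch of this recall-block idea follows. The function names and the recall-selection rule are illustrative, not Arweave’s actual protocol: the point is that the historical block is chosen unpredictably, so only nodes actually storing it can produce the proof.

```python
import hashlib

# Sketch of a proof-of-access style check: deriving which historical
# ("recall") block must be included in the mining hash from data the
# miner cannot choose (here, the previous block hash).

def recall_index(prev_block_hash: bytes, chain_height: int) -> int:
    # Deterministic but unpredictable choice of historical block.
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % chain_height

def prove_access(candidate_header: bytes, recall_block_data: bytes) -> bytes:
    # A node that does not store the recall block cannot compute this.
    return hashlib.sha256(candidate_header + recall_block_data).digest()
```

Because every new block implies a fresh random recall block, nodes that store more of the history get more chances to produce a valid proof, which is the storage incentive described above.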
Finally, Merkle trees are also used on Swarm, with the difference that they are not used to determine file locations; instead, data chunks are stored directly in the Merkle tree. When storing data on Swarm, the root hash of the tree (which is also the address of the data) proves that the file was properly chunked and stored.
Data availability over time
Likewise, each network uses its own method to determine whether data remains stored over a given period. In Filecoin, to reduce network bandwidth, storage miners must continuously run the proof-of-replication algorithm over the period for which the data is to be stored. The resulting hash for each period proves that the storage space was occupied by the correct data during that specific window, hence “Proof of Spacetime”.
Crust, Sia, and Storj periodically verify random pieces of data and report the results to their coordination mechanisms: Crust’s and Sia’s blockchains, and Storj’s satellite nodes. Arweave ensures consistent data availability through its Proof of Access mechanism, which requires miners to prove access not only to the last block but also to a random historical block. Storing older and rarer blocks is incentivized because access to the chosen historical block is a prerequisite for solving the proof-of-work puzzle, increasing the likelihood that such miners win it.
Swarm, on the other hand, regularly runs sweepstakes that reward nodes for holding less popular data over time, while also running proof-of-ownership algorithms for data that nodes commit to storing for longer periods of time.
Filecoin, Sia, and Crust require nodes to deposit collateral to become storage nodes, while Swarm only requires collateral for long-term storage requests. Storj requires no upfront collateral but withholds a portion of miners’ storage revenue instead. Finally, all networks periodically pay nodes for the time periods in which they can provably store the data.
Storage Price Discovery
To determine storage prices, Filecoin and Sia use storage markets, where storage providers set their asking prices and storage users set the prices they are willing to pay, along with a few other parameters. The storage market then connects users with storage providers that meet their requirements. Storj takes a similar approach, with the main difference that there is no single network-wide marketplace connecting all nodes; instead, each satellite interacts with its own set of storage nodes.
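A toy version of such a market matcher might simply pick the cheapest provider whose ask and capacity satisfy the user’s bid. All names, fields, and prices here are illustrative, not any network’s actual order format.

```python
from dataclasses import dataclass

# Minimal storage-market sketch: providers post asks, a user order is
# matched to the cheapest ask that meets its price and size limits.

@dataclass
class Ask:
    provider: str
    price_per_gb: float   # asking price per GB per period
    capacity_gb: int      # remaining capacity

def match_order(asks, size_gb, max_price):
    """Return (provider, price) for the cheapest viable ask, or None."""
    viable = [a for a in asks
              if a.price_per_gb <= max_price and a.capacity_gb >= size_gb]
    if not viable:
        return None
    best = min(viable, key=lambda a: a.price_per_gb)
    best.capacity_gb -= size_gb   # reserve the capacity for this deal
    return (best.provider, best.price_per_gb)
```

Real markets add reputation, collateral, and deal-duration terms on top, but the core price-discovery step is this bid/ask intersection.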
Finally, Crust, Arweave, and Swarm all let protocols dictate the price of storage. Crust and Swarm allow certain settings based on the user’s file storage requirements, whereas on Arweave files are always stored permanently.
Persistent Data Redundancy
Over time, nodes will leave these open public networks, and when nodes disappear, so does the data they store. The network must therefore actively maintain some degree of redundancy. Sia and Storj do this by collecting a subset of fragments, reconstructing the underlying data, and re-encoding the file to recreate the missing erasure-coded fragments. In Sia, users must periodically log in to the Sia client to perform this fragment replenishment, because only the client can tell which fragments belong to which data and user. On Storj, satellites are always online and regularly run data audits, replenishing fragments as needed.
Arweave’s Proof of Access algorithm ensures that data is continually re-replicated across the network, whereas on Swarm, data is replicated to nodes that are close to each other. On Filecoin, if data disappears over time and the number of remaining file fragments falls below a certain threshold, the storage order is reintroduced into the storage market so another storage miner can take it over. Finally, Crust’s replenishment mechanism is currently in development.
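The replenishment loop that Sia and Storj perform can be sketched generically: when surviving fragments fall below a repair threshold but at least the k needed to decode remain, reconstruct and re-encode. The `encode`/`decode` callables below are placeholders for a real erasure code; everything here is illustrative.

```python
# Generic fragment-replenishment sketch: detect degraded redundancy,
# rebuild the data from survivors, and emit a fresh fragment set.

def needs_repair(fragments, k, repair_threshold):
    """True if the file is still decodable (>= k survivors) but
    redundancy has dropped below the repair threshold."""
    alive = sum(f is not None for f in fragments)
    return k <= alive < repair_threshold

def repair(fragments, k, decode, encode):
    """Reconstruct the data from k survivors and re-encode a full set."""
    survivors = [f for f in fragments if f is not None][:k]
    return encode(decode(survivors))
```

Note the window in `needs_repair`: once fewer than k fragments survive, the data is unrecoverable, which is why repair must trigger well before that point.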
Incentivizing data transfer
After data has been securely stored, users will eventually want to retrieve it. Since bandwidth comes at a cost, storage nodes must be incentivized to serve data when asked. Crust and Swarm use a debt-and-credit mechanism, where each node tracks its inbound and outbound traffic with every peer it interacts with. If a node only accepts inbound traffic and provides no outbound traffic, it is deprioritized in future communication, which may affect its ability to accept new storage orders. Crust uses IPFS’s Bitswap mechanism, while Swarm uses a purpose-built protocol called SWAP. In Swarm’s SWAP protocol, nodes that accept inbound traffic without providing enough outbound traffic can pay off their debt with stamps, which can be exchanged for the network’s utility token.
(Swarm Accounting Protocol (SWAP))
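The debt-and-credit idea behind Bitswap and SWAP can be sketched as a per-peer ledger. The class, method names, and threshold below are illustrative, not either protocol’s actual interface.

```python
# Per-peer bandwidth ledger sketch: each node tracks how many bytes
# each peer owes it; heavily indebted peers are deprioritized until
# they serve traffic back or settle the debt with a payment.

class PeerLedger:
    def __init__(self, debt_threshold: int):
        self.debt_threshold = debt_threshold
        self.balance = {}   # peer -> bytes the peer owes us (positive = debt)

    def on_served(self, peer: str, nbytes: int) -> None:
        # We sent the peer data: their debt to us grows.
        self.balance[peer] = self.balance.get(peer, 0) + nbytes

    def on_received(self, peer: str, nbytes: int) -> None:
        # The peer sent us data: their debt shrinks (may go negative).
        self.balance[peer] = self.balance.get(peer, 0) - nbytes

    def on_payment(self, peer: str, nbytes_worth: int) -> None:
        # The peer settles debt with a payment instead of return traffic.
        self.balance[peer] = self.balance.get(peer, 0) - nbytes_worth

    def should_serve(self, peer: str) -> bool:
        # Deprioritize peers whose debt exceeds the threshold.
        return self.balance.get(peer, 0) < self.debt_threshold
```

The payment path is what distinguishes SWAP-style settlement from pure tit-for-tat: a leeching peer can buy its way back into good standing instead of serving traffic.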
Tracking how generous nodes are is also how Arweave ensures that data is transmitted when requested. In Arweave, this mechanism is called Wildfire: nodes prioritize peers with better rankings and ration bandwidth accordingly. Finally, on Filecoin, Storj, and Sia, users ultimately pay for bandwidth, incentivizing nodes to deliver data when requested.
Network token economics
Token economics design ensures the stability of the network and its long-term existence, since stored data is only as permanent as the network itself. The table below briefly summarizes each network’s token economics design decisions, along with the inflation and deflation mechanisms embedded in those designs.
(Token Economics of Partial Storage Networks)
Which network is the best?
It is impossible to say that one network is objectively better than another; countless trade-offs must be weighed when designing a decentralized storage network. While Arweave is great for storing data permanently, it is not necessarily suited to migrating Web 2.0 industry players to Web 3.0, since not all data needs to be stored permanently. However, one strong segment of data does require permanence: NFTs and dApps.
Ultimately, design decisions are made based on the purpose of the network.
Below is a summary overview of various storage networks, which are compared to each other on a set of scales defined below. The scales used reflect the comparative dimensions of these networks, but it should be noted that approaches to overcoming the challenges of decentralized storage are in many cases not good or bad, but simply reflect design decisions.
Storage parameter flexibility: the degree to which the user controls file storage parameters
Storage durability: the extent to which file storage can theoretically persist on the network (i.e., without intervention)
Redundancy persistence: the network’s ability to maintain data redundancy through replenishment or repair
Data transfer incentive: the extent to which the network ensures that nodes are generous in transferring data
Pervasiveness of storage tracking: the degree of consensus among nodes on where data is stored
Guaranteed data accessibility: the network’s ability to ensure that no single participant in the storage process can remove access to files on the network
The higher the score, the stronger the network is along that dimension.
Filecoin’s token economics supports growing storage space across the network for storing large amounts of data in an immutable manner. Their storage algorithm is also better suited to data that is unlikely to change much over time (cold storage).
(Filecoin summary overview)
Crust’s tokenomics ensures hyper-redundancy and fast retrieval speed, making it suitable for high-traffic dApps and suitable for fast retrieval of data for popular NFTs.
Crust has a lower score for storage durability because without persistent redundancy, its ability to provide permanent storage is severely compromised. Still, durability can be achieved by manually setting extremely high replication factors.
(Crust summary overview)
Sia is all about privacy. Users are required to restore fragment health manually because nodes do not know which fragments they store or which data and users those fragments belong to. Only the data owner can reconstruct the original data from the fragments on the network.
(Sia summary overview)
In contrast, Arweave is all about persistence. This is also reflected in their endowment design, which makes storage more expensive, but also makes them an extremely attractive option for NFT storage.
(Arweave summary overview)
Storj’s business model appears to have largely influenced how they bill and pay: Amazon AWS S3 users are more familiar with monthly billing. By removing the complex payment and incentive systems commonly found in blockchain-based systems, Storj Labs sacrifices some decentralization but significantly lowers the barriers to entry for a key target group of AWS users.
(Storj summary overview)
Swarm’s bonding curve model ensures that storage costs remain relatively low over time as more data is stored on the network, and its proximity to the Ethereum ecosystem makes it a strong contender as the primary storage layer for more complex Ethereum-based dApps.
(Swarm summary overview)
There is no single best approach to the various challenges faced by decentralized storage networks. Depending on the purpose of the network and the problem it is trying to solve, it must have trade-offs between the technical and token economics of network design. The purpose of the network and the specific use case it is trying to optimize for will dictate various design decisions.
Going back to the Web3 infrastructure pillars (consensus, storage, computation), we see that the decentralized storage space has a small number of strong players who have positioned themselves in the market for specific use cases. This doesn’t preclude new networks from optimizing existing solutions or capturing new niches, but it does raise the question: what’s next?
The answer is: computation. The next frontier for a truly decentralized internet is decentralized computing. Currently, only a handful of solutions can bring trustless, decentralized computing to market, powering complex dApps at a fraction of the cost of executing the equivalent logic in smart contracts on a blockchain.
The Internet Computer (ICP) and Holochain (HOLO) are the networks with the strongest positions in the decentralized computing market at the time of writing. Still, the computing space is not as crowded as the consensus and storage spaces, so strong competitors will sooner or later enter the market and position themselves accordingly. One such competitor is Stratos (STOS), which offers a unique network design through its decentralized data grid technology. We see decentralized computing, and especially the network design of the Stratos network, as an area for future research.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/web3-pillar-comprehensive-analysis-of-decentralized-storage/
Coinyuppie is an open information publishing platform; the information provided does not reflect the views or positions of Coinyuppie and does not constitute investment or financial advice. Users are expected to screen information carefully and guard against risk.