Data Determines Humans: Exploring Decentralized Cloud Storage

The article is reproduced with the permission of the author. The author, Hunter Lampson, is an analyst at Goldman Sachs who is interested in capital deployment and digital assets. He will appear in the creator community over the weekend. You can find him on Twitter.

Data defines human beings. Society’s pursuit of technological innovation and the digitization of human life have created an explosive demand for data storage and retrieval. From agricultural revolutions, healthcare discoveries and political archives to self-driving cars, protein folding and neural networks, data is the primary enabler that helps us discover new solutions to our goals. It is the fundamental tool that both constrains and compels our ability to act with agency, the irreducible input that grants access and gives meaning to our digital and physical lives. Data defines human nature, so we must be very concerned about how our data is stored, managed and owned.

Global Data Market

Today, more than 63% of the global population, or more than 5 billion people (out of roughly 7.7 billion people worldwide, according to Google), use the Internet, and this number continues to grow by more than 10% every year. But the cloud storage market is growing even faster: the global datasphere (the amount of data created, captured, replicated and consumed globally) is projected to grow at a CAGR of 58% from 2015 to 2025, when the amount of data created, stored and replicated will exceed 180 ZB (1 ZB equals 1,024 EB, or roughly one billion TB; 1 TB equals 1,024 GB). If you stacked enough 10 TB hard drives to meet the world’s data needs in 2025, that stack could reach the moon.
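
The hard-drive comparison can be checked with a few lines of arithmetic. The sketch below uses the 180 ZB and 10 TB figures from the text; the drive thickness (about 26.1 mm for a 3.5-inch drive) and the Earth-Moon distance (about 384,400 km) are assumptions added for illustration, and decimal (SI) units are used for simplicity.

```python
# Decimal (SI) storage units, in bytes.
TB = 10**12
ZB = 10**21

# Figures taken from the text above.
DATASPHERE_2025_ZB = 180
DRIVE_CAPACITY_TB = 10

# Assumptions (not from the article): typical 3.5-inch drive thickness
# and the average Earth-Moon distance.
DRIVE_THICKNESS_M = 0.0261
EARTH_MOON_KM = 384_400


def drives_needed(total_zb: int, drive_tb: int) -> int:
    """Number of drives of size drive_tb needed to hold total_zb."""
    total_bytes = total_zb * ZB
    drive_bytes = drive_tb * TB
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


def stack_height_km(n_drives: int, thickness_m: float) -> float:
    """Height in kilometres of n_drives stacked flat on top of each other."""
    return n_drives * thickness_m / 1000


if __name__ == "__main__":
    n = drives_needed(DATASPHERE_2025_ZB, DRIVE_CAPACITY_TB)
    height = stack_height_km(n, DRIVE_THICKNESS_M)
    print(f"{n:,} drives, stack of about {height:,.0f} km")
    print("reaches the moon:", height > EARTH_MOON_KM)
```

Under these assumptions the stack comes to roughly 470,000 km, which is indeed farther than the moon.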

Figure 1. Global datasphere size by year. Source: Uygun and Döngül, 2021

From an economic perspective, the cloud storage market was valued at around $76 billion in 2021; by 2028, it is projected to reach $390 billion (26.2% CAGR). Despite such explosive economic growth, cloud storage vendors continue to consolidate their market share. As of Q2 2022, the three big cloud providers, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (which I affectionately call the Big Three), held 65% of the cloud computing market. Centralized cloud storage providers combine network effects, reputation, technology infrastructure and balance sheets that new competitors simply cannot match.
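
As a sanity check on the market figures, the compound annual growth rate (CAGR) can be recomputed from the two endpoints. This is a minimal sketch; the function names are illustrative, and the inputs are the $76 billion (2021) and $390 billion (2028) values quoted above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at rate for a number of years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # 2021 -> 2028 is seven years of growth.
    print(f"implied CAGR: {cagr(76, 390, 7):.1%}")
    print(f"$76B at 26.2% for 7 years: ${project(76, 0.262, 7):.0f}B")
```

The implied rate lands within a fraction of a percentage point of the 26.2% quoted, so the two market-size figures are consistent with each other.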

Types of storage solutions

1. Local

2. Centralized Cloud Storage (CCS)

3. Decentralized Cloud Storage (DCS)

Local storage and CCS providers (the Big Three of Amazon, Azure and Google, as well as Alibaba Cloud, Box, iCloud, etc.) all feature a centralized storage approach. This means that information is stored and maintained in a single location (or a handful of locations), managed in a single database, and operated by a single entity. Both on-premises deployments and CCS solutions therefore present the risk of a single point of failure.

The popularity of CCS solutions is best understood through a historical review of the economics of on-premises data storage. At first, users stored data on their own hardware. This meant that data was stored and maintained in the same physical location (such as a company’s own data servers), which I call Phase 1.

Figure 2. Three stages of data storage adoption. Source: Hunter Lampson

As the network effects of cloud storage enabled cheaper (and often more secure) storage, consumers and companies moved to the centralized cloud (Phase 2). As CCS providers developed cloud computing, APIs and other SaaS offerings, their customer base grew as well. Although centralized solutions became the easiest, cheapest, and most efficient options on the market, their fundamental limitation remained the same: a single custodian is responsible for 100% of an entity’s data. CCS solutions were an improvement over on-premises solutions, but what was once the most economical option has become expensive and restrictive. Today, DCS providers are the cheapest and safest storage solutions on the market.

Major weaknesses of CCS solutions

1. Lack of data ownership

When users upload data to the CCS provider, they no longer own their data. Apple’s controversial decision (later reversed) to scan iCloud users’ photos is a case in point. Apple has strict privacy policies when data is stored on a given hardware product (iPhone, Mac, etc.). But importantly, once a user uploads a byte of data to iCloud, Apple considers that data their domain — not the user’s. This precedent means that data stored locally belongs to the user, while data stored in the cloud belongs to the storage provider.

2. Prone to data breaches and outages

It doesn’t take long to find a massive data breach among CCS providers. Amazon, Azure and Google have all suffered from this problem due to their single-point-of-failure structure.

The centralized structure of these providers allows them to construct large walled gardens and provide a higher level of security relative to in-house solutions. At the same time, the larger and more centralized the database becomes, the more attackers will covet it. Data outages are also common in CCS solutions; examples can be seen here: Amazon, Azure, Google.

3. Prone to censorship

Not only do CCS providers lose data unintentionally, they also delete data intentionally. Just a few weeks ago, the popular YouTube channel Bankless was terminated without warning, notice or reason. Google owns YouTube and stores its content on its cloud service, and thankfully it restored the channel, but Google and other CCS providers have the power to erase certain data from existence, which is socially harmful.

4. High cost

Perhaps the most critical disadvantage of CCS solutions is their high cost. While the cost of storing data has dropped by an average of 30.5% per year over the past 50 years, CCS prices have remained the same for the past seven years. This is due to network effects accumulated by CCS providers. Because of these network effects, the Big Three have come to dominate the cloud computing space. As their combined market share continues to grow, the Big Three act like an oligopoly with the ability to manipulate prices and keep new entrants out.
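
To get a feel for the size of the gap between cost and price, the compounding can be worked through directly. The sketch below assumes, as a simplification, that the long-run average decline of 30.5% per year applied in each of the last seven years while prices stayed flat; the function names are illustrative.

```python
def remaining_cost_fraction(annual_decline: float, years: int) -> float:
    """Fraction of the original cost left after compounding an annual decline."""
    return (1 - annual_decline) ** years


def price_to_cost_multiple(annual_decline: float, years: int) -> float:
    """How many times the underlying cost a flat price represents,
    assuming price equalled cost at the start of the period."""
    return 1 / remaining_cost_fraction(annual_decline, years)


if __name__ == "__main__":
    fraction = remaining_cost_fraction(0.305, 7)
    multiple = price_to_cost_multiple(0.305, 7)
    print(f"cost after 7 years: {fraction:.1%} of the original")
    print(f"flat price is about {multiple:.1f}x the underlying cost")
```

Under that assumption, the underlying cost falls to under a tenth of its starting level over seven years while the price charged stays the same.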

Figure 3. Data storage costs over time, Source: Arweave Yellow Book

Figure 4. Data storage costs over time on AWS, Azure, Google. Source: AWS, Azure, Google, Hunter Lampson.

The main reason for the gap between storage price and storage cost is the market dominance that CCS providers currently maintain. DCS solutions have taken a different path.

DCS solutions

Against the backdrop of CCS’s weaknesses, decentralized cloud storage (DCS) has emerged as a paradigm shift in data storage (Phase 3). A DCS solution puts spare hard disk space on geographically distributed nodes to use by matching the supply of and demand for storage space. This creates a more efficient marketplace, reduces costs, and eliminates the risk of a single point of failure that exists in on-premises and CCS solutions. It also returns data ownership to the user.

Figure 5. Cumulative cost of storing 1 GB per year by platform. Sources: AWS, Azure, Google, Storj, SiaStats, Arweave Fees, Hunter Lampson.

While the geographic distribution of data centers and storage nodes is not the only factor that determines network concentration, it is a useful touchstone. Node distribution across space is also an important factor in determining the level of replication, retrieval, and protection of data. Generally speaking, the more nodes in the network, the faster the retrieval and the stronger the protection against natural disasters (when will we put storage nodes on the moon?!). Understanding node decentralization is therefore an important prerequisite for evaluating cloud storage.

What makes the DCS solution revolutionary compared to the CCS solution is its degree of decentralization. There are more than 114 times more active nodes running on Sia, Storj, Filecoin and Arweave than data centers managed by AWS, Azure and Google Cloud combined.

Figure 6. Total active nodes by service. Source: Filscan, Viewblock, Storj, SiaStats, Peterson 2015, Baxtel, Google, Sam Williams, Hunter Lampson.

Arweave’s node count is difficult to quantify because the statistics provided by Viewblock treat each storage pool as a single storage node. In an offline conversation, Arweave founder Sam Williams told me that the 59 current storage pools (according to Viewblock) can have hundreds or even thousands of nodes backing them. So Viewblock underestimates the actual number of nodes by about 10-100 times. For this reason, to be conservative, I use “500+” as the number of nodes. It’s also important to note that active node counts are an imperfect measure of decentralization, and absolute node counts don’t tell us who is running a node (and how many nodes each entity operates).

To borrow from Spencer Applebaum and Tushar Jain, an important distinction between DCS services is the distinction between contract-based storage solutions and permanent storage solutions. Simply put: all DCS services currently on the market are contract-based models, with the exception of Arweave.

Contract-based storage model vs permanent storage model

Filecoin, Sia, and Storj use a contract-based pricing model, the same model currently employed by CCS. Contract-based pricing means that users pay for storage on an ongoing basis, much like a paid monthly or yearly subscription. Despite their nuances, Filecoin, Sia, and Storj compete directly with existing CCS providers.

Arweave, on the other hand, offers a permanent storage model, which means users pay a single fee and in return their data is stored permanently. Arweave is often compared lazily and imprecisely to other DCS and CCS providers, but the fundamental feature that sets Arweave apart from its competitors is data persistence.

Figure 7. Conceptual diagram of CCS and DCS solutions, source: Hunter Lampson

A closer look at Filecoin, Sia, and Storj helps us better understand how they differ from CCS providers and Arweave.

Figure 8. Key features of the DCS solution. Source: Filecoin, Storj, Sia, Arweave, CoinMarketCap, Crunchbase.


Filecoin

Filecoin launched its mainnet in October 2020 and is currently the most widely adopted and well-funded DCS project on the market. As of July 12, 2022, Filecoin’s fully diluted market capitalization is approximately $1.19 billion, against an all-time high of $12.3 billion. Juan Benet is the founder and CEO of Protocol Labs, which developed Filecoin and its underlying technology, the InterPlanetary File System (IPFS). Filecoin has raised $258.2 million in funding so far, most of which came from an initial coin offering (ICO) in late 2017.

To understand Filecoin, we must understand IPFS, a peer-to-peer (P2P) distributed system for storing and retrieving data. Built to address the shortcomings of the HTTP-based internet, IPFS uses content addressing to identify data, meaning that information is requested and delivered based on its content rather than its location. This is achieved by issuing a content identifier (CID) for each chunk of data, which is generated by hashing the content of each file, making it immutable. To locate the requested information (represented by a unique CID), IPFS uses a distributed hash table (DHT), which contains the network location of the nodes that store the content associated with the CID. When a user requests information from an IPFS node, the node checks its own hash table to see if the requested file can be located (and then retrieved). If the node does not hold the requested information, it can download the content from a peer node and deliver it to the user. In this model, information is replicated across multiple nodes, rather than held in a single, centralized location as in the HTTP model. This eliminates the risk of a single point of failure while increasing retrieval speed, as data is retrieved from multiple peers simultaneously.
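
The lookup flow described above can be illustrated with a toy model. This is a sketch under loud assumptions, not the IPFS implementation: a plain SHA-256 hex digest stands in for a real CID (actual CIDs use a self-describing multihash encoding), a shared Python dictionary stands in for the distributed hash table, and `ToyNode` is a hypothetical class invented for this example.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Toy content identifier: the SHA-256 hex digest of the content itself."""
    return hashlib.sha256(data).hexdigest()


class ToyNode:
    """Hypothetical node holding some content and sharing a provider table."""

    def __init__(self, name: str, dht: dict):
        self.name = name
        self.store = {}  # cid -> bytes held locally
        self.dht = dht   # cid -> set of node names (stand-in for a real DHT)
        self.peers = {}  # node name -> ToyNode

    def add(self, data: bytes) -> str:
        """Store data locally and announce this node as a provider."""
        cid = content_id(data)
        self.store[cid] = data
        self.dht.setdefault(cid, set()).add(self.name)
        return cid

    def get(self, cid: str) -> bytes:
        """Return content by CID, fetching it from a peer if necessary."""
        if cid in self.store:
            return self.store[cid]
        for provider in sorted(self.dht.get(cid, ())):
            peer = self.peers.get(provider)
            if peer is None or cid not in peer.store:
                continue
            data = peer.store[cid]
            if content_id(data) != cid:
                continue  # content does not match its identifier: reject it
            self.store[cid] = data        # keep a replica locally
            self.dht[cid].add(self.name)  # and announce it
            return data
        raise KeyError(cid)


if __name__ == "__main__":
    dht = {}
    alice, bob = ToyNode("alice", dht), ToyNode("bob", dht)
    alice.peers["bob"], bob.peers["alice"] = bob, alice
    cid = alice.add(b"hello, permanent web")
    print(bob.get(cid), sorted(dht[cid]))
```

Because the identifier is derived from the content, a requester can verify what it receives regardless of which peer served it, and every successful fetch adds another replica to the network.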

IPFS is a communication network used to store and transmit data, and Filecoin is an economic system built on top of it. IPFS by itself does not incentivize users to store other people’s data; Filecoin does. This is done through two unique proof mechanisms: Proof of Replication (PoRep) and Proof of Spacetime (PoSt). PoRep runs only once, to verify that storage miners are storing what they say they are storing. Each on-chain PoRep includes 10 SNARKs (Succinct Non-Interactive Arguments of Knowledge), which prove the completion of the contract. PoSt, on the other hand, runs continuously to prove that storage miners are dedicating storage space to the same data over a period of time. The on-chain interactions required to verify this process are data-intensive, so Filecoin uses zk-SNARKs (zero-knowledge succinct non-interactive arguments of knowledge) to generate these proofs and compress their data by a factor of 10.


Sia

Of the four DCS protocols discussed, Sia was the first to launch, in June 2015. David Vorick and Luke Champine founded Sia at HackMIT in 2013, and the project has strong user traction and a fully diluted market capitalization of $190 million, against an all-time high of $2.97 billion.

Sia was launched by Nebulous Labs, which was founded in 2014. In a similar manner to Filecoin, Sia divides the uploaded data into constituent pieces (in this case, fragments) and distributes them across hosts around the globe. Unlike Filecoin, Sia achieves this through a different proof-of-storage (PoS) mechanism. This proof requires the hosts to share a small portion of randomly chosen data over time. This proof is verified and stored on the Sia blockchain, and the host is rewarded with Siacoin.
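
One common way to let a host prove it still holds a randomly chosen piece of data is a Merkle proof: the verifier keeps only a short root hash, challenges the host with a segment index, and checks the returned segment against the root. The sketch below illustrates that general technique. The use of a Merkle tree, the SHA-256 hash, and all function names are assumptions made for illustration, not a description of Sia’s exact on-chain format.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(segments):
    """All levels of the tree, leaves first. A level with an odd number of
    nodes pairs its last node with itself (fine for a toy example)."""
    level = [h(s) for s in segments]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(segments) -> bytes:
    return merkle_levels(segments)[-1][0]


def prove(segments, index):
    """Host side: return the challenged segment and its sibling hashes."""
    path, i = [], index
    for level in merkle_levels(segments)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[i ^ 1])  # sibling of node i at this level
        i //= 2
    return segments[index], path


def verify(root: bytes, index: int, segment: bytes, path) -> bool:
    """Verifier side: recompute the root from the segment and the path."""
    node, i = h(segment), index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


if __name__ == "__main__":
    import secrets

    file_segments = [bytes([n]) * 64 for n in range(5)]
    root = merkle_root(file_segments)                  # kept by the verifier
    challenge = secrets.randbelow(len(file_segments))  # random segment index
    segment, path = prove(file_segments, challenge)
    print("proof accepted:", verify(root, challenge, segment, path))
```

The verifier never needs the full file: a host that has discarded the challenged segment cannot produce a segment and path that hash to the stored root.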


Storj

Like Filecoin and Sia, Storj has gained significant traction since its launch in October 2018. Storj differs from Filecoin and Sia in that it does not rely on blockchain consensus to store data. Instead, Storj relies entirely on erasure coding and satellite nodes to store data, increasing data redundancy and reducing bandwidth usage. Storj’s exclusive use of erasure coding means that data durability (the probability that data will remain available in the event of a failure) is not linearly related to the expansion factor (the extra cost of storing data reliably). Therefore, on Storj, higher durability does not require a proportional increase in bandwidth. Given node churn (the rate at which nodes go offline or leave the network), erasure coding may be valuable in the long run, as it requires less disk space and bandwidth for storage and repair, although it increases CPU time.
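
The relationship between durability and expansion factor can be made concrete with a small probability model. In a k-of-n erasure code, a file is split into n pieces, any k of which suffice to rebuild it, so the expansion factor is n/k. The sketch below assumes each piece is lost independently with the same probability, which is a simplification; the parameters are illustrative and are not Storj’s production settings.

```python
from math import comb


def survival_probability(k: int, n: int, p_loss: float) -> float:
    """Probability that at least k of n pieces survive, assuming each piece
    is lost independently with probability p_loss."""
    p_ok = 1 - p_loss
    return sum(comb(n, i) * p_ok**i * p_loss ** (n - i) for i in range(k, n + 1))


def expansion_factor(k: int, n: int) -> float:
    """Stored bytes per byte of original data."""
    return n / k


if __name__ == "__main__":
    p = 0.10  # illustrative chance that any one node loses its piece
    schemes = {"3x replication": (1, 3), "20-of-40 erasure code": (20, 40)}
    for label, (k, n) in schemes.items():
        loss = 1 - survival_probability(k, n, p)
        print(f"{label}: expansion {expansion_factor(k, n):.1f}x, "
              f"chance of losing the file {loss:.2e}")
```

In this model, plain replication (the k = 1 case) stores three times the data yet is far more likely to lose the file than the erasure-coded layout that stores only twice the data, which is the sense in which durability does not scale linearly with expansion factor.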

Storj also differs from Filecoin and Sia in terms of network architecture and pricing mechanism. On Storj, pricing is determined by satellite nodes, which sit between storage users (including applications) and storage nodes. Satellite nodes are responsible for negotiating price and bandwidth utilization. Therefore, Storj’s pricing model is not entirely dependent on free market activity, but is subject to centralized power, as satellite operators represent a potential centralized intermediary between nodes and end users.

Storj is also natively compatible with Amazon S3, which means existing Amazon S3 users can migrate to Storj and use basic features without changing their codebase, potentially reducing the friction associated with leaving the Amazon S3 ecosystem.


Arweave

Unlike Filecoin, Sia and Storj, Arweave provides permanent data storage. Founded in June 2018 by Sam Williams (CEO) and William Jones, Arweave has a fully diluted market capitalization of $890 million as of July 12, 2022, against an all-time high of $4.18 billion.

Arweave seeks to provide permanent data storage in a decentralized manner for a one-time fee, which is made possible by the Arweave endowment mechanism. Considering that the cost of data storage has declined at a rate of 30.5% per year over the past 50 years, Arweave reasons that a dollar will buy more storage in the future than it does today. This delta enables Arweave’s endowment pool, where the “principal” is the upfront fee paid by users, and the “interest” is the increased storage purchasing power of the token over time. Arweave’s conservative assumption is that storage prices drop by only 0.5% per year, which allows the endowment pool to survive in the long term.
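
The reason a one-time fee can cover storage forever is that a declining yearly cost forms a convergent geometric series. If the first year costs c and the cost falls by a fraction d every year, the total over an infinite horizon is c + c(1 - d) + c(1 - d)^2 + ... = c / d. The sketch below is a simplified model only: it ignores token price changes, replication counts and discounting, and it is not Arweave’s actual pricing formula.

```python
def perpetual_storage_cost(cost_first_year: float, annual_decline: float) -> float:
    """Total cost of storing data forever when the yearly cost shrinks by
    annual_decline each year (sum of a geometric series)."""
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be strictly between 0 and 1")
    return cost_first_year / annual_decline


def cumulative_cost(cost_first_year: float, annual_decline: float, years: int) -> float:
    """Total cost over a finite number of years, summed term by term."""
    return sum(cost_first_year * (1 - annual_decline) ** t for t in range(years))


if __name__ == "__main__":
    # Cost normalised so that the first year of storage costs 1 unit.
    for label, decline in [("historical 30.5%", 0.305), ("conservative 0.5%", 0.005)]:
        total = perpetual_storage_cost(1.0, decline)
        print(f"{label}: perpetual storage costs {total:.1f}x the first year")
```

In this model, the conservative 0.5% assumption requires an upfront payment of about 200 years’ worth of today’s cost, while the historical 30.5% rate would require only about 3.3 years’ worth. That margin is what gives the endowment room to survive.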

Arweave’s current cost of about $3.85/GB reflects the terminal value of data storage. In the short term, Sia and Filecoin (or even the Big Three) are cheaper. But in the long run, Arweave becomes the smarter choice. Even in the short term, users pay a premium for something that others can’t offer: data persistence. For some, demand for permanent storage is relatively price-inelastic, as certain files, such as NFTs, require permanent storage.

Arweave is powered by the blockweave, a blockchain-like data structure where each block is linked to both the previous block and a recall block. A recall block is any previously mined block other than the most recently mined one. So Arweave’s structure is not just a chain linking consecutive blocks together; it’s a weave linking the current block to the previously mined block and to another random block (the recall block).

In order to mine a new block and receive a mining reward, miners must prove that they have access to the recall block. Arweave’s Proof of Access (PoA) mechanism guarantees that, for each newly mined block, the data of the recall block is also included. This means that to store new data, miners must also store existing data. PoA also encourages miners to replicate all data equally between nodes. When a less replicated block is selected as the recall block, the miners able to access that block compete for the same reward within a smaller pool of miners. Other things being equal, miners who store less-replicated blocks will receive larger rewards over time.
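
The incentive described above can be sketched in a few lines. In this toy model the recall block is derived from the hash of the previous block, only miners who hold that block may compete, and the expected share of the reward is inversely proportional to the number of miners holding it. The selection rule and all names below are simplifying assumptions made for illustration, not Arweave’s actual consensus code.

```python
import hashlib


def recall_index(prev_block_hash: bytes, n_blocks: int) -> int:
    """Pick a recall block among blocks 0 .. n_blocks - 2, that is, any
    existing block other than the most recent one (index n_blocks - 1)."""
    if n_blocks < 2:
        raise ValueError("need at least two existing blocks")
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % (n_blocks - 1)


def can_mine(stored_blocks: set, prev_block_hash: bytes, n_blocks: int) -> bool:
    """A miner can compete only if it stores the selected recall block."""
    return recall_index(prev_block_hash, n_blocks) in stored_blocks


def expected_reward_share(replicas: int) -> float:
    """Expected share of the reward when `replicas` equally powerful miners
    hold the recall block."""
    return 1 / replicas


if __name__ == "__main__":
    idx = recall_index(b"hash-of-previous-block", 1000)
    print("recall block:", idx)
    print("full-archive miner can compete:",
          can_mine(set(range(999)), b"hash-of-previous-block", 1000))
    print("share if 2 miners hold it:", expected_reward_share(2))
    print("share if 50 miners hold it:", expected_reward_share(50))
```

A miner storing rarely replicated blocks wins less often, because its blocks are selected only some of the time, but faces fewer competitors when they are. That is what nudges the network toward even replication.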

Built on top of the blockweave is the permaweb, which is similar to today’s World Wide Web, but permanent. Arweave’s blockweave is the base layer that powers the permaweb, which is the layer that users interact with. Given that Arweave is built on HTTP, traditional browsers can access all data stored on the network, allowing for seamless interoperability.


While DCS solutions may be superior to CCS solutions in theory, they should be evaluated on the basis of their usefulness in practice, and we can gauge the attractiveness of each project by examining the following:

1. Stored data

2. Node distribution

3. Search interest

4. Ecosystem strength

5. Demand-side revenue

1. Stored data

Demand is directly measured by examining the amount of data stored over time and is seen as a key KPI for DCS providers. From this metric alone, Filecoin has the advantage; as of this writing, Filecoin stores more than 90% of the DCS datasphere, compared to just 82.8% 90 days ago.

Figure 9. Proportion of the DCS data market that is stored. Source: Storj Stats, SiaStats, Viewblock, Hunter Lampson.

Not only does Filecoin store the most data, it also grows the fastest. Data stored on the Filecoin platform has grown by 112% over the past 90 days.

Figure 10. Datastore growth (last 90 days). Source: Storj Stats, SiaStats, Viewblock, Hunter Lampson.

Given that these are storage protocols, the amount of data stored is an important indicator, although it has serious limitations. The amount of data stored tells us nothing about the protocol’s earnings or about the data itself (how much it is worth, what its function is, how long it will be stored, etc.). There is an ongoing debate between DCS and CCS vendors about how to characterize stored data, as not all data is valued (and treated) equally. Some data is more important than other data. Users may split their data across storage providers on that basis, so the amount of data stored only paints a partial picture.

The amount of data stored also lacks context on how the data contributes to the demand-side revenue of the protocol, which is especially problematic when considering Filecoin, the only DCS service that provides storage essentially for free. Users may therefore choose Filecoin to store data simply because of its current pricing (more on that later…). While I’ve had a hard time finding public sources on this (for obvious reasons), it’s interesting to note that countless builders and researchers in this space, all of whom I have great respect for, tell me that Filecoin tends to work with larger institutions, offering free storage to manipulate its storage metrics. In theory, Filecoin can store infinitely more data than any other DCS protocol and still generate zero demand-side revenue.

2. Node distribution

While the amount of data stored is a direct measure of storage demand, we can also look at indirect measures. Node distribution is important to understand because it highlights the geographic components of demand-side and supply-side actors. We can evaluate this by looking at the geographic distribution of 1) storage nodes and 2) search interest.

The more dispersed the storage nodes are in space, the better: higher dispersion (usually) creates greater decentralization and reduces retrieval time from node to end user. Higher decentralization also reduces the risk of irrecoverable data loss (usually due to environmental factors such as natural disasters). Ideally, nodes would not be scattered arbitrarily in space, but would track storage demand in space (perhaps equivalent to technology saturation times population density). Given that the US, China and Europe have the most concentrated storage needs, we expect them to have the most concentrated storage nodes. Therefore, it makes sense that the nodes in both CCS and DCS solutions are concentrated in the US, Europe and China. The DCS node distribution is similar to the CCS data center distribution, which is a positive sign that DCS solutions have reached an important level of market maturity.

Figure 11. Geographical distribution of DCS nodes. Source: Filscan, Viewblock, Storj, SiaStats, Sam Williams.

Figure 12. Geographical distribution of CCS nodes (data centers). Source: Peterson 2015, Baxtel, Google.

3. Search interest

If we think of node distribution as a distribution in DCS supply, then we can (at least in part) think of search interest distribution as a distribution in DCS demand. (This assumption is based on the fact that for every search for a DCS solution, the searcher is more likely to be a user of storage space than a provider.)

According to this metric, Filecoin currently has the clearest search interest dominance globally, relative to Storj, Sia, and Arweave. Therefore, one might expect Filecoin to have the highest relative demand.

Figure 13. Relative search interest dominance by country. Source: Google Trends. Note: I use the term “Sia” instead of “Siacoin”

These assumptions are based on current supply and demand indicators, but looking back, similar conclusions can be drawn. Filecoin has been the most searched DCS solution since mid-2021. Notably, Arweave in August 2021 and Storj in November and December 2021 almost surpassed Filecoin’s search interest.

Figure 14. Relative search interest over time. Source: Google Trends

While search interest can be a useful metric, it has serious limitations. This metric shows us how individual users use Google to get information about each project, but search interest doesn’t tell us the actual demand for each protocol.

It can easily be argued that since Filecoin has raised the most money to date, it probably has the most money to spend on marketing. Perhaps marketing budgets alone can explain changes in search interest for each project. Perhaps search interest dominance is more predictive of funding than of protocol demand; who’s to say? Additionally, Filecoin owns easily understood, keyword-heavy domains like Web3.Storage and NFT.Storage, which may also skew the data. Users may come across Filecoin’s services when searching for “Web3 storage” purely because of SEO and the domains it owns.

Another limitation of search interest as a variable is that it may be poorly correlated with storage demand. For example, if a user intends to move hundreds of terabytes of data to a DCS provider, their search activity (one search) will not reflect their actual storage needs. It is also possible that external variables, such as the degree to which centralized exchanges (CEXs) like Coinbase market these individual tokens, play a big role here.

4. Ecosystem

Because DCS solutions exist at the infrastructure layer, their ecosystems often represent user demand, as their users (consumers, companies, developers) can choose which ecosystem to use or build on. The power of the ecosystem comes from 1) projects built on top of the protocol and 2) existing projects that work with it. Considering the maturity of its partnerships and the rate at which new projects are added, Filecoin has the strongest ecosystem. In the past 18 months, the Filecoin ecosystem has grown from 40 projects to over 300 projects. Filecoin has an impressive array of partners including Chainlink, Polygon and Polygon Studios, The Graph, Near, ConsenSys, Brave, ENS, Flow, Hedera, ChainSafe, Ceramic, Livepeer, Audius, Decrypt, MoNA and Skiff.

Figure 15. The Filecoin ecosystem. Source: Messari

To help grow its ecosystem, the Filecoin Foundation has invested heavily in its ecosystem and grant programs. Protocol Labs, the team behind Filecoin, has made 46 direct investments so far, deploying over $480 million to ecosystem projects including Decrypt, Syndicate, ConsenSys, and Spruce.

Figure 16. Protocol Labs investments over time. Source: Crunchbase

Second only to Filecoin is the Arweave ecosystem: Filecoin has nearly 300 partners, while Arweave has about 60. While many partners could benefit from both platforms (for example, Mirror and Skiff could offer users both Filecoin and Arweave), other projects, like Solana, are unlikely to use both. This means that many of the most critical web3 infrastructure projects (protocols, dApps, NFT platforms) will settle on the storage protocol, either Filecoin or Arweave, that best fits their product and ideology, depending on the specific use case. The strength of each ecosystem will play a vital role in the long-term viability of each platform, so the ability to win the hearts and minds of builders old and new is paramount.

It’s worth noting that relative to Filecoin, Arweave’s ecosystem consists more of new projects built on the platform, which rely on the technology to exist, than of projects that selectively leverage existing technology. This also explains why Filecoin partners with more established projects: not because Filecoin’s partners have grown faster or been more successful, but because Filecoin partners such as Cloudflare and Opera have been around longer. By contrast, Arweave’s partners are generally early-stage companies built on Arweave from scratch. Some of Arweave’s notable partners include Solana, Polkadot, The Graph, Mirror, Bundlr, Glass, KYVE, Decent Land, and ArDrive.

Figure 17. The Arweave ecosystem. Source: @axo_pas (on Twitter)

Since 2020, Arweave has committed nearly $55 million to 15 ecosystem projects, including Mask, Fluence, and Pianity. Through its Open Web Foundry accelerator program, Arweave helps developers create permanent web applications, and it has also invested through its community-run ecosystem fund, ARCA DAO.

Sia and Storj have smaller ecosystems, with around 30 and 13 projects respectively. Although their ecosystems are small, their partnerships are of high quality. Some of Sia’s partners include CoinMarketCap, Kraken, Filebase, Render, Akash, and Quant, while Storj’s partners include Microsoft Azure, Fastly, Couchbase, and Pokt. Importantly, Storj’s strategy is built around capturing existing Amazon S3 users, including large incumbents. As a result, many of Storj’s partners may decline to be named publicly, since they may not see any benefit in being listed as partners. In contrast, new web3-native projects built on Arweave may indeed benefit from being listed as a partner to demonstrate their immersion in the ecosystem. These different incentives to publicize partnerships make ecosystem comparison challenging because we lack a complete dataset.

Today, Sia is primarily used by Filebase (the first Amazon S3-compatible dApp) and Arzen (a consumer-facing decentralized storage app).

5. Demand-side revenue

The amount of data stored may be the most direct measure of user engagement, but demand-side revenue measures the value of that engagement, or the project’s ability to monetize it. As Sami and Mihai (both Messari analysts) explain in their article on the Filecoin revenue model, demand-side revenue is a useful metric for infrastructure projects because it measures the fees people pay to use the network (in this case, fees paid to store data). Importantly, demand-side revenue does not include block rewards paid to miners.

While demand-side revenue data for Arweave, Sia, and Storj can be found on the Web3 Index, demand-side revenue data for Filecoin is hard to find (if anyone can find it, I’d love to see it). Therefore, we cannot include Filecoin in demand-side revenue comparisons.

What we do know about Filecoin’s revenue is that while the data stored on the platform has grown, revenue has remained flat. This could be due to two reasons. First, the HyperDrive update increased storage access by a factor of 10-25, resulting in lower demand for block space (which, as we’ll see later, hurt Filecoin’s token value). Second, storing more data without generating higher revenue suggests that Filecoin is effectively storing data for free. Miners are therefore relying on an unsustainable block reward of about 20.56 FIL per block, which will decrease over time, and in the near future Filecoin will need to increase storage prices to incentivize miners to keep participating in the network.

Figure 18. Filecoin’s protocol revenue remains uncorrelated with ecosystem growth. Source: Messari

But for Storj, Sia and Arweave, we can see demand-side revenue over the last 90 days.

Figure 19. Demand-side revenue for Storj, Sia, Arweave (last 90 days). Source: Web3 Index

Because Filecoin, Sia, and Storj are contract-based (temporary) solutions where users pay to store data on an ongoing basis, we can assume that a non-zero portion of their demand-side revenue is recurring. In contrast, we can assume that 100% of Arweave’s demand-side revenue is non-recurring, as users pay a single fee to store data permanently. This means that the only way for Arweave to generate demand-side revenue is to store net new data. This is a huge challenge for Arweave, and it reminds us of the inadequacy of comparing Arweave directly to other DCS providers.

What may prove to be a moat, or a competitive disadvantage, is the difference in demand-side revenue efficiency (roughly equal to price) between DCS platforms. I define demand-side “revenue efficiency” as demand-side revenue per byte of data stored. Over the past 90 days, Storj and Sia have generated demand-side revenue of $96.50 and $89.90 per TB uploaded, respectively, while Arweave has generated about $10,200 per TB uploaded. This pricing model is another fundamental difference between Arweave and its DCS competitors: Arweave charges a premium for a service with unique features. It also means Arweave can store about 113 times less data than its DCS competitors and still generate the same demand-side revenue. This suggests that Arweave should not be expected to store the same amount of data as other solutions, because its service and pricing mechanism are not comparable.
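As a quick check on the revenue-efficiency comparison, the ratios can be recomputed from the three per-TB figures quoted above. This is only a sketch: the dictionary and function names are illustrative, and the inputs are the rounded figures from the text rather than raw Web3 Index data. On those inputs, the 113x figure corresponds to the comparison with Sia; against Storj the ratio is closer to 106x.

```python
# Demand-side revenue per TB uploaded over the past 90 days, as quoted in the text (USD).
REVENUE_PER_TB_USD = {"Storj": 96.50, "Sia": 89.90, "Arweave": 10_200.00}


def efficiency_ratio(platform: str, baseline: str) -> float:
    """Return how many times more revenue per TB `platform` earns than `baseline`."""
    return REVENUE_PER_TB_USD[platform] / REVENUE_PER_TB_USD[baseline]


if __name__ == "__main__":
    for baseline in ("Storj", "Sia"):
        ratio = efficiency_ratio("Arweave", baseline)
        # Equivalently: Arweave needs `ratio` times less data for the same revenue.
        print(f"Arweave vs {baseline}: {ratio:.1f}x revenue per TB")
```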

Token Valuation


The Storj, Sia, Arweave and Filecoin tokens are best understood as a combination of 1) a utility token (like a gift card) and 2) a medium of exchange (currency). The valuation of a utility token is based on its expected future utility; the value of a currency is determined by supply and demand. Holders of utility tokens can exchange them for services, in this case cloud storage. The ability to redeem utility tokens for specific services makes them similar in structure to traditional gift cards or vouchers. However, unlike gift cards and vouchers, utility tokens are issued programmatically and autonomously, while gift cards and currency are issued by businesses or governments and are (almost) always denominated in a different currency (usually fiat). Programmatic issuance guarantees a specified supply schedule, which allows us to calculate token supply accurately. (The recent 9.1% CPI data shows us the power of a programmatic money supply.) We combine this with traditional monetary theory to derive the intrinsic value of each token.

To be clear: I am interested in valuing token price changes over time using traditional monetary theory and discounted cash flow analysis. I am not going to assess the value of the protocol itself (i.e. the total revenue generated, although revenue is important), nor am I going to assess the profits of storage miners or storage providers. I also recognize the extraordinary cultural value these protocols generate, especially when they (inevitably) become public goods. That said, each protocol’s token price probably doesn’t take these into account, so neither do I.

First, an important distinction: traditional public securities (like $AAPL, Apple) and tokens (issued by a protocol) represent different things. While protocols generate asset flows, they don’t generate cash flows the way Apple does. Therefore, tokens should not be confused with public stocks. Tokens represent the right to use or transact; publicly issued shares represent ownership. (Tokens can represent more than usage and transaction rights; for example, they can also include governance rights.) Valuing token prices over time therefore requires a different toolkit: monetary theory combined with discounted cash flow analysis.

Token Valuation Model

The primary framework I use to evaluate token prices is the model proposed by Chris Burniske in his seminal work: Cryptoasset Valuations. Chris believes that instead of modeling a traditional DCF, it’s better to keep the same structure and replace the cash flow with an exchange equation so that we can derive the current utility value of each token. We then apply a discount rate to future utility value to derive today’s intrinsic value.

Substituting the equation of exchange, MV = PQ, helps us incorporate the monetary nature of the token. As Chris (and countless others) have acknowledged, this model has its limitations (as all predictive models do), but it’s probably the best model we have. Given the lack of a perfectly efficient market and the large margin of error inherent in predicting the future, the model is best used to illustrate the various levers that generate token value.

“Cryptoasset valuation consists primarily of solving for M, where M = PQ / V. M is the size of the monetary base needed to support a cryptoeconomy of size PQ and velocity V,” Chris wrote.

Block unicorn notes: M = asset base size, V = asset velocity, P = price of token resources being offered, Q = amount of token resources being supplied.
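To make the mechanics concrete, here is a minimal sketch of the two steps described above: solve MV = PQ for M, spread M over the tokens that are not being held, and discount the resulting future utility value back to today. Every number in the example is hypothetical and chosen only to keep the arithmetic readable; the function names are mine, and dividing M by the non-held supply is my reading of how the “percentage of tokens held” input is meant to be used.

```python
def monetary_base(price: float, quantity: float, velocity: float) -> float:
    """Solve the equation of exchange MV = PQ for M."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return price * quantity / velocity


def utility_value_per_token(base: float, supply: float, held_fraction: float) -> float:
    """Spread the monetary base over tokens that actually circulate (assumption)."""
    return base / (supply * (1.0 - held_fraction))


def present_value(future_value: float, discount_rate: float, years: int) -> float:
    """Discount a future value back to today at a constant annual rate."""
    return future_value / (1.0 + discount_rate) ** years


# Hypothetical inputs, not taken from the model:
# P = $0.002 per GB per year, Q = 1e12 GB stored, V = 20,
# 400M tokens outstanding with half of them held, 40% discount rate, 10 years out.
M = monetary_base(price=0.002, quantity=1e12, velocity=20)
future_utility = utility_value_per_token(M, supply=4e8, held_fraction=0.5)
intrinsic_today = present_value(future_utility, discount_rate=0.40, years=10)

if __name__ == "__main__":
    print(f"M = ${M:,.0f}")
    print(f"future utility value per token = ${future_utility:.2f}")
    print(f"discounted to today = ${intrinsic_today:.4f}")
```

With these made-up inputs the monetary base is about $100 million, which works out to about $0.50 per circulating token in the future year and under two cents today once ten years of 40% discounting are applied. The discount rate dominates long-horizon results.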

Token Valuation Model: Input

To estimate M, V, P and Q, I will use the following method:

Mathematically derived inputs

1. Maximum Supply

2. Circulation

3. CAGR of circulation

4. Storage cost ($/GB/year) or ($/GB)

5. Annual decrease in storage cost (CAGR) 

6. Size of the data storage market

7. Annual Growth of Data (CAGR)

The annual decrease in storage cost (input 5) is derived from the average annual decline in storage costs ($/GB/year) for the Big Three over the past decade, as shown in the following graph (Figure 20):

Figure 20. Data storage cost and CAGR for AWS, Azure, Google. Source: AWS, Azure, Google, Hunter Lampson.

Subjective assumptions

1. Percentage of tokens held (percentage of uncirculated public supply)

I assume that in 2021, 50% of the tokens are held. This assumption stems from the fact that, historically, roughly half of Coinbase users view Bitcoin strictly as an investment, while the other half see it as a medium of exchange.

2. Percentage change in tokens held each year

I assume that token holdings will decline at a rate of 1% per year starting in 2022. As the market becomes more balanced, there is less potential for value appreciation, so the number of tokens in circulation will increase (token holdings decrease). This is hard to estimate; again, it is best understood as a lever that affects token valuation.

3. Velocity

I assume a velocity of 20 for each token. I use 20 here to be conservative, given that Bitcoin’s velocity has historically been around 14.

4. TAM (share of the global data market)

I assume Arweave can address 10% of the global data market and that Filecoin, Sia and Storj can address the remaining 90%. Permanent data storage is an entirely new market, so it is hard to determine what percentage of the existing data market it can address; I use 10% here in the hope of being conservative. Temporary data storage, the dominant storage solution today, accounts for essentially 100% of the data market. If we assume that 10% of the existing data market will transition to Arweave, the remaining 90% is left to Filecoin, Sia and Storj.

5. Maximum percentage of TAM captured

I assume the maximum percentage of TAM captured is 1% for Arweave, Sia and Storj. Arweave therefore captures 1% of its 10% of the global data market (0.1% of the global data market in total), while Sia and Storj each capture 1% of the remaining 90% (0.9% each). Given its traction and maturity, I assume Filecoin captures 25% of available TAM (25% of 90% = 18% of the global data market).

6. Inflection point

I assume 2024 is the year in which each network hits its inflection point, defined as reaching 10% of its maximum percentage of TAM captured. This is nearly impossible to predict, so treat it as another illustrative lever.

7. Saturation (years)

I assume the saturation time (the time it takes for the network to go from 10% to 90% of its maximum percentage of TAM) is 10 years for Arweave, Sia and Storj, and 4 years for Filecoin. This is another impossible prediction.

8. Discount rate

I’m assuming a discount rate of 40%, which is the industry standard for assets with this level of risk.
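Assumptions 4 through 7 together describe an adoption curve: a maximum share of the market, a year in which 10% of that maximum is reached, and the number of years needed to climb from 10% to 90%. The article does not state the functional form of the curve, so the sketch below uses one logistic form that satisfies those two anchor points exactly. Treat the formula, the function name and the constant names as my assumptions rather than the model’s actual implementation.

```python
def tam_share(year: float, max_share: float, inflection_year: float,
              saturation_years: float) -> float:
    """Logistic adoption curve (assumed form).

    Returns 10% of `max_share` at `inflection_year` and 90% of `max_share`
    at `inflection_year + saturation_years`.
    """
    exponent = (inflection_year + saturation_years / 2 - year) / saturation_years
    # 81 ** 0.5 == 9 gives 1/10 of the maximum; 81 ** -0.5 == 1/9 gives 9/10.
    return max_share / (1 + 81 ** exponent)


# Maximum shares of the global data market implied by assumptions 4 and 5.
ARWEAVE_MAX = 0.01 * 0.10      # 1% of the 10% addressable by permanent storage
SIA_STORJ_MAX = 0.01 * 0.90    # 1% of the 90% addressable by temporary storage

if __name__ == "__main__":
    for year in (2024, 2029, 2034):
        share = tam_share(year, ARWEAVE_MAX, inflection_year=2024, saturation_years=10)
        print(f"{year}: Arweave share of global data = {share:.4%}")
```

Under this form, Arweave would sit at 0.01% of the global data market in 2024, half of its 0.1% ceiling in 2029, and 0.09% in 2034. A shorter saturation time, such as the 4 years assumed for Filecoin, compresses the same climb into a much steeper curve.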

Here is a concise view of all fixed and variable mathematically derived inputs and subjective assumptions:

Figure 21. Concise view of fixed and variable mathematically derived inputs and subjective assumptions used in token valuation models. Source: CoinMarketCap, Uygun and Döngül, 2021, Chris Burniske, Hunter Lampson.

A note on the inputs that differ for Filecoin:

1. Annual decrease in data storage cost (CAGR) = 0%

2. Assuming storage cost ($/GB/year) = $0.002/GB/year

In the table, I explicitly label Filecoin data storage cost decline (CAGR) and hypothetical storage cost ($/GB/year) as a very subjective assumption, although explicit data on this is available. I’m doing this because Filecoin’s current pricing is too low to be sustainable.

First, let’s start with the assumed storage cost ($/GB/year). Currently, storage on Filecoin costs about $0.0000017/GB/year, or 0.0011% of the cost of storage on the Big Three. As I discussed above, Filecoin’s pricing model is unsustainable because it is heavily subsidized by block rewards. Since its $200 million+ ICO, Filecoin has subsidized the cost of storage on its network. As the subsidy drops, we can expect storage costs to rise from current levels. All else being equal (that is, with fixed demand), an increase in storage costs makes $FIL more valuable, as it would for any token. But we can assume that as Filecoin inevitably raises its price, demand for storage on its network may decrease, lowering the intrinsic value of $FIL.

It’s hard to say how the team will execute a price increase, even if the price remains lower than the Big Three’s. If we run the model at the current pricing of about $0.0000017/GB/year, the intrinsic value in 2022 is about $0.00/FIL, again showing that FIL’s pricing model today is unsustainable. Therefore, I assume a Filecoin storage cost of $0.002/GB/year over the next 10 years, with a 0% annual decrease in data storage cost (CAGR). This keeps Filecoin price-competitive, at roughly 100x cheaper than the Big Three, while providing significant value to the token price. Think of this input as a personal expectation, or even a requirement, for the sustainable development of Filecoin.
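The size of this adjustment is easier to see with the exchange equation in hand. Because M = PQ / V is linear in price, swapping the current storage price for the assumed one scales the required monetary base by the same multiple, as long as the quantity stored and velocity stay fixed. In the sketch below only the two prices come from the text; the quantity and velocity values are hypothetical placeholders, and the names are illustrative.

```python
CURRENT_COST = 0.0000017   # $/GB/year, current Filecoin pricing quoted in the text
ASSUMED_COST = 0.002       # $/GB/year, the price assumed in the model

# Hypothetical placeholders; they cancel out of the ratio below.
QUANTITY_GB = 1e10
VELOCITY = 20


def monetary_base(price: float, quantity: float, velocity: float) -> float:
    """Solve MV = PQ for M."""
    return price * quantity / velocity


base_current = monetary_base(CURRENT_COST, QUANTITY_GB, VELOCITY)
base_assumed = monetary_base(ASSUMED_COST, QUANTITY_GB, VELOCITY)
price_multiple = ASSUMED_COST / CURRENT_COST

if __name__ == "__main__":
    print(f"price multiple: {price_multiple:,.0f}x")
    print(f"monetary base multiple: {base_assumed / base_current:,.0f}x")
```

The assumed price is roughly 1,176 times the current one, so the supporting monetary base is roughly 1,176 times larger as well. That is why the model returns a value near $0.00/FIL at today’s pricing, and it is also why the result depends so heavily on demand holding up after a price increase.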

Token Valuation Model: Input

Figure 22. Token valuation model inputs. Source: CoinMarketCap, Hunter Lampson

All else being equal, the model’s levers work as follows:

Figure 23. Illustration of various levers that generate token value. Source: Hunter Lampson

This table describes the general direction of the output for each lever change; it does not guarantee that the relationship holds when a lever is pushed to an arbitrarily high or low value. For example, consider velocity: on average, as velocity increases, the price of a token decreases. But an arbitrarily low velocity, say 0, would mean that the token is traded 0 times per year, thus requiring a monetary base of 0 to satisfy the ecosystem. That said, as long as we avoid the extremes, the general trends cited in the table are useful.


Arweave’s token economics are the most defensible, driven by a relatively low $AR token supply and relatively high storage costs ($/GB). This builds in part on my previous conclusion that Arweave is the most demand-side revenue-efficient of the DCS products, meaning it can store about 113 times less data than comparable products and still generate the same demand-side revenue. Also, I assume that Arweave captures 0.10% of the global datasphere, which is conservative enough to be reasonable. If this is achieved, the model projects a 2032 token price of 182.91x current levels. While Arweave’s relatively high pricing may strengthen its unit economics, it could also be the Achilles’ heel of user adoption: users will ultimately decide whether Arweave’s service is worth the money.

Even if we assume that users are willing to pay these extra fees in theory, they still have to be persuaded to use the product in practice. Because Arweave’s product is fundamentally different from its competitors’, switching costs may be too high and the service too unusual to win new users. Despite Arweave’s potential advantages, high costs and reliance on an entirely new market could be insurmountable hurdles. As mentioned, the only way for Arweave to generate demand-side revenue is to store new data. On the surface, Arweave doesn’t appear to earn recurring demand-side revenue per bit of data, something that all CCS and DCS competitors benefit from. Instead, I think Arweave benefits from a form of non-voluntary recurring demand-side revenue: rather than charging users in perpetuity, Arweave collects “permanent recurring demand-side revenue” up front, which may be one of the most valuable features of its endowment mechanism.

Currently, Filecoin’s economy is the least reliable due to its low price. Given a fixed supply of tokens, the lower the cost of a utility, the smaller the monetary base that backs it must be. This view defines low pricing as a negative rather than a positive attribute of a token’s value. It’s also possible that Filecoin’s low pricing set the stage for its widespread adoption. Low pricing could also be a key differentiator for Filecoin, which could be a necessary moat.

My concern, however, is the important role that pricing power will play in determining Filecoin’s future. As Tushar and Spencer say, Filecoin (along with Sia and Storj) is in direct competition with the Big Three in the temporary storage market. A price war with the Big Three could be disastrous. If Filecoin can keep prices low without unsustainable subsidies, its maturity, ecosystem strength, and industry-wide clout make it the most potent challenger to the Big Three. If the competition escalates into a price war, which may be inevitable, things could go badly.

According to the model, Sia’s token economics make it worth 4.5 times more than Storj, based on the difference between current prices and the 2032 price forecasts. Sia and Storj are typically seen as Filecoin’s younger siblings. Given their less robust ecosystems, it’s hard to imagine either of them displacing Filecoin’s dominance in this space in the near future. Nonetheless, Sia’s and Storj’s token economics are more attractive than Filecoin’s. Pricing power is integral to both token valuation and the long-term viability of each project.

Limitations and Reflections on Future Research

1. Cloud storage ≠ cloud computing. As Christine Deakers pointed out, many cloud storage users simultaneously use cloud computing for the data they store. DCS solutions must address this. Filecoin has already started building its virtual machine – other DCS solutions will likely follow suit.

2. DCS solutions require more integration. As Mark Gritter pointed out, most IoT applications require not only distributed storage, but also a decentralized database. This can be a major barrier to adoption if the DCS solution does not have native integration with traditional time series databases.

3. The DCS solution should allow for location selectivity. One example that Mark Gritter mentioned was self-driving cars. The stream of sensor data collected by self-driving cars must be stored in a decentralized manner to achieve the lowest possible latency. If data uploaders (cars and car companies) cannot choose a nearby location to store the data, a DCS solution may not be a good solution for this use case.

Notes

(1) Although cloud computing is different from cloud storage, we can reasonably make a set of assumptions: First, companies that provide cloud computing services (such as the Big Three) tend to provide such services on the data they store. In other words: customers typically use both compute and storage services on one platform.

Second: We can assume that as cloud companies capture more market share, they benefit from better unit economics at an increasing rate. The larger a company is, the more effectively it negotiates hardware pricing, which lowers costs for customers, attracts more users, and further enhances its negotiating power. So, when I mentioned that the Big Three have 65% of the cloud computing market share, we can assume that they have a similar amount of cloud storage market share.

(2) In this post, I use the terms “safe” and “secure” to describe data that is highly replicated across a distributed set of nodes, which results in higher data redundancy and more consistent uptime, and reduces the likelihood of censorship and the risk of a single point of failure.

Posted by:CoinYuppie,Reprinted with attribution to: