Provide a decentralized data sharing space for science

Scientific data corpora are decentralized and access-controlled, and their rapid growth has outstripped the ability of centralized services to maintain them. Recent developments in peer-to-peer technology have made it possible to create a permanent archive of scientific records open to all. The DAOrayaki community has previously compiled a series of DeSci articles. In this series, we explore the cutting-edge technologies behind decentralized file storage networks and outline potential development paths for a collaborative decentralized science ecosystem.

Who should have the knowledge?

Historically, the boundaries of human knowledge have been set by the available observational tools and high-quality data. The power we have today to make leaps in understanding across all areas of the natural world was previously reserved for a privileged few.

Ptolemy used armillary spheres and papyrus to document the boundaries of the Earth as then understood, an account that went unchallenged for more than a thousand years. Galileo used convex lenses and parchment to explain the boundaries of the universe, then imagined as a sphere locked in place by a god. Hubble used the power of the Hooker Telescope to delineate an ever-expanding horizon for human knowledge, in a universe where everything is possible, leaving new challenges for later seekers of truth.

Distributed Knowledge, Anatomy Plate. 1857 JG Heck

Until recently, only those who belonged to an exclusive club of scholars had access to the treasure trove of instruments and data needed to tackle the major challenges in science. Today, open collaboration and data-sharing practices can enable a future in which astronomy and physics make even greater progress. The problems are too large, the models too many and too complex, and the engineering challenges too difficult for even the most enlightened person to solve alone. As our accumulated understanding of the scope of the universe expands, high-quality datasets and the tools that accompany them will become more accessible to each and every one of us.

Rich in data, poor in wisdom

While the astronomy community has set standards for collaborative open science practice, many fields are still rooted in traditional practices “based on reputation and self-professional development.” For many, it is hard to see how we can move beyond these entrenched, adversarial academic interests. However, the real-world challenges facing modern science will inevitably force a cultural revolution, a paradigm shift that has already begun with the advent of open scientific data sharing, open journals, and free software. The volume of data from scientific observation of the natural world is exploding beyond the capacity of traditional institutional infrastructure to maintain, store, and sift through this ever-expanding mass of raw knowledge.

Thousands of petabytes of valuable data and observations about human health, economic activity, social dynamics, and the universe and our impact on it are stored in outdated storage systems. This data is inaccessible to search engines, stored in obscure schemas known to only a few, and likely never used. It is estimated that more than 80% of the original scientific data collected in the 1990s has been lost forever because of outdated technology and inadequate archival infrastructure. Even today, starting three years after a paper is published, the chance of finding its dataset falls by 17% per year. The practice of deliberately restricting access to scientific data limits the pace of innovation in our society.

Decentralized file storage protocols provide a solution to this failure through content-addressable data, programmable incentives for data storage, provenance tracking, censorship resistance, and bandwidth that scales with global adoption. The peer-to-peer scientific data commons, driven by these capabilities, can provide a resilient digital fabric that allows decentralized communities to maintain cognitive unity around today’s most critical and challenging problems.
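To make "content-addressable data" concrete, here is a minimal sketch in Python. It is not how any particular protocol is implemented: the `ContentStore` class and the sample record are hypothetical, and a plain SHA-256 hex digest stands in for the richer identifiers real networks use. The idea it illustrates is that a piece of data is looked up by the hash of its bytes rather than by where it is stored, so anyone who retrieves it can check that it has not been altered.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each block is keyed by its SHA-256 digest."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on retrieval so a corrupted or swapped block is detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"station,temperature\nA1,12.5\n")  # hypothetical record
same_address = store.put(b"station,temperature\nA1,12.5\n")
other_address = store.put(b"station,temperature\nA1,99.9\n")
```

Storing identical bytes twice yields the same address, while changing a single value yields a different one. That property is what lets independent peers hold and serve the same dataset without a central catalog deciding which copy is authoritative.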

A Brief History of Peer-to-Peer Content Networking

Peer-to-peer file sharing is as old as the Internet. In fact, ARPANET, the precursor to the Internet as we know it, was strictly a peer-to-peer network when it was first launched in 1969. Resilience to network degradation, high bidirectional bandwidth, information redundancy, resource aggregation, and an inherently participatory nature were the main reasons distributed peer-to-peer networks became the design of choice for early Internet architects and engineers. This kind of direct information sharing has gone through multiple iterations in the short history of the Internet.

The advent of public key cryptography in 1973 marked the beginning of identity protocols and verifiable content through a clever system of key pairs and signatures. For the first time, users on a network could trust a packet of information signed with a private key, provided it could be verified with the matching public key released by a known identity. Later, in 1979, Ralph Merkle invented Merkle trees as a way to verify the origin and integrity of packages of information, paving the way for version control software such as Git. The integration of public key cryptography with Merkle tree data structures has driven new innovations such as blockchains, distributed computing, and consensus mechanisms, which enhance resilience to attacks and reduce the fragmentation of information distributed across the network.
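A short sketch can show why a Merkle tree is useful for tracking packages of information. The Python below is illustrative only: the function names and sample records are hypothetical, SHA-256 is an assumed hash function, and duplicating the last hash on odd-sized levels is just one common convention. It hashes each record, then repeatedly hashes pairs of hashes until a single root remains.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Return the hex Merkle root of a non-empty list of byte strings."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: repeat the last hash when a level is odd.
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


records = [b"run-1", b"run-2", b"run-3"]  # hypothetical records
root = merkle_root(records)
tampered_root = merkle_root([b"run-1", b"run-X", b"run-3"])
```

Because every level depends on the one below it, changing any single record changes the root. Two parties can therefore confirm they hold the same collection by comparing one short value instead of every record.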

One of the best-known examples of a distributed network, Napster, connected peers through a centralized index server. The service was shut down in 2001 after copyright infringement lawsuits, including one brought by Metallica. The introduction of Distributed Hash Tables (DHTs) revolutionized the design of peer-to-peer networks, unlocking higher levels of decentralization and making networks more resilient to content moderation and censorship. DHTs were originally used to help nodes on a peer-to-peer network keep track of one another's locations. This approach allowed peer-to-peer networks to scale in a truly decentralized manner, since they no longer needed to rely on a centralized server as Napster did. The extremely popular peer-to-peer network BitTorrent was one of the first to use DHT technology.
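The following Python sketch illustrates how a DHT can decide which peers are responsible for a piece of content without a central index. The text does not say which DHT design is meant, so the XOR distance metric used here (in the style of Kademlia) is an assumption, and the node names and helper functions are hypothetical.

```python
import hashlib


def node_id(name: str) -> int:
    """Map a node name or content key onto the same 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def closest_nodes(key: str, nodes, k: int = 2):
    """Return the k nodes whose IDs are closest to the key by XOR distance."""
    target = node_id(key)
    return sorted(nodes, key=lambda n: node_id(n) ^ target)[:k]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical peers
responsible = closest_nodes("dataset.csv", nodes)
```

Every peer that runs the same calculation reaches the same answer, so a request for `dataset.csv` can be routed toward the responsible peers by anyone on the network. A real DHT also has to handle peers joining and leaving and incomplete knowledge of the network, which this sketch leaves out.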

Bitcoin Codebase Fingerprint

In 2009, Bitcoin entered public view. While peer-to-peer networks before Bitcoin allowed users to transfer data to each other easily and quickly, they were not designed to keep tamper-proof records of cryptographically verifiable exchanges. Transactions can be appended to the Bitcoin ledger only when the node submitting them proves that it has completed a certain amount of computational work. Bitcoin is the first instance of a peer-to-peer network with a single global state, agreed through network consensus, that records the transfer of cryptographic tokens representing economic value.

The concept of using cryptographic proofs to verify events in a distributed network paved the way for accelerating innovation in peer-to-peer technology. The Interplanetary File System (IPFS) is a peer-to-peer file-sharing protocol that combines key advances in decentralized computing (such as DHTs and Merkle trees) with cryptographic proofs to provide the foundational layer for permanent record archiving on the Internet. IPFS makes it possible for information to truly belong to the network’s public resources. By design, it is resistant to geographic censorship, to tampering with content, to attacks on data integrity, and to the bandwidth bottlenecks imposed by centralized service providers.
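The idea of proving computational work can be sketched in a few lines of Python. This is a simplified illustration rather than Bitcoin's actual procedure: the payload, the function names, the single SHA-256 pass, and the 12-bit difficulty are all assumptions chosen so the example runs quickly. The submitter searches for a number (a nonce) that makes the hash of the data fall below a target, and anyone else can check the result with a single hash.

```python
import hashlib


def _digest_value(block_data: bytes, nonce: int) -> int:
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(block_data: bytes, nonce: int, difficulty_bits: int = 12) -> bool:
    """Cheap check: the hash must have at least `difficulty_bits` leading zero bits."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))


def mine(block_data: bytes, difficulty_bits: int = 12) -> int:
    """Expensive search: try nonces in order until one passes verification."""
    nonce = 0
    while not verify(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


data = b"alice pays bob 1 token"  # hypothetical transaction payload
nonce = mine(data)
```

Finding the nonce takes thousands of attempts on average at this difficulty, while checking it takes one. That asymmetry is what makes rewriting the record costly for an attacker and cheap for everyone else to audit.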

The State of Cloud Storage

In the early 21st century, centralized cloud service providers emerged as the gatekeepers of content on the Internet. Today, the cloud storage market is dominated by very few players. According to Canalys (2020) estimates, Amazon, Microsoft, and Google control more than half of the market, and Amazon alone controls a third. Amazon achieved its near-monopoly status by solving the key scalability problems of the early Internet, but in doing so it also created a new set of problems, all of which stem from centralization. The main problems are inefficient resource allocation, data fragmentation in siloed repositories, lack of privacy and security, and unnecessarily high costs. By and large, cloud service providers control all the stored data they manage, making them the arbiters of access to knowledge.

Classification of control models employed by large tech companies

Amazon recently started offering scientists enticing data storage deals to further increase the size and depth of its content moat. Analysts speculate that Amazon’s services could become even more valuable if they were able to compile large, high-quality interoperable datasets from researchers in industry, academia and government. For example, the Allen Brain Observatory has struck a deal with Amazon to store 10 terabytes of valuable neuroimaging observations in its cloud.

While Amazon provides free storage for data uploads, exporting data from its servers tends to incur hefty fees. This can leave data trapped within Amazon's vast computing centers and make Amazon the de facto owner of publicly funded research. Community response appears to have led Amazon to consider a 15% monthly cloud storage fee waiver for “qualified” research institutions. It appears that Amazon has learned from the scientific publishing industry, making the acquisition of knowledge another lucrative component of its ever-expanding cloud computing business. Even so, a countercurrent is building against the trend toward centralization and is poised to shatter the bedrock of control that Big Tech has built over the past 20 years.

Looking forward to a more open network

As part of this countercurrent, IPFS has led to the emergence of many other technological innovations that power the decentralized web. In this series of articles, we cover the major decentralized data storage protocols and discuss their potential as underlying structures for decentralized scientific data commons. We take a deep dive into the history, mechanics, and popular applications behind IPFS.

References

1. Allen Brain Institute. (2018, August 9). Neuroscience Data Joins the Cloud. Retrieved November 21, 2021, from https://alleninstitute.org/what-we-do/brain-science/news-press/articles/neuroscience-data-joins-cloud
2. Amazon. (2018, July 12). New AWS Public Datasets Available from Allen Institute for Brain Science, NOAA, Hubble Space Telescope, and Others. Retrieved November 12, 2021.
3. Canalys. (2020, April 29). Global cloud services market Q1 2021. Retrieved November 27, 2021, from https://www.canalys.com/newsroom/global-cloud-market-Q121
4. Cocks, C. (2001, December). An identity based encryption scheme based on quadratic residues. In IMA international conference on cryptography and coding (pp. 360–363). Springer, Berlin, Heidelberg.
5. Goldfein, J., & Nguyen, I. (2018, March 27). Data is not the new oil. TechCrunch. Retrieved November 20, 2021.
6. Merkle, R. C. (1987, August). A digital signature based on a conventional encryption function. In Conference on the theory and application of cryptographic techniques (pp. 369–378). Springer, Berlin, Heidelberg.
7. Paratii. (2017, October 25). A Brief History of P2P Content Distribution, in 10 Major Steps. Medium. Retrieved November 20, 2021.
8. Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system. Decentralized Business Review, 21260.
9. Vines, T. H., et al. (2014). The availability of research data declines rapidly with article age. Current Biology, 24(1), 94–97.
10. Wiener-Bronner, D. (2013, December 23). Most Scientific Research Data From the 1990s Is Lost Forever. The Atlantic. Retrieved November 13, 2021.

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/provide-a-decentralized-data-sharing-space-for-science/