What is IPFS
Powering the Decentralized Internet (web3.0).
A peer-to-peer hypermedia protocol that preserves and develops human knowledge by making the network scalable, resilient, and more open.
IPFS is a distributed system for storing and accessing files, websites, applications and data.
The target IPFS aims to replace is something called “HTTP”, which you may be familiar with: when you open a Baidu search page online, HTTP is the protocol that delivers everything you see.
The application-layer protocol of the web is the Hypertext Transfer Protocol (HTTP), the core of the traditional web. HTTP is implemented by two programs: a client program and a server program, which run on different end systems and communicate by exchanging HTTP messages. HTTP defines the structure of these messages and the way clients and servers exchange them.
Web pages are made up of objects. An object is simply a file, such as an HTML file, a JPEG image, or a small video clip, addressable by a URL. Most web pages consist of a base HTML file plus several referenced objects.
HTTP defines how web clients request web pages from web servers, and how servers deliver web pages to clients.
The browser’s job is to speak the HTTP protocol, parse the front-end code, and display the content. When a query is submitted, the server side usually queries its database and returns the result to the requester, that is, the browser, which then displays it.
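As a rough sketch of what happens at this layer, the helper below assembles the plain-text HTTP/1.1 GET request a client sends for one object (the function name and header choices are illustrative, not taken from any particular library):

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble the plain-text request an HTTP/1.1 client sends for one object."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, object path, protocol version
        f"Host: {host}",         # which site on this server we are asking
        "Connection: close",     # ask the server to close after one response
        "",                      # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_get_request("www.example.com", "/index.html").decode())
```

The server replies with a status line, headers, and then the object itself; the browser parses that response and renders it.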
Disadvantages of HTTP protocol
The Internet we use today runs on the HTTP or HTTPS protocol. HTTP, the Hypertext Transfer Protocol, is the transfer protocol used to carry hypertext from World Wide Web servers to local browsers. Thirty-two years have passed since it was proposed in 1990, and it has contributed enormously to the explosive growth and prosperity of the Internet.
However, HTTP is an Internet communication protocol based on the client/server (C/S) architecture, and its centralized operating model built on the backbone network has many drawbacks.
- Data on the Internet is often permanently lost when files are deleted or servers shut down. By some estimates, the average lifespan of a web page is only about 100 days, which is why “404 errors” are so common.
- The backbone network operates inefficiently and is costly to use. With HTTP, a complete file must be downloaded from a centralized server each time, which is slow and inefficient.
- The backbone network’s concurrency model limits access speed. This centralized model leads to congestion when many users access the network at once.
- Under the existing HTTP protocol, all data lives on centralized servers. Internet giants have absolute control over our data and how it is interpreted, and to some extent can supervise, block, and monitor it, which greatly limits innovation and development.
- Costs are high and attacks are easy. To support the HTTP protocol, high-traffic companies such as Baidu, Tencent, and Alibaba invest heavily in maintaining servers and defending against DDoS and other attacks. The backbone network is also vulnerable to wars, natural disasters, and central-server downtime, any of which can interrupt Internet service entirely.
Solutions for IPFS
- IPFS provides historical versioning of files, making it easy to view a file’s earlier versions; data cannot be deleted and can be preserved permanently.
- IPFS stores data by content address, so identical files are never stored twice; redundant resources are squeezed out, storage space is freed, and the cost of data storage falls. Switching to P2P downloads can cut bandwidth costs by nearly 60%.
- IPFS is built on a P2P network, so multiple sources can hold the same data and it can be downloaded from multiple nodes concurrently.
- Built on a decentralized distributed network, IPFS is difficult to centrally manage or restrict, making the Internet more open.
- IPFS distributed storage greatly reduces dependence on the central backbone network.
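The deduplication point above can be illustrated with a toy content-addressed store (the `add` helper and the dictionary-backed store are invented for this sketch): because the key is derived from the bytes themselves, adding the same content twice stores it only once.

```python
import hashlib

store: dict[str, bytes] = {}  # toy content-addressed store: hash -> bytes

def add(data: bytes) -> str:
    """Store data under the hash of its content and return that address."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data  # identical content maps to the same key: no duplicates
    return key

k1 = add(b"quarterly report bytes")
k2 = add(b"quarterly report bytes")  # the same file uploaded by someone else
print(k1 == k2)     # same content, same address
print(len(store))   # stored only once
```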
To put it succinctly:
HTTP relies on centralized servers, which are vulnerable to attack and prone to downtime under traffic surges, with slow downloads and high storage costs.
IPFS runs on distributed nodes, so it is more secure and less susceptible to DDoS attacks; it does not depend on the backbone network, reduces storage costs, offers large storage space and fast downloads, keeps historical versions of files, and can in theory store data permanently.
A new technology replaces an old one for essentially two reasons:
First, it can improve system efficiency;
Second, the system cost can be reduced.
IPFS does both.
The IPFS team develops the entire project in a highly modular way, like assembling building blocks. The Protocol Labs team, founded in 2015, has developed three modules, IPLD, libp2p, and Multiformats, which serve as the bottom layer of IPFS.
Multiformats is a collection of cryptographic hash algorithms and self-describing methods (from a value, you can tell how it was generated). It supports six mainstream hash algorithms, such as SHA1, SHA256, SHA512, and BLAKE2b, which are used to hash and describe node IDs and to generate data fingerprints.
libp2p is the core of IPFS. Facing a variety of transport-layer protocols and complex network devices, it helps developers quickly and cheaply build a usable P2P network layer. This is why IPFS technology is favored by so many blockchain projects.
IPLD is actually a conversion middleware that unifies existing heterogeneous data structures into one format, making data exchange and interoperability between different systems easy. The data structures IPLD supports include the block data of Bitcoin and Ethereum, which is the second reason IPFS is popular with blockchain systems: IPLD middleware can unify different block structures into one standard for delivery, giving developers a stable, reliable foundation without having to worry about performance or bugs.
- A hypermedia distribution protocol that combines the concepts of Kademlia, BitTorrent, Git, etc.
- A completely decentralized peer-to-peer transmission network that avoids central points of failure and is free from censorship and control
- A path into tomorrow’s Internet: new browsers (Brave, Opera) already support the IPFS protocol by default, and traditional browsers can access files stored in the IPFS network by visiting public IPFS gateways at addresses such as https://ipfs.io
- A next-generation content distribution network (CDN): simply adding a file to a local node makes it available globally through cache-friendly content-hash addresses and BitTorrent-like distribution of network bandwidth
- Backed by a strong open source community, with a developer toolset for building complete distributed applications and services
IPFS defines how files are stored, indexed, and transmitted in the system: uploaded files are converted into a special data format for storage, and IPFS hashes each file to determine its unique address. So on any device, in any place, the same file points to the same address (unlike a URL, this address is native to the content and guaranteed by the hash algorithm; you cannot change it, and you never need to). All devices in the network are then connected through one file system, so files stored on IPFS can be fetched quickly from anywhere in the world without being affected by firewalls (no network proxy required). Fundamentally, then, IPFS can change the distribution mechanism of web content and make it decentralized.
How IPFS works
IPFS is a peer-to-peer (P2P) storage network. Content is accessible through nodes located anywhere in the world, which may relay information, store it, or both. IPFS finds the content you ask for by its content address rather than its location.
Understand the three basic principles of IPFS:
- Content-addressable unique identification
- Content Linking via Directed Acyclic Graphs (DAGs)
- Content discovery via Distributed Hash Table (DHT)
These three principles depend on each other to create the IPFS ecosystem. Let’s start with content addressing and the unique identification of content.
Content addressing and unique identification of content
IPFS uses content addressing to identify content based on content rather than location. Finding items by content is something everyone does all the time.
For example, when you look for a book in a library, you often ask for it by title; that’s content addressing, because you’re asking what it is.
If you used location addressing to find that book, you’d ask for it by where it is: “I want the book on the second floor, third bookcase, fourth shelf, four books in from the left.”
If someone had moved that book, you’d be out of luck!
This problem exists both on the Internet and on your computer: today, content is looked up by its location, such as a URL or a file path.
In contrast, every piece of content using the IPFS protocol has a *content identifier*, or CID. The hash is unique to the content it came from, even though it may seem short compared to the original content.
Many distributed systems use content addressing via hashing to not only identify content but link it together — everything from the commits that support code to blockchains running cryptocurrencies take advantage of this strategy. However, the underlying data structures in these systems are not necessarily interoperable.
CID (Content Identifier)
The CID specification originated in IPFS and now exists in multiple formats, supporting a wide range of projects including IPFS, IPLD, libp2p, and Filecoin. This section is about the anatomy of the CID itself, which each of these distributed information systems uses as the core identifier for referencing content.
A content identifier, or CID, is a self-describing content-addressed identifier. It does not indicate _where_ the content is stored; instead, it forms a kind of address based on the content itself. The number of characters in a CID depends on the cryptographic hash of the underlying content, not on the size of the content. Since most things in IPFS are hashed with sha2-256, most CIDs you encounter will be the same size (256 bits, or 32 bytes), which makes them easier to manage, especially when dealing with multiple pieces of content.
For example, if we stored an image of an aardvark on the IPFS network, it would be assigned a CID of its own; the Uniswap IPFS link demonstrated earlier is one such CID.
The first step in creating a CID is to transform the input data, using a cryptographic hash function to map an input (data or a file) of arbitrary size to a fixed-size output. This output is called a hash, or digital fingerprint (sha2-256 is used by default).
The hash function used must generate hashes with the following characteristics:
- Deterministic : The same input should always produce the same hash.
- Uncorrelated : A small change in the input data should produce a completely different hash.
- One-way : It is infeasible to reconstruct the input data from its hash.
- Uniqueness : Only one file can generate a specific hash.
Note that if we change a single pixel in the aardvark image, the hash function will generate a completely different hash for the image.
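These properties are easy to observe with an off-the-shelf sha2-256 implementation (a sketch using bare hex digests; IPFS wraps the digest in a CID rather than using it raw):

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

a = digest(b"aardvark image bytes")
b = digest(b"aardvark image bytes")
c = digest(b"aardvark image bytez")  # one byte changed

print(a == b)    # deterministic: same input, same hash
print(a == c)    # a tiny change yields a completely different hash
print(len(a))    # sha2-256 output is always 32 bytes (64 hex characters)
```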
When we fetch data using the content address, we are guaranteed to see the expected version of that data. This is quite different from location addressing on the traditional Web, where the content at a given address (URL) changes over time.
Structure of CID
Multiformats is mainly responsible for identity hashing and data self-description in the IPFS system.
Multiformats is a collection of protocols for future-proof systems: self-describing formats that allow systems to interoperate and upgrade gracefully.
The Multiformats protocol contains the following protocols:
- multihash – self-describing hash
- multiaddr – self-describing network address
- multibase – self-describing base encoding
- multicodec – self-describing serialization
- multistream – self-describing streaming network protocol
- multigram (WIP) – self-describing packet network protocol
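As a sketch of the multihash idea, the snippet below prefixes a sha2-256 digest with a self-describing header. The `0x12` function code and 32-byte length come from the public multicodec table; real multihashes varint-encode both fields, which for values this small is the same single byte:

```python
import hashlib

SHA2_256 = 0x12  # multicodec code for sha2-256

def multihash_sha256(data: bytes) -> bytes:
    """Self-describing hash: <hash-fn code><digest length><digest bytes>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest

mh = multihash_sha256(b"hello")
print(mh[:2].hex())  # '1220': sha2-256, 32-byte digest
print(len(mh))       # 2 header bytes + 32 digest bytes
```

Because the header travels with the digest, a reader can tell which hash function produced it without out-of-band agreement.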
Content linking via directed acyclic graphs (DAGs)
Merkle DAGs inherit the shareability of CIDs, and using content addressing for DAGs has some interesting implications for their distribution. First, of course, anyone who has a copy of a DAG can act as a provider of that DAG. Second, when we retrieve data encoded as a DAG, such as a directory of files, we can retrieve all the children of a node in parallel, potentially from many different providers!
Third, file serving is not limited to centralized data centers, so our data can reach further. Finally, because each node in a DAG has its own CID, the sub-DAG it represents can be shared and retrieved independently of any larger DAG in which it is embedded.
Ever backed up a directory, then found the two copies months later and wondered whether their contents were still the same? You can compute a Merkle DAG for each backup instead of laboriously comparing files: if the CIDs of the root directories match, you know the contents are identical and it is safe to delete one copy and free up space on your hard drive!
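That backup comparison can be sketched as follows, with a bare sha2-256 hex digest standing in for a real CID and a sorted list of (name, child-CID) links standing in for a real directory node:

```python
import hashlib

def cid(data: bytes) -> str:
    # stand-in for a real CID: just a sha2-256 hex digest
    return hashlib.sha256(data).hexdigest()

def dir_node(children: dict[str, str]) -> str:
    """A directory's CID is derived from its (name, child-CID) links."""
    links = "".join(f"{name}:{child}" for name, child in sorted(children.items()))
    return cid(links.encode())

backup_a = dir_node({"cat.jpg": cid(b"cat bytes"), "notes.txt": cid(b"hello")})
backup_b = dir_node({"cat.jpg": cid(b"cat bytes"), "notes.txt": cid(b"hello")})
backup_c = dir_node({"cat.jpg": cid(b"cat bytes"), "notes.txt": cid(b"hello!")})

print(backup_a == backup_b)  # identical trees share one root CID
print(backup_a == backup_c)  # one changed file propagates up to the root
```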
Consider, for example, distributing a large dataset. On the traditional web:
- The developer sharing the file is responsible for maintaining the server and its associated costs
- The same server is likely to be used to respond to requests from all over the world
- The data may only be available monolithically, as a single large file archive
- It is difficult to find alternative suppliers of the same data
- Data may be in large chunks and must be downloaded serially from a single provider
- It is difficult for others to share data
Merkle DAG helps us alleviate all these problems. By converting the data to a content-addressable DAG:
- Anyone who wants can help send and receive files
- Nodes from all over the world can participate in serving data
- Each part of the DAG has its own CID and can be distributed independently
- It is easy to find alternative suppliers of the same data
- The nodes that make up the DAG are small and can be downloaded in parallel from many different providers
All of these contribute to the scalability of important data.
Take browsing the web, for example. When a person visits a web page, the browser must first download the resources associated with it: images, text, and styles. Many pages actually look very similar, sharing the same theme with only minor changes, so there is a great deal of redundancy.
Distributed Hash Table (DHT)
A distributed hash table (DHT) is a distributed system for mapping keys to values. In IPFS, the DHT is a fundamental component of the content routing system, acting as a cross between a catalog and a navigation system: it maps what the user is looking for to the peers that store the matching content. Think of it as a giant table recording who has what data.
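A toy in-memory sketch of that table is below. The `ProviderTable` class and its method names are invented for illustration; a real DHT such as Kademlia shards these records across many nodes by key distance instead of holding them in one place:

```python
class ProviderTable:
    """Toy stand-in for the DHT's provider records: CID -> peers that serve it."""

    def __init__(self) -> None:
        self._providers: dict[str, set[str]] = {}

    def provide(self, cid: str, peer_id: str) -> None:
        """A peer announces that it can serve this CID."""
        self._providers.setdefault(cid, set()).add(peer_id)

    def find_providers(self, cid: str) -> set[str]:
        """Which peers can we download this CID from?"""
        return self._providers.get(cid, set())

dht = ProviderTable()
dht.provide("QmAardvarkExample", "peer-tokyo")
dht.provide("QmAardvarkExample", "peer-berlin")
print(sorted(dht.find_providers("QmAardvarkExample")))
```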
libp2p is a modular networking stack that evolved out of IPFS into a standalone project. It is also used by Polkadot and, in part, by Eth2.
To explain why libp2p is such an important part of the decentralized web, we need to step back and understand where it came from. The initial implementation of libp2p started with IPFS, a peer-to-peer file sharing system. Let’s start by exploring the network problems that IPFS is designed to solve.
Networks are very complex systems with their own rules and limitations, so when designing these systems we need to consider many situations and use cases:
- Firewall : Your laptop may have a firewall installed that blocks or restricts certain connections.
- NAT : Your home WiFi router, with NAT (Network Address Translation) that translates your laptop’s local IP address into a single IP address that networks outside your home can connect to.
- High Latency Networks : These networks have very slow connections that make users wait a long time to see their content.
- Reliability : Many networks around the world are slow and lack robust systems for providing users with a good connection; connections drop frequently, and poor-quality networks cannot deliver the service users should receive.
- Roaming : Mobile addressing is another case where we need to guarantee that a user’s device remains uniquely discoverable as it moves between networks around the world. Today this is handled by distributed systems that require many coordination points and connections, but the best solution is decentralization.
- Censorship : In the current state of the web, it is relatively easy for a government entity to block a website by its domain. This is useful for deterring illegal activity, but becomes a problem when an authoritarian regime wants to deprive its population of access to resources.
- Runtimes with different properties : There are many types of runtimes around, such as IoT (Internet of Things) devices (Raspberry Pi, Arduino, etc.), which are gaining massive adoption. Because they are built with limited resources, their runtimes often use different protocols that make many assumptions about their runtimes.
- Innovation is very slow : Even the most successful companies with vast resources can take decades to develop and deploy new protocols.
- Data Privacy : Consumers have recently become increasingly concerned about the growing number of companies that do not respect user privacy.
P2P Protocol Current Issues
Peer-to-peer (P2P) networking has been envisioned since the early days of the Internet as a way to create a resilient network that keeps functioning even when peers are disconnected by a major natural or man-made disaster, allowing people to continue to communicate.
P2P networks can be used for a variety of use cases, from video calling (e.g. Skype) to file sharing (e.g. IPFS, Gnutella, KaZaA, eMule, and BitTorrent).
Peer – A participant in a decentralized network. A peer is an equally privileged, equally capable participant in an application. In IPFS, when you run the IPFS desktop application on your laptop, your device becomes a peer in the decentralized IPFS network.
Peer-to-Peer (P2P) – A decentralized network in which the workload is shared among peers. In IPFS, each peer may host all or some of the blocks of the files to be shared, and when a node requests a file, any node that has the relevant blocks can participate in sending it. The requesting node can then, in turn, share that data with other nodes later.
IPFS looks for inspiration in current and past web applications and research to try to improve its P2P system. Academia has a plethora of scientific papers offering ideas on how to address some of these problems, but while the research has yielded preliminary results, it lacks a code implementation that can be used and tweaked.
Code implementations of existing P2P systems are really hard to find, and where they do exist, they are often difficult to reuse or repurpose for the following reasons:
- Poor or non-existent documentation
- Restrictive license or license not found
- Very old code last updated over ten years ago
- No point of contact (no maintainer to contact)
- Closed-source (private) code
- Deprecated product
- Specifications not provided
- No friendly API exposed
- Implementation is too tightly coupled to a specific use case
- Unable to use future protocol upgrades
There must be a better way. Seeing that the main issue was interoperability, the IPFS team envisioned a way to integrate all current solutions and provide a platform that fosters innovation: a new modular system that lets future solutions be integrated seamlessly into the network stack.
libp2p began as the network stack of IPFS, but it has been extracted into an independent, first-class project on which IPFS depends.
In this way, libp2p is able to grow further without relying on IPFS, gaining its own ecosystem and community. IPFS is just one of many users of libp2p.
This way, each project can focus only on its own goals:
IPFS is more focused on content addressing, i.e. finding, fetching and validating any content on the network.
libp2p focuses more on process addressing, i.e. finding, connecting, and authenticating any data-transferring process in the network. So how does libp2p do it?
The answer is: modularity .
libp2p has identified specific parts that can make up the network stack:
Applications built on top of this stack include file storage, video streaming, crypto wallets, development tools, and blockchains; blockchain projects that build on IPFS get its libp2p modules along with it.
IPLD is used to understand and process data.
IPLD is a conversion middleware that unifies existing heterogeneous data structures into one format, enabling data exchange and interoperability between different systems: it defines a data model and codecs, and uses CIDs as links.
First, we define a “data model”, which describes the domain and scope of the data. This is important because it is the foundation of everything built on top. (Broadly speaking, the data model is “like JSON”: maps, strings, lists, and so on.) After this, we define “codecs”, which specify how to parse the data model out of a message and how to emit it in the form we want. IPLD has many codecs; you can choose among them based on the other applications you want to interoperate with, or simply on how your application trades performance against human readability.
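A rough sketch of these two pieces: the data-model value below is encoded with a JSON-style codec, and a parent node links to it by CID rather than by location. The `cid_of` helper and its `stub-` prefix are placeholders, not the real CID encoding; the `{"/": ...}` shape follows IPLD’s dag-json link convention.

```python
import hashlib
import json

def cid_of(block: bytes) -> str:
    # placeholder CID: a truncated hash of the encoded block, not a real encoding
    return "stub-" + hashlib.sha256(block).hexdigest()[:16]

# An IPLD data-model value: maps, lists, strings, numbers ("like JSON")
leaf = {"title": "aardvark", "size": 12345}
leaf_block = json.dumps(leaf, sort_keys=True).encode()  # a JSON-style codec
leaf_cid = cid_of(leaf_block)

# A parent node links to the leaf by CID, not by location
parent = {"name": "animals", "entry": {"/": leaf_cid}}  # "/" marks an IPLD link
parent_block = json.dumps(parent, sort_keys=True).encode()
print(leaf_cid)
print(cid_of(parent_block))
```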
IPLD implements the top three layers of the protocol stack: the object, file, and naming layers.
- Object Layer – Data in IPFS is organized in a Merkle Directed Acyclic Graph (Merkle DAG) structure. Nodes are called objects and can contain data or links to other objects. Links are cryptographic hashes of target data embedded in the source. These data structures provide many useful properties such as content addressing, data tamper resistance, data deduplication, etc.;
- File Layer – To model a Git-like version control system on top of the Merkle DAG, IPFS defines the following objects:
- blob: a variable-size block of data (with no links), representing a chunk of file data;
- list: organizes blobs or other lists in order, usually representing a complete file;
- tree: represents a directory, containing blobs, lists, and other trees;
- commit: a Git-like commit, representing a snapshot in an object’s version history;
- Naming Layer – Since every change to an object changes its hash, a mapping for mutable names is required. IPNS (the InterPlanetary Name System) assigns each user a mutable namespace, under which objects can be published to a path signed by the user’s private key, letting others verify the object’s authenticity. It plays a role similar to a URL.
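The file-layer objects above can be sketched as plain maps stored by hash; the `put` helper and the 12-character addresses are simplifications invented for this sketch, not IPFS’s real object format:

```python
import hashlib
import json

def put(obj: dict) -> str:
    """Store an object and return its hash address (a stand-in for a CID)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]

blob_v1 = put({"type": "blob", "data": "draft"})
tree_v1 = put({"type": "tree", "links": {"notes.txt": blob_v1}})
commit_1 = put({"type": "commit", "tree": tree_v1, "parent": None})

blob_v2 = put({"type": "blob", "data": "final"})
tree_v2 = put({"type": "tree", "links": {"notes.txt": blob_v2}})
commit_2 = put({"type": "commit", "tree": tree_v2, "parent": commit_1})

# commit_2 reaches every version: its tree holds the new file,
# and its parent link preserves the previous snapshot
print(commit_1 != commit_2)
```

Because each object is addressed by its content, changing the blob changes the tree, which changes the commit: the version history forms a Merkle DAG, just as in Git.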
IPFS takes the functions of the modules above and integrates them into a containerized application that runs on independent nodes and is available to everyone as a web service. It allows participants in the network to store, request, and transmit verifiable data to each other. And since IPFS is open source, it is free to download and use, and is already used by a large number of teams.
With IPFS, each node can store the data it considers important; but without a simple way to motivate others to join the network and store data, IPFS would be difficult to scale. This is where Filecoin, the incentive layer of IPFS, was born.
Filecoin adds incentivized storage to IPFS. IPFS users can directly and reliably store their data on Filecoin, opening the door to numerous applications and implementation scenarios for the network.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/one-article-to-understand-ipfs-a-new-generation-of-internet-underlying-protocol/