Distributed storage was first proposed by Google, with the aim of using inexpensive servers to handle large-scale, highly concurrent Web access.
What is a distributed storage system?
Distributed storage is also known as decentralized storage. To understand it, we first need to understand centralized storage. As the name suggests, centralized storage concentrates all storage in one system. That system is not a single device, however, but a set of devices working together. It can scale vertically, but every device still shares the same head (controller), so the head becomes both the performance bottleneck and the focal point for reliability and security. As a result, centralized storage cannot meet the needs of large-scale storage applications.
Such a storage system contains many components. In addition to the core head (controller), disk arrays (JBOD), and switches, there are management devices and other auxiliary devices.
The head is the most central component of the storage system. It usually contains two controllers that back each other up, so that a hardware failure in one does not make the entire storage system unavailable. The head's front-end ports provide storage services to servers, while its back-end ports are used to expand capacity. Through the back-end ports, the head can connect to more storage devices, forming a very large pool of storage resources.
The head is also where the advanced functions of the storage system are implemented. Software in the controller manages the disks, abstracts them into a pool of storage resources, and then divides that pool into LUNs (logical unit numbers) that are presented to servers. A LUN is what a server sees as a disk. Some centralized storage systems also act as file servers and provide shared file services. In either case, the defining feature of centralized storage is a single, unified entry point through which all data must pass: the head of the storage system. This is the most significant difference between centralized and distributed storage.
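To make the pooling step concrete, here is a minimal Python sketch of a head that aggregates disks into one pool and carves LUNs out of it. The `Controller` class, its method names, and the disk sizes are hypothetical and chosen only for illustration; a real controller also handles RAID, caching, and failover, none of which is modeled here.

```python
class Controller:
    """Hypothetical model of a storage head: pools disks, then carves LUNs."""

    def __init__(self, disk_sizes_gb):
        # All disks behind the head are abstracted into one capacity pool.
        self.pool_gb = sum(disk_sizes_gb)
        self.luns = {}  # LUN name -> size in GB

    def free_gb(self):
        # Capacity not yet handed out to any LUN.
        return self.pool_gb - sum(self.luns.values())

    def create_lun(self, name, size_gb):
        # Every allocation goes through the head, the single entry point.
        if name in self.luns:
            raise ValueError(f"LUN {name!r} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the pool")
        self.luns[name] = size_gb
        return name


head = Controller([4000, 4000, 8000])  # three disks, sizes in GB (made up)
head.create_lun("lun0", 10000)         # one LUN may span several disks
print(head.free_gb())
```

Because every request passes through the one `Controller` object, the sketch also shows why the head limits both performance and availability in a centralized design.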
A distributed storage system stores data across multiple independent devices and scales horizontally. It uses multiple storage devices to share the storage load and metadata servers to locate stored information, which improves reliability, availability, and access efficiency while making the system easy to expand. These scattered storage devices can also be presented as one large virtual storage pool for upper-level applications. Many products marketed as clustered storage, parallel storage, or cloud storage are based on a distributed architecture; vendors simply use different names. In recent years, distributed storage has been gradually replacing traditional storage architectures, and it is developing especially quickly in the field of unstructured data storage.
Distributed storage was first proposed by Google to solve the problem of large-scale, highly concurrent Web access using inexpensive servers. It uses a scalable architecture in which multiple storage servers share the storage load and location servers track where information is stored, which improves reliability, availability, and access efficiency and makes the system easy to scale.
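The division of labor between storage servers and a metadata (location) server can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the round-robin placement rule, and the in-memory dictionaries are all hypothetical, and real systems add replication, persistence, and networking.

```python
class StorageNode:
    """Hypothetical storage server that simply holds blobs in memory."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}


class MetadataServer:
    """Hypothetical location server: remembers which node holds each key."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.locations = {}  # key -> StorageNode
        self._next = 0       # round-robin cursor (an assumed placement rule)

    def place(self, key):
        # Assign a node the first time a key is seen; reuse it afterwards.
        if key not in self.locations:
            self.locations[key] = self.nodes[self._next % len(self.nodes)]
            self._next += 1
        return self.locations[key]

    def locate(self, key):
        return self.locations[key]


def put(meta, key, data):
    # The metadata server only decides where; the data goes to the node.
    node = meta.place(key)
    node.blobs[key] = data
    return node.name


def get(meta, key):
    # Look up the location first, then read from that node directly.
    return meta.locate(key).blobs[key]


nodes = [StorageNode("node-a"), StorageNode("node-b"), StorageNode("node-c")]
meta = MetadataServer(nodes)
placed = [put(meta, f"file-{i}", f"data-{i}".encode()) for i in range(4)]
print(placed)
```

Unlike the centralized head, the metadata server here handles only small lookups, while the data itself is spread over the storage nodes.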
Distributed storage system features
1. Large capacity: Nodes are built from commodity x86 storage servers. Storage nodes can be added horizontally as user needs grow, and together they form a unified shared storage pool.
2. High performance: Compared with traditional storage, a distributed storage system can deliver several times the aggregate IOPS and throughput, and performance grows roughly linearly as storage nodes are added. A dedicated metadata module provides fast, accurate data lookup and positioning to meet the rapid-response needs of front-end business applications.
3. More reliable: The system has no single point of failure, which protects data security and business continuity. Each node can be treated like a hard disk, and a data protection policy across nodes provides device-level redundancy, so damaged hard disks or nodes can be replaced online.
4. Easy to expand: The system supports seamless, online, dynamic horizontal expansion. With the redundancy policy in place, bringing any storage node online or taking it offline is transparent to front-end business. After new storage nodes are added, the system can optionally rebalance load automatically so that data is spread evenly across all storage nodes.
5. Easy to integrate: The system is compatible with general-purpose x86 storage servers from any vendor and can be deployed on a standard IP or InfiniBand (IB) network without changing the existing network architecture.
6. Easy to manage: The whole system can be configured and managed through a simple Web interface. It is easy to operate and maintain, management costs are low, and a single administrator can manage a petabyte-scale storage system.
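The redundancy and expansion properties in the list above can be illustrated with a short Python sketch. The article does not say how any particular product places data, so the technique used here, rendezvous (highest-random-weight) hashing with two copies per object, is an assumption chosen for brevity, and the node names and object keys are made up.

```python
import hashlib


def score(node, obj_key):
    # Deterministic pseudo-random weight for a (node, object) pair.
    digest = hashlib.sha256(f"{node}:{obj_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(nodes, obj_key, copies=2):
    # Rendezvous hashing: the highest-scoring nodes hold the copies.
    ranked = sorted(nodes, key=lambda n: score(n, obj_key), reverse=True)
    return ranked[:copies]


keys = [f"object-{i}" for i in range(1000)]
old_nodes = ["node-a", "node-b", "node-c"]
new_nodes = old_nodes + ["node-d"]  # horizontal expansion by one node

before = {k: replicas(old_nodes, k) for k in keys}
after = {k: replicas(new_nodes, k) for k in keys}

# Only objects that gain a copy on the new node need to move.
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} objects changed placement")

# Load per node after expansion (each object counts once per copy).
load = {n: sum(n in after[k] for k in keys) for n in new_nodes}
print(load)
```

With two copies on distinct nodes, losing any single node still leaves one readable copy of every object, and adding a node relocates only the objects that hash onto it rather than reshuffling the whole pool.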
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/what-is-distributed-storage-from-a-hardware-perspective/