“In numbers there is a house of gold”: endless alpha is hidden in on-chain data. When we follow the moves of smart money, hunt for trending collections in NFT Paradise day and night, or query StepN’s daily shoe-minting numbers, have you ever wondered where this data comes from? Faced with numerous on-chain data analysis platforms and their sprawling feature sets, are you still searching for the one that suits you best?
1. Background introduction
As the on-chain ecosystem flourishes (DeFi trading and lending, NFT minting and sales, and so on), user behavior is recorded directly and transparently on-chain. This behavioral data corresponds to the flow of value on the chain, so analyzing it, and the insights derived from that analysis, have become extremely valuable. On-chain data analysis platforms such as Nansen, Token Terminal, Dune Analytics, Footprint Analytics, Flipside Crypto, Glassnode, and Skew have responded to this growing demand, launching products with slightly different focuses for individual and institutional users.
This article first sketches the data architecture behind on-chain data analysis platforms, to show readers where analysis results actually come from and how they are produced. We then compare the mainstream platforms aimed at individual users along five dimensions: data richness (number of blockchains covered), data granularity, data latency, ease of use, and query freedom. Finally, we share our thoughts on the future of on-chain data indexing, querying, and analysis in Web3.
2. Introduction to the data architecture of the on-chain data analysis platform
Although the blockchain records all raw transaction data, and that data is open and transparent, the raw data cannot by itself answer questions such as: What was Uniswap’s trading volume over the past 24 hours? What percentage of current BAYC holders also hold at least one Moonbirds? Answering them requires a data-ingestion pipeline of indexing, processing, and storage, after which the relevant data is aggregated to produce an answer. Querying the blockchain directly for such answers is extremely time-consuming and labor-intensive. To make on-chain data quickly retrievable, mainstream on-chain data analysis platforms index the raw on-chain data through a series of processing steps and store the results in a data warehouse that the platform updates and manages. When users track the moves of smart money on Nansen, or view visual analyses on Dune Analytics, their query for so-called “on-chain data” actually runs against a database centrally controlled by the project team, not against the blockchain itself.
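A toy model makes the cost difference concrete. The block and transaction shapes below are invented for illustration (not a real node API): answering even one question directly from a node means replaying every block, while an index pays that scan cost once.

```python
from collections import defaultdict

# Hypothetical chain: 10,000 blocks, one transaction each, every third
# transaction sent to a "Uniswap" address (shapes are illustrative only).
blocks = [
    {"number": n,
     "txs": [{"to": "0xuniswap" if n % 3 == 0 else "0xother", "value": 1.0}]}
    for n in range(10_000)
]

# Direct approach: a full scan of the chain for every single question.
direct_volume = sum(tx["value"]
                    for b in blocks
                    for tx in b["txs"]
                    if tx["to"] == "0xuniswap")

# Indexed approach: pay the scan cost once, then each query is a lookup.
by_receiver = defaultdict(float)
for b in blocks:
    for tx in b["txs"]:
        by_receiver[tx["to"]] += tx["value"]

assert by_receiver["0xuniswap"] == direct_volume
print(direct_volume)  # 3334.0
```

The indexed version answers any follow-up question (volume to a different address, say) without touching the chain again, which is exactly the trade-off the platforms make.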
The data warehouse architecture of the on-chain data analysis platform is roughly as follows:
- Data collection layer: the platform obtains raw on-chain data from blockchain nodes. Some platforms also accept data from third-party providers, and some (such as Footprint Analytics) let users upload off-chain data to supplement the final analysis.
- Data processing layer: each platform extracts, transforms, and loads the raw data via stream processing or batch processing. In stream processing, raw data arrives and is processed continuously, which usually means lower data latency and more timely analysis results; batch processing has somewhat higher latency and less timely results, but is better suited to high-volume data.
- Data storage layer: the processed data is stored in the tables of each dataset according to formats pre-defined by the platform, for subsequent use.
- Data integration layer: the stored data is aggregated. Metrics can be computed against pre-set definitions, either on a periodic schedule or triggered by configured conditions (event-driven aggregation).
- Data analysis layer: the computed results are reported and served in real time. This is the layer where individual users mainly interact with an on-chain data analysis platform: Nansen’s business-intelligence dashboards, the many visual charts on Dune Analytics and Footprint Analytics, the API endpoints some platforms expose, and so on.
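The layered flow above can be sketched end-to-end in a few lines. The table schema, field names, and sample transactions below are invented for illustration; real warehouses use far richer schemas:

```python
import sqlite3

# Collection layer (simulated): raw transactions as they might arrive from
# a node. The records and addresses here are made up.
raw_txs = [
    {"block": 100, "from": "0xaaa", "to": "0xdex", "value_eth": 1.5},
    {"block": 100, "from": "0xbbb", "to": "0xdex", "value_eth": 0.5},
    {"block": 101, "from": "0xaaa", "to": "0xnft", "value_eth": 2.0},
]

# Storage layer: a pre-defined table format in the warehouse.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE transactions (block INT, sender TEXT, receiver TEXT, value_eth REAL)"
)

# Processing layer: transform each raw record and load it into the table.
db.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(t["block"], t["from"], t["to"], t["value_eth"]) for t in raw_txs],
)

# Integration layer: a pre-set metric, e.g. total value received per contract.
volume = dict(db.execute(
    "SELECT receiver, SUM(value_eth) FROM transactions GROUP BY receiver"
).fetchall())

# Analysis layer: the result is what a dashboard would render.
print(volume)  # {'0xdex': 2.0, '0xnft': 2.0}
```

In production the same shape holds, but collection reads from node RPC or third-party feeds, processing runs as streaming or batch ETL jobs, and the storage layer is a managed warehouse rather than an in-memory database.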
Each platform builds and maintains its data warehouse in its own way. Nansen, for example, uses the third-party Google Cloud Platform to build and maintain its warehouse.
Image credit: Google Cloud Nansen Case Study https://cloud.google.com/customers/nansen
Platforms such as Dune Analytics, Footprint Analytics, and Token Terminal independently build and maintain their own data warehouses. Taking Footprint Analytics as an example, its data warehouse architecture is shown in the following figure.
3. Comparison of mainstream on-chain data analysis platforms
From the perspective of content and users, this section compares several mainstream on-chain data analysis platforms from the dimensions of data richness (covering the number of blockchains), data granularity, data latency, platform ease of use, and query freedom, including: Nansen, Token Terminal, Dune Analytics, Footprint Analytics.
Some platforms provide users with standardized information reporting interfaces, such as Nansen, Token Terminal, etc.
Nansen is probably the on-chain data analysis platform readers know best. Its standout feature, compared with other platforms, is wallet labeling (wallet profiler). By combining wallet labels with other on-chain data, it surfaces valuable signals for users, such as Smart Money, which tracks the real-time moves of whales and heavy DeFi players. Other popular products include Hot Contracts, for discovering newly trending DeFi and NFT contracts, and NFT Paradise, which shows real-time NFT minting data at a glance.
[Covered blockchains] Nansen now supports on-chain analysis of 11 blockchains: Ethereum, Arbitrum, Avalanche, BSC, Celo, Fantom, Optimism, Polygon, Ronin, Terra, and Solana.
[Data granularity] The regular Nansen tier provides only curated data.
[Data latency] Stream and batch processing; some analyses achieve near-real-time reporting.
[Platform ease of use] No barrier to entry.
[Query freedom] The regular Nansen tier offers only standard dashboard templates. To meet institutional customers’ needs for custom on-chain queries and analysis, Nansen released Nansen Institutions, built on Google Cloud Platform’s blockchain datasets, which lets professional and institutional users write SQL queries tailored to their needs.
It is worth mentioning that Nansen publishes numerous on-chain analysis reports in its Nansen Research channel, tracking and analyzing key events on-chain. Readers may find these reports very helpful for learning on-chain analysis methods, such as last month’s report on the stETH depeg event: https://www.nansen.ai/research/on-chain-forensics-demystifying-steth-depeg
Token Terminal is known for its accurate protocol revenue data. From protocol revenue, it computes each protocol’s price-to-sales (P/S) and price-to-earnings (P/E) ratios, which to some extent provide a valuation benchmark across protocols.
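As a toy illustration of how such ratios are derived (all figures below are made up and are not Token Terminal data; the distinction between total revenue and the protocol's share follows the usual P/S vs. P/E convention):

```python
def price_to_sales(fully_diluted_mcap: float, annualized_revenue: float) -> float:
    """P/S: market cap divided by total fees paid by users of the protocol."""
    return fully_diluted_mcap / annualized_revenue

def price_to_earnings(fully_diluted_mcap: float, protocol_earnings: float) -> float:
    """P/E: market cap divided by only the revenue share kept by the protocol."""
    return fully_diluted_mcap / protocol_earnings

mcap = 5_000_000_000        # hypothetical $5B fully diluted market cap
total_revenue = 250_000_000  # hypothetical $250M in annualized fees
protocol_cut = 50_000_000    # hypothetical $50M kept by the protocol

print(price_to_sales(mcap, total_revenue))    # 20.0
print(price_to_earnings(mcap, protocol_cut))  # 100.0
```

A lower ratio suggests the token is cheaper relative to the activity on the protocol, which is why these figures can serve as a rough cross-protocol valuation benchmark.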
[Covered protocols] Token Terminal tracks data for over 130 protocols.
[Data granularity] Token Terminal provides only curated data.
[Data latency] Batch processing. According to recent communication between the IOSG team and Token Terminal, data on the platform currently lags by about two days.
[Platform ease of use] No barrier to entry.
[Query freedom] Only standard dashboard templates are provided.
Example of Token Terminal protocol revenue data: revenue share of the top ten blockchains and protocols over the past 365 days.
Other mainstream on-chain data analysis platforms open their data tables to users, who can freely write code to query them, granting a degree of freedom over query content; examples include Dune Analytics and Footprint Analytics.
Dune Analytics was the first on-chain data analysis platform to open self-service querying to users, and it has the largest analyst base and user community. It provides highly granular raw on-chain data that analysts can freely use to write custom queries. Dune Analytics also opens abstractions to project teams, which can create data tables better suited to their own protocols for analysts to use. Self-service querying does have a barrier to entry, however: analysts need to be able to write PostgreSQL to build queries that meet their needs, and query latency depends heavily on the analyst’s SQL skill and familiarity with the tables Dune Analytics provides.
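For readers unfamiliar with this style of query, here is a sketch of the typical aggregation pattern analysts write, run locally against a stand-in table (the table name, columns, and figures are invented; real Dune schemas such as its decoded DEX trade tables differ):

```python
import sqlite3

# A toy stand-in for a platform-provided table of decoded DEX trades.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dex_trades (day TEXT, project TEXT, usd_amount REAL)")
db.executemany("INSERT INTO dex_trades VALUES (?, ?, ?)", [
    ("2022-06-01", "uniswap", 120.0),
    ("2022-06-01", "uniswap", 80.0),
    ("2022-06-01", "sushiswap", 50.0),
    ("2022-06-02", "uniswap", 200.0),
])

# Typical analyst query: daily volume per project, newest day first.
rows = db.execute("""
    SELECT day, project, SUM(usd_amount) AS volume
    FROM dex_trades
    GROUP BY day, project
    ORDER BY day DESC, volume DESC
""").fetchall()

for row in rows:
    print(row)
```

The skill the article mentions is mostly in knowing which of the platform's many tables to join and how to group them; the SQL itself, as above, is often short once the right tables are found.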
[Covered blockchains] Dune Analytics provides on-chain data for 6 blockchains: Ethereum, BSC, Optimism, Polygon, Gnosis Chain, and Solana.
[Data granularity] Very fine.
[Data latency] Stream processing; data lags by about five minutes.
[Platform ease of use] Dune Analytics requires some SQL skill from analysts.
With highly granular raw data, analysts are free to build on-chain analyses in Dune Analytics, such as daily StepN shoe minting and cumulative totals: https://dune.com/queries/627689/1170627
Dune Analytics released Dune Engine v2 on May 30, 2022. Dune Engine v2 significantly changes the data structure of Dune Analytics to provide users with faster query response and better query performance, while minimizing the impact on user experience.
Compared with Nansen, which is easy to use but offers only standardized dashboards, Dune Analytics offers free-form querying but requires analysts to write PostgreSQL. Footprint Analytics aims to give users the best of both worlds: great query freedom combined with a low barrier to entry. How does it do that?
“On-chain data is complex, and an analyst may need to write hundreds or thousands of lines of code to compute a single indicator. To lower this barrier, Footprint cleans and integrates on-chain data and gives it business meaning, so that users can analyze blockchain data without writing SQL queries or code. Anyone can build custom charts in minutes through a rich charting interface, decode on-chain data, and discover the value trends behind projects.”
Footprint Analytics not only provides raw blockchain data but also tiers its on-chain data: the rawest on-chain data is bronze data; filtered, cleaned, and enriched data is silver data; and data further organized to carry business meaning is gold data.
Gold- and silver-tier data, already organized with business logic and business meaning, can be used directly for analysis. Building on these tiers, Footprint Analytics lets users query on-chain data simply by dragging and dropping data tables. Whether or not you can write SQL-like code, you can quickly build a dashboard that meets your custom needs and visualize the information you want through intuitive, interactive charts.
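A minimal sketch of what the bronze-to-silver-to-gold progression might look like; the field names, records, and specific transforms below are invented for illustration and are not Footprint's actual pipeline:

```python
# Bronze tier: raw decoded transfer logs, noisy and in raw token units.
bronze = [
    {"tx": "0x1", "token": "usdc", "raw_amount": 2_000_000, "decimals": 6},
    {"tx": "0x1", "token": "usdc", "raw_amount": 2_000_000, "decimals": 6},  # duplicate
    {"tx": "0x2", "token": "weth", "raw_amount": 10 ** 18, "decimals": 18},
]

def to_silver(rows):
    """Clean and enrich: drop duplicates, convert raw units to token amounts."""
    seen, out = set(), []
    for r in rows:
        if r["tx"] in seen:
            continue
        seen.add(r["tx"])
        out.append({"tx": r["tx"], "token": r["token"],
                    "amount": r["raw_amount"] / 10 ** r["decimals"]})
    return out

def to_gold(rows):
    """Add business meaning: aggregate into a per-token transfer-volume table."""
    volume = {}
    for r in rows:
        volume[r["token"]] = volume.get(r["token"], 0.0) + r["amount"]
    return volume

silver = to_silver(bronze)
print(to_gold(silver))  # {'usdc': 2.0, 'weth': 1.0}
```

The point of the tiering is that a drag-and-drop user only ever touches tables shaped like the gold output, while an SQL-capable analyst can still drop down to the bronze records when a question demands it.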
[Covered blockchains] Footprint Analytics currently provides on-chain data for 17 blockchains, including Ethereum, Arbitrum, Avalanche, Boba, BSC, Celo, Fantom, Harmony, IoTeX, Moonbeam, Moonriver, Polygon, ThunderCore, and Solana.
[Data granularity] Footprint Analytics provides extremely fine-grained raw data as well as curated data.
[Data latency] Footprint Analytics currently batch-processes collected raw data once a day, so data lags by one day.
[Platform ease of use] On Footprint Analytics, users can freely analyze on-chain data without SQL queries or coding; for analysts who can write SQL, Footprint also provides the raw data to work with.
Readers may wish to try Footprint Analytics now; you can start building your own on-chain analysis dashboard within minutes.
Image source: IOSG
4. A little imagination: decentralized on-chain data analysis
On-chain data analysis is this important, yet today’s users can only rely on centrally managed “on-chain data” platforms such as Nansen and Dune Analytics to inform investment decisions. On these platforms, users cannot verify that the underlying data has not been tampered with, and must simply trust the datasets the platform provides. “Don’t trust. Verify.” has become an empty slogan in on-chain data analysis.
The Web3 wave is rolling in, and the on-chain ecosystem keeps growing richer. In the future, smart contracts and decentralized applications may require as inputs not only raw on-chain data and oracle-provided data, but also analysis results derived from that raw data. For such purposes, can we still trust and use these centralized on-chain data analysis platforms? Probably not.
The IOSG team has recently seen project teams take the first steps toward decentralizing on-chain data querying and analysis. Space is limited here, so stay tuned for the next installment: the road to decentralized on-chain data analysis.
Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/status-and-prospects-of-on-chain-data-analysis-platforms/