Vitalik: Against excessive use of Gini coefficient

The Gini coefficient (also known as the Gini index) is by far the most popular and widely known measure of inequality. It is typically used to measure income or wealth inequality in a country, region or other community. It is popular because it is easy to understand and its mathematical definition can be easily visualized on a chart.

However, as one might expect, any scheme that reduces inequality to a single number has limitations, and the Gini coefficient is no exception. This is true even in its original context of measuring income and wealth inequality in countries, and the limitations become more obvious when the Gini coefficient is transplanted into other contexts (particularly the cryptocurrency world). In this article, I will discuss some limitations of the Gini coefficient and propose some alternatives.

What is the Gini coefficient?

The Gini coefficient was proposed by Corrado Gini in 1912 to measure inequality. It is commonly used to measure the inequality of income and wealth in a country, although it is increasingly being used in other contexts.

The Gini coefficient has two equivalent definitions:

Definition by area above the curve: draw the graph of a function where f(p) equals the share of total income earned by the lowest-earning portion p of the population (e.g., f(0.1) is the share of total income earned by the lowest-earning 10%). The Gini coefficient is the area between that curve and the line y=x, as a fraction of the whole triangle under y=x:

[Figure: Lorenz curve for four people with incomes 1, 2, 4 and 8]

Definition by average difference: the Gini coefficient is half the average difference in income between every possible pair of individuals, divided by the mean income.

For example, in the chart above the four incomes are [1, 2, 4, 8], so there are 16 possible pairwise differences: [0, 1, 3, 7, 1, 0, 2, 6, 3, 2, 0, 4, 7, 6, 4, 0]. The average difference is therefore 2.875 and the mean income is 3.75, so the Gini coefficient = 2.875 / (2 * 3.75) ≈ 0.3833.

It turns out that the two definitions give the same value (proving this is an exercise for the reader)!
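The equivalence can be checked numerically on the four-person example. The following is a minimal sketch, assuming the Lorenz curve is drawn as straight segments between the cumulative-share points; the function names are illustrative and not from the original article.

```python
from itertools import product

def gini_mean_difference(incomes):
    """Half the mean absolute difference over all ordered pairs, divided by the mean."""
    n = len(incomes)
    mean_diff = sum(abs(a - b) for a, b in product(incomes, repeat=2)) / (n * n)
    mean_income = sum(incomes) / n
    return mean_diff / (2 * mean_income)

def gini_lorenz_area(incomes):
    """Area between y = x and the piecewise-linear Lorenz curve,
    as a fraction of the triangle under y = x (whose area is 1/2)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    area_under_curve = 0.0
    cumulative = 0.0
    for x in xs:
        previous = cumulative
        cumulative += x / total
        # Trapezoid of width 1/n between two consecutive Lorenz points.
        area_under_curve += (previous + cumulative) / (2 * n)
    return (0.5 - area_under_curve) / 0.5

incomes = [1, 2, 4, 8]
print(gini_mean_difference(incomes))
print(gini_lorenz_area(incomes))
```

Both calls should print approximately 0.3833, matching the hand calculation above.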

What’s wrong with the Gini coefficient?

The Gini coefficient is attractive because it is a reasonably simple and easy-to-understand statistic. It may not seem simple, but believe me, almost every statistic that deals with populations of arbitrary size is this bad, and often much worse. Take a look at a formula as basic as standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2}$$

The Gini coefficient is:

$$G = \frac{2\sum_{i=1}^{n} i\,x_i}{n\sum_{i=1}^{n} x_i} - \frac{n+1}{n}$$

(with the incomes x_1 ≤ x_2 ≤ ... ≤ x_n sorted in ascending order)

It’s really simple, I promise!

So, what's the problem with it? It actually has a lot of problems, and people have written many articles about them. In this article, I will focus on one issue that I think is under-discussed in writing about the Gini coefficient, but that is particularly relevant to analyzing inequality in Internet communities (such as blockchains). The Gini coefficient combines two very different problems into a single inequality index: suffering due to lack of resources, and concentration of power.

In order to understand the difference between the two issues more clearly, let’s take a look at two dystopias:

  • Dystopia A: Half of the population shares all resources equally, and everyone else has nothing
  • Dystopia B: One person owns half of all resources, and everyone else shares the remaining half equally

Here are the Lorenz curves of the two dystopias (the same kind of chart we saw above):

[Figure: Lorenz curves of dystopia A and dystopia B]

Clearly, neither of these dystopias is a good place to live. But they are bad places to live for very different reasons. Dystopia A gives every resident a coin flip: land on the left half of the distribution and you face horrific mass starvation; land on the right half and you get egalitarian harmony. If you are Thanos, you might like it! If you are not, you should do your best to prevent it from happening. Dystopia B, on the other hand, resembles "Brave New World": everyone has a decent life (at least at the moment the snapshot of everyone's resources is taken), but at the price of an extremely undemocratic power structure in which you had better hope you have a good ruler. If you are Curtis Yarvin (translator's note: American far-right blogger), you might like it. If you are not, you should do your best to prevent it from happening too.
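Despite being bad in such different ways, the two dystopias receive almost the same Gini coefficient. The sketch below illustrates this under stated assumptions: a hypothetical population of 1,000 people, resources counted in whole units, and a rank-weighted closed form for sorted values that agrees with the pairwise-difference definition.

```python
def gini(values):
    """Gini coefficient via the rank-weighted closed form on ascending-sorted values."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

N = 1000  # hypothetical population size

# Dystopia A: half the population has nothing, the other half shares equally.
dystopia_a = [0] * (N // 2) + [1] * (N // 2)

# Dystopia B: one person holds exactly half; the other N - 1 share the rest equally.
dystopia_b = [1] * (N - 1) + [N - 1]

gini_a = gini(dystopia_a)
gini_b = gini(dystopia_b)
print(round(gini_a, 3), round(gini_b, 3))
```

With these assumptions both values come out at roughly 0.5, so the single number cannot tell the two situations apart.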

These two problems are different enough that they deserve to be analyzed and measured separately. And the difference is not just theoretical. The following chart compares the share of total income earned by the bottom 20% (a decent proxy for avoiding dystopia A) with the share of total income earned by the top 1% (a decent proxy for being close to dystopia B):

[Figure: income share of the bottom 20% versus income share of the top 1%, by country]

Source: (combined 2015 and 2016 data) and

The two are clearly correlated (the correlation coefficient is -0.62), but the correlation is far from strong (statistical authorities apparently consider 0.7 to be the lower threshold for "highly correlated", and the value here is below even that). There is an interesting second dimension in the chart that could be analyzed: what is the difference between a country where the top 1% earn 20% of total income and the bottom 20% earn 3%, and a country where the top 1% earn 20% of total income and the bottom 20% earn 7%? Alas, this kind of exploration is best left to other data and culture explorers who are more experienced and enterprising than I am.

Why using the Gini coefficient in non-geographic communities (such as Internet or crypto communities) is very problematic

In the blockchain world, wealth concentration is a particularly important issue, and one worth measuring and understanding. It matters to the blockchain world as a whole, because many people (and US Senate hearings) are trying to figure out to what extent cryptocurrency is truly anti-elitist, and to what extent it simply replaces old elites with new ones. It also matters when comparing different cryptocurrencies with each other.

[Figure: initial supply allocation of several cryptocurrencies]

The share of tokens in a cryptocurrency's initial supply that is explicitly allocated to specific insiders is one type of inequality. Note that the Ethereum data in the chart is slightly wrong: the insider and foundation shares should be 12.3% and 4.2%, not 15% and 5%.

Given the level of concern about these issues, it should not be surprising that many people have tried to calculate Gini indices of cryptocurrencies:

  • Gini index of staked EOS tokens (2018)
  • Gini coefficients of cryptocurrencies (2018)
  • Measuring decentralization in Bitcoin and Ethereum using multiple metrics and granularities (2021, includes the Gini coefficient and two other metrics)
  • Nouriel Roubini comparing Bitcoin's Gini coefficient to North Korea's (2018)
  • On-chain insights into the cryptocurrency markets (2021, uses the Gini coefficient to measure concentration)

And even earlier than this, we had to deal with this sensationalist article from 2014:

[Figure: screenshot of the 2014 article]

Analyses of this kind often make general methodological errors (usually confusing income with wealth, or confusing users with accounts), but beyond those, there is a deep and subtle problem with using the Gini coefficient to make these kinds of comparisons. The problem lies in the key difference between a typical geographic community (e.g. a city or country) and a typical Internet community (e.g. a blockchain):

A typical resident of a geographic community spends most of their time and resources in that community, so inequality measured in a geographic community reflects inequality in the total resources available to people. In an Internet community, however, measured inequality can come from two sources: (i) inequality in the total resources available to different participants, and (ii) inequality in participants' level of interest in the community.

The average person with 15 US dollars in fiat currency is poor and lacks the means to a good life. The average person with $15 worth of cryptocurrency is a dabbler who opened a wallet once for fun. Varying levels of interest are healthy; every community has its hobbyists and its full-time hardcore fans with no life. So if a cryptocurrency has a very high Gini coefficient, but much of that inequality comes from uneven levels of interest, then the reality the number points to is far less scary than the headlines suggest.

Cryptocurrencies, even highly plutocratic ones, will not turn any part of the world into anything close to dystopia A. But a badly distributed cryptocurrency may well look like dystopia B, and the problem is compounded if token-voting governance is used to make protocol decisions. Therefore, to detect the problems that crypto communities worry about most, we want a metric that more specifically captures proximity to dystopia B.
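The effect of uneven interest on the Gini coefficient can be illustrated with made-up numbers. In the sketch below, the holder counts and balances are hypothetical and chosen only for illustration: 100 committed participants hold identical balances, and 900 casual wallets holding one token each are then added. The `gini` helper uses a rank-weighted closed form on sorted values.

```python
def gini(values):
    """Gini coefficient via the rank-weighted closed form on ascending-sorted values."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

committed = [1000] * 100  # hypothetical: 100 full-time participants, equal holdings
hobbyists = [1] * 900     # hypothetical: 900 wallets opened for fun

gini_committed = gini(committed)
gini_with_hobbyists = gini(committed + hobbyists)
print(round(gini_committed, 3), round(gini_with_hobbyists, 3))
```

Among the committed participants alone the coefficient is zero; adding the hobbyists pushes it to roughly 0.89, even though nobody became worse off and no holder gained any power.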

Alternative indicators: measuring dystopia A and dystopia B separately

Another way to measure inequality is to directly estimate the suffering caused by resources being unequally distributed (i.e., the "dystopia A" problem). First, start with a utility function that represents the value of having a certain amount of money. Many people use log(x), because it captures the intuitively appealing approximation that doubling one's income is about equally valuable at any level: going from 10,000 USD to 20,000 USD adds the same utility as going from 5,000 USD to 10,000 USD, or from 40,000 USD to 80,000 USD. The score then measures how much utility is lost compared to a world where everyone simply received the average income:

$$\log\left(\frac{\sum_{i=1}^{n} x_i}{n}\right) - \frac{\sum_{i=1}^{n} \log(x_i)}{n}$$

The first term (the logarithm of the average) is the utility everyone would have if money were perfectly redistributed, so that everyone earned the average income. The second term (the average of the logarithms) is the average utility in the economy as it is today. If you think of resources narrowly as things used for personal consumption, the difference between the two represents the utility lost to inequality. There are other ways to define this formula, but they end up being close to equivalent (for example, Anthony Atkinson's 1969 paper proposed an "equally distributed equivalent level of income" metric, which in the case U(x) = log(x) is just a monotonic function of the above formula, and the Theil L index is mathematically identical to the above formula).
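As a rough illustration, the log-utility loss (logarithm of the average minus average of the logarithms) can be computed for the four-person example and for the two dystopias. This is a sketch under stated assumptions: a hypothetical population of 1,000, natural logarithms, and a zero income treated as an infinite loss because log(0) diverges.

```python
import math

def log_utility_loss(incomes):
    """log(mean income) minus mean(log income), using natural logarithms."""
    n = len(incomes)
    if any(x <= 0 for x in incomes):
        # log(0) diverges, so anyone with nothing makes the loss unbounded.
        return math.inf
    return math.log(sum(incomes) / n) - sum(math.log(x) for x in incomes) / n

N = 1000  # hypothetical population size
loss_example = log_utility_loss([1, 2, 4, 8])
loss_a = log_utility_loss([0] * (N // 2) + [1] * (N // 2))  # dystopia A
loss_b = log_utility_loss([1] * (N - 1) + [N - 1])          # dystopia B
print(loss_example, loss_a, loss_b)
```

Under these assumptions dystopia A scores an unbounded loss, while dystopia B scores a finite one (a little under log 2, since almost everyone has about half the average). The measure reacts strongly to deprivation and only mildly to concentration at the top, which is the separation the Gini coefficient does not provide.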

To measure the concentration-of-resources problem (the "dystopia B" problem), the Herfindahl-Hirschman Index (HHI) is a good starting point. It is already used to measure economic concentration in industries:

$$\mathrm{HHI} = \frac{\sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i\right)^2}$$

For readers who like to learn through visualization, see the following picture:

[Figure: visualization of the HHI]

HHI: green area divided by total area

There are other alternatives as well; the Theil T index has some similar properties, though also some differences. A simpler and dumber alternative is the Nakamoto coefficient: the minimum number of participants whose combined holdings add up to more than 50% of the total. Note that all of these concentration metrics focus heavily on what happens near the top (and deliberately so): a large number of hobbyists holding a small quantity of resources contribute little or nothing to the index, while the act of the top two participants merging can change it a great deal.

For crypto communities, concentration of resources is one of the biggest risks to the system, whereas someone holding only 0.00013 tokens is no evidence that they are actually starving, so adopting indices like these is the obvious approach. But even for countries, concentration of power and suffering due to lack of resources are probably worth discussing and measuring more separately.

That said, at some point we have to move beyond even these metrics. The harm caused by concentration is not just a function of the number of actors; it also depends heavily on the relationships between actors and their ability to collude with each other. Similarly, resource allocation is network-dependent: a lack of formal resources is less harmful if the people who lack them have an informal network to tap into. But dealing with these issues is a much harder challenge, so while we still have less data to work with, we do need the simpler tools as well.
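A minimal sketch of the two concentration metrics follows, assuming the HHI is computed as the sum of squared shares of the total and the Nakamoto coefficient as the smallest number of top holders whose combined holdings strictly exceed half of the total. The populations are hypothetical and use whole units so that comparisons are exact.

```python
def hhi(holdings):
    """Herfindahl-Hirschman Index: sum of squared shares of the total."""
    total = sum(holdings)
    return sum(x * x for x in holdings) / (total * total)

def nakamoto_coefficient(holdings):
    """Smallest number of top holders whose combined holdings exceed 50% of the total."""
    total = sum(holdings)
    running = 0
    for count, x in enumerate(sorted(holdings, reverse=True), start=1):
        running += x
        if 2 * running > total:
            return count
    return len(holdings)  # only reached when every holding is zero

N = 1000  # hypothetical population size
dystopia_a = [0] * (N // 2) + [1] * (N // 2)  # half hold nothing, half share equally
dystopia_b = [1] * (N - 1) + [N - 1]          # one holder owns exactly half

committed = [1000] * 100   # hypothetical full-time participants
hobbyists = [1] * 900      # hypothetical casual wallets

for name, holdings in [
    ("dystopia A", dystopia_a),
    ("dystopia B", dystopia_b),
    ("committed only", committed),
    ("committed + hobbyists", committed + hobbyists),
]:
    print(name, round(hhi(holdings), 5), nakamoto_coefficient(holdings))
```

Under these assumptions dystopia A has an HHI of about 0.002 and a Nakamoto coefficient of 251, while dystopia B has an HHI of about 0.25 and a Nakamoto coefficient of 2, so the two are clearly separated. Adding the 900 hobbyist wallets leaves both metrics nearly unchanged, in line with the observation that small holders contribute little to these indices.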


Posted by:CoinYuppie,Reprinted with attribution to:
