In the previous article, we focused on two types of distributions: the Gaussian distribution and the Power Law distribution. We saw that these distributions have diametrically opposite statistical properties. Namely, Power Laws are driven by rare events, while Gaussians are not.
This rare-event-driven property raised 3 problems with many of our favorite statistical tools (e.g., mean, standard deviation, regression, etc.) when analyzing Power Laws. The takeaway was that if data are Gaussian-like, common approaches such as regression can be used and expected values computed without worry. However, if data are more Power Law-like, these techniques can give incorrect and misleading results.
We also saw a third (more mischievous) distribution that can resemble both a Gaussian and a Power Law (despite their opposite properties), called the Log Normal distribution.
This ambiguity presents challenges for practitioners when deciding the best way to analyze a given dataset. To help overcome these challenges, it is advantageous to determine whether data fit a Power Law, Log Normal, or some other type of distribution.
A popular way to fit a Power Law to real-world data is what I will call the "Log-Log approach" (1). The idea comes from taking the logarithm of the Power Law's probability density function (PDF), as derived below.
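Assuming a Power Law PDF of the form f(x) = C·x^(−α), taking the logarithm of both sides gives:

\log f(x) = \log\left(C x^{-\alpha}\right) = -\alpha \log x + \log C

which has the familiar linear form y = mx + b, with slope −α and intercept log C.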
The above derivation translates the PDF definition of the Power Law into a linear equation, as shown in the figure below.
This implies that the histogram of data following a Power Law will trace a straight line. In practice, this looks like generating a histogram for some data and plotting it on a log-log plot (1). One could go even further and perform a linear regression to estimate the distribution's α value (here, α = -m+1).
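As a rough sketch of this approach on synthetic Pareto data (my own illustration, not code from the article's repo; note that how the fitted slope m maps to α depends on the convention used and on whether one regresses the density or the complementary CDF):

import numpy as np
from scipy import stats

# generate synthetic Pareto-distributed data (x_min = 1)
rng = np.random.default_rng(0)
x = rng.pareto(2, 10_000) + 1

# histogram over logarithmic bins, normalized to a density
bins = np.logspace(0, np.log10(x.max()), 30)
density, edges = np.histogram(x, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers

# linear regression in log-log space (empty bins dropped)
mask = density > 0
m, b, r_val, p_val, std_err = stats.linregress(np.log10(centers[mask]),
                                               np.log10(density[mask]))
print("slope m =", m)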
However, this approach has important limitations. These are described in reference (1) and summarized below.
- Slope estimates (hence α) are subject to systematic errors
- Regression errors can be difficult to estimate
- The fit may look good even if the distribution does not follow a power law
- The fits may not obey the basic conditions for probability distributions, for example, normalization.
While the Log-Log approach is simple to implement, its limitations make it less than optimal. Instead, we can turn to a more mathematically sound approach: Maximum Likelihood, a widely used statistical method for inferring the best parameters of a model given some data.
Maximum Likelihood consists of 2 key steps. Step 1: obtain a likelihood function. Step 2: maximize the likelihood with respect to your model parameters.
Step 1: Obtain the likelihood function
The likelihood is a special type of probability. In a nutshell, it quantifies the probability of our data given a particular model. We can express it as the joint probability of all our observed data (3). In the case of a Pareto distribution, we can write it as follows.
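Using the standard Pareto PDF, f(x) = α·x_min^α / x^(α+1) for x ≥ x_min (the powerlaw library's α equals this α plus 1, as noted in the code comments later), the likelihood of n observations is:

L(\alpha; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{\alpha \, x_{\min}^{\alpha}}{x_i^{\alpha+1}}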
To make maximizing the likelihood a little easier, it is customary to work with the log likelihood (both are maximized by the same value of α).
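Taking the logarithm of the likelihood above gives:

\ell(\alpha) = \log L(\alpha) = n \log \alpha + n \, \alpha \log x_{\min} - (\alpha + 1) \sum_{i=1}^{n} \log x_i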
Step 2: Maximize the likelihood
With a log likelihood function in hand, we can frame the task of determining the best choice of parameters as an optimization problem. To find the optimal α value given our data, this boils down to setting the derivative of ℓ(α) with respect to α equal to zero and then solving for α. A derivation of this is provided below.
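\frac{d\ell}{d\alpha} = \frac{n}{\alpha} + n \log x_{\min} - \sum_{i=1}^{n} \log x_i = 0 \;\;\Longrightarrow\;\; \hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \log\left(x_i / x_{\min}\right)}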
This gives us the so-called Maximum Likelihood Estimator for α. With it, we can plug in the observed values of x to estimate a Pareto distribution's α value.
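In code, the estimator is essentially a one-liner. Here is a minimal sketch (the function name is mine, not from the article's repo):

import numpy as np

def pareto_mle_alpha(x, x_min):
    """Maximum likelihood estimate of the Pareto exponent (standard
    convention; add 1 to match the powerlaw library's alpha)."""
    x = np.asarray(x)
    x = x[x >= x_min]  # the estimator only applies to the tail
    return len(x) / np.sum(np.log(x / x_min))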
With the theoretical foundation established, let’s see what this looks like when applied to real-world data (from my social media accounts).
One area where fat-tailed data is prevalent is social media. For example, a small percentage of creators get the most attention, a minority of Medium blogs get the most reads, etc.
Here, we will use the powerlaw Python library to determine whether data from my various social media channels (i.e., Medium, YouTube, LinkedIn) really follow a Power Law distribution. The data and code for these examples are available in the GitHub repository.
Artificial data
Before applying the Maximum Likelihood-based approach to messy real-world data, let's see what happens when we apply this technique to artificial data (truly) generated from Pareto and Log Normal distributions, respectively. This will help calibrate our expectations before using the approach on data where we do not know the "true" underlying distribution class.
First, we import some useful libraries.
import numpy as np
import matplotlib.pyplot as plt
import powerlaw
import pandas as pd

np.random.seed(0)
Next, let’s generate data from Pareto and Log Normal distributions.
# power law data
a = 2
x_min = 1
n = 1_000
x = np.linspace(0, n, n+1)
s_pareto = (np.random.pareto(a, len(x)) + 1) * x_min

# log normal data
m = 10
s = 1
s_lognormal = np.random.lognormal(m, s, len(x)) * s * np.sqrt(2*np.pi)
To get an idea of what this data looks like, it is helpful to plot histograms. Here, I plot a histogram of the raw values for each sample and the log of the raw values. This latter distribution makes it easier to visually distinguish between power law data and lognormal data.
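The article's actual plotting code lives in the GitHub repo; a minimal sketch of how such histograms can be generated:

# histograms of raw and log-transformed values for each sample
fig, axes = plt.subplots(2, 2, figsize=(10, 6))
axes[0, 0].hist(s_pareto, bins=100)
axes[0, 0].set_title("Pareto (raw)")
axes[0, 1].hist(s_lognormal, bins=100)
axes[0, 1].set_title("Log Normal (raw)")
axes[1, 0].hist(np.log(s_pareto), bins=100)
axes[1, 0].set_title("Pareto (log)")
axes[1, 1].hist(np.log(s_lognormal), bins=100)
axes[1, 1].set_title("Log Normal (log)")
plt.tight_layout()
plt.show()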
As we can see from the histograms above, the raw-value distributions look qualitatively similar for both samples. However, there is a marked difference in the log distributions: the log of the Power Law data is highly skewed and not mean-centered, while the log of the Log Normal data is reminiscent of a Gaussian distribution.
Now, we can use the powerlaw library to fit a Power Law to each sample and estimate the values for α and x_min. Here is what that looks like for our Power Law sample.
# fit power to power law data
results = powerlaw.Fit(s_pareto)

# printing results
print("alpha = " + str(results.power_law.alpha)) # note: powerlaw lib's alpha definition is different than standard i.e. a_powerlawlib = a_standard + 1
print("x_min = " + str(results.power_law.xmin))
print('p = ' + str(compute_power_law_p_val(results)))
# Calculating best minimal value for power law fit
# alpha = 2.9331912195958676
# x_min = 1.2703447024073973
# p = 0.999
The fit does a decent job of estimating the true values of the parameters (i.e. a=3, x_min=1), as seen in the alpha and x_min values printed above. The p-value above quantifies the quality of the fit. A higher p means a better fit (more on this value in section 4.1 of ref (1)).
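Note that compute_power_law_p_val() is a helper from the companion code, not part of the powerlaw library's API. A simplified sketch of such a bootstrap p-value, loosely following the goodness-of-fit procedure in section 4.1 of ref (1) (the full procedure also re-estimates x_min for each synthetic sample), might look like this:

import numpy as np
import powerlaw

def compute_power_law_p_val(results, n_sims=100):
    """Fraction of synthetic power law samples whose KS distance
    exceeds that of the observed data (higher = better fit)."""
    ks_observed = results.power_law.KS()
    data = np.asarray(results.data_original)
    n_tail = int(np.sum(data >= results.power_law.xmin))
    count = 0
    for _ in range(n_sims):
        simulated = results.power_law.generate_random(n_tail)
        sim_fit = powerlaw.Fit(simulated, xmin=results.power_law.xmin)
        if sim_fit.power_law.KS() > ks_observed:
            count += 1
    return count / n_sims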
We can do something similar for the Log Normal distribution.
# fit power to log normal data
results = powerlaw.Fit(s_lognormal)
print("alpha = " + str(results.power_law.alpha)) # note: powerlaw lib's alpha definition is different than standard i.e. a_powerlawlib = a_standard + 1
print("x_min = " + str(results.power_law.xmin))
print('p = ' + str(compute_power_law_p_val(results)))

# Calculating best minimal value for power law fit
# alpha = 2.5508694755027337
# x_min = 76574.4701482522
# p = 0.999
We can see that the Log Normal sample also fits a Power Law distribution well (p=0.999). Notice, however, that the x_min value is very far into the tail. While this may be useful for some use cases, it doesn't tell us much about the distribution that best fits all the data in the sample.
To overcome this, we can manually set the x_min value to the sample minimum and redo the fit.
# fixing xmin so that fit must include all data
results = powerlaw.Fit(s_lognormal, xmin=np.min(s_lognormal))
print("alpha = " + str(results.power_law.alpha))
print("x_min = " + str(results.power_law.xmin))# alpha = 1.3087955873576855
# x_min = 2201.318351239509
The .Fit() method also automatically generates estimates for a Log Normal distribution.
print("mu = " + str(results.lognormal.mu))
print("sigma = " + str(results.lognormal.sigma))# mu = 10.933481999687547
# sigma = 0.9834599169175509
The estimated values of the Log Normal parameters are close to the actual values (mu=10, sigma=1), so the fit did a good job once again.
However, by fixing x_min, we lost our quality metric p (for whatever reason, the method does not generate this value when x_min is supplied). This begs the question: which distribution parameters should I go with? The Power Law or the Log Normal?
To answer this question, we can compare the Power Law fit to other candidate distributions using Log likelihood ratios (R). A positive R implies that the Power Law fits better, while a negative R implies that the alternative distribution is better. Furthermore, each comparison gives us a significance value (p). This is demonstrated in the following code block.
distribution_list = ('lognormal', 'exponential', 'truncated_power_law', \
'stretched_exponential', 'lognormal_positive')

for distribution in distribution_list:
    R, p = results.distribution_compare('power_law', distribution)
    print("power law vs " + distribution +
          ": R = " + str(np.round(R, 3)) +
          ", p = " + str(np.round(p, 3)))
# power law vs lognormal: R = -776.987, p = 0.0
# power law vs exponential: R = -737.24, p = 0.0
# power law vs truncated_power_law: R = -419.958, p = 0.0
# power law vs stretched_exponential: R = -737.289, p = 0.0
# power law vs lognormal_positive: R = -776.987, p = 0.0
As shown above, every alternative distribution is preferred over the Power Law when all the data in the Log Normal sample are included. Furthermore, based on the likelihood ratios, the lognormal and lognormal_positive fits perform best.
Real world data
Now that we have applied the powerlaw library to data where we know the ground truth, let's try it on data whose underlying distribution is unknown.
We will follow a similar procedure as before, but with real-world data. Here, we will analyze the following data: monthly followers gained on my Medium profile, earnings across all of my YouTube videos, and daily impressions on my LinkedIn posts from the past year.
We will start by plotting histograms.
Two things jump out at me from these plots. One, all three look more like the Log Normal histograms than the Power Law histograms we saw before. Two, the Medium and YouTube distributions are sparse, meaning they may not have enough data to draw strong conclusions.
Next, we will apply the power law fit to all three distributions and set x_min as the smallest value in each sample. The results of this are printed below.
To determine which distribution is better, we can again make direct comparisons of the Power Law fit to some alternatives. These results are given below.
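As a sketch of what these two steps look like in code (the actual data loading and plotting code is in the GitHub repo; any variable passed for sample is hypothetical):

import numpy as np
import powerlaw

def fit_and_compare(sample, label):
    """Fit a power law with x_min pinned to the sample minimum, then
    compare that fit to a few alternative distributions."""
    results = powerlaw.Fit(sample, xmin=np.min(sample))
    print(label + ": alpha = " + str(results.power_law.alpha) +
          ", x_min = " + str(results.power_law.xmin))
    for distribution in ('lognormal', 'exponential', 'truncated_power_law'):
        R, p = results.distribution_compare('power_law', distribution)
        print("  power law vs " + distribution +
              ": R = " + str(np.round(R, 3)) +
              ", p = " + str(np.round(p, 3)))

# e.g., fit_and_compare(medium_followers, "Medium")  # hypothetical array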
Using the common rule of thumb of a significance threshold of p<0.1, we can draw the following conclusions: Medium followers and LinkedIn impressions are best fit by a Log Normal distribution, while a Power Law best represents YouTube earnings.
Of course, since the Medium followers and YouTube earnings data here are limited (N<100), we should take any conclusions drawn from them with a grain of salt.
Many standard statistical tools fail when applied to data following a Power Law distribution. Consequently, detecting Power Laws in empirical data can help practitioners avoid incorrect analyses and misleading conclusions.
However, Power Laws are an extreme case of the more general phenomenon of fat tails. In the next article of this series, we will take this work one step further and quantify fat-tailedness for any given dataset using 4 practical heuristics.