
November 12, 2009

Power laws and all that jazz, redux

Long-time readers will be very familiar with my interest in power-law distributions (for instance, here and here). So, I'm happy (and relieved) to report that my review article, with Cosma Shalizi and Mark Newman, on methods for fitting and validating power-law distributions in empirical data has finally appeared in print over at SIAM Review. Given that this project started back in late 2004 for me, it's very pleasing to see the finished product in print. This calls for a celebration, for sure.

A. Clauset, C. R. Shalizi and M. E. J. Newman. "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009). (Download the code.)

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
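To make the abstract's two ingredients concrete, here is a minimal sketch in Python of the maximum-likelihood estimate of the exponent and the KS distance between the data and the fitted model. It is not the code linked above. It assumes continuous data and a lower cutoff xmin that is already known, whereas the full procedure in the paper also estimates xmin and computes a goodness-of-fit p-value from synthetic data sets. The function names (fit_alpha, ks_distance, sample_power_law) are made up for this illustration.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law
    p(x) ~ x^(-alpha) on x >= xmin, assuming xmin is known."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    # Note: log_sum is zero if every tail value equals xmin (not handled here).
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)  # approximate standard error
    return alpha, sigma


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the
    fitted power-law CDF, F(x) = 1 - (x / xmin)^(1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic values by inverting the power-law CDF."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Usage: fit synthetic data whose true exponent is 2.5.
rng = random.Random(0)
synthetic = sample_power_law(10000, 1.0, 2.5, rng)
alpha_hat, sigma_hat = fit_alpha(synthetic, 1.0)
distance = ks_distance(synthetic, 1.0, alpha_hat)
print(alpha_hat, sigma_hat, distance)
```

A small KS distance by itself does not establish a power law; it needs to be compared against the distances obtained from synthetic power-law samples fitted the same way, which is the role of the goodness-of-fit test described in the abstract.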

Here's a brief summary of the 24 data sets we looked at, and our conclusions as to how much statistical support there is in the data for them to follow a power-law distribution:

Good:
frequency of words (Zipf's law)

Moderate:
frequency of bird sightings
size of blackouts
book sales
population of US cities
size of religions
severity of inter-state wars
number of citations
papers authored
protein-interaction degree distribution
severity of terrorist attacks

With an exponential cut-off:
size of forest fires
intensity of solar flares
intensity of earthquakes (Gutenberg-Richter law)
popularity of surnames
number of web hits
number of web links, with cut-off
Internet (AS) degree distribution
number of phone calls
size of email address book
number of species per genus

None:
HTTP session sizes
wealth
metabolite degree distribution

posted November 12, 2009 08:19 AM in Complex Systems

Comments

Funny that wealth should not be distributed according to a power law, when so much research into this area has been inspired by Pareto's original research :-)

Posted by: Henrik at November 12, 2009 11:05 AM

The story for wealth is slightly more complicated: basically, there's too much structure in the distribution for it to be merely a power-law distribution. It's certainly highly skewed and heavy-tailed, but there's more going on there than a simple power-law hypothesis would lead you to believe.

Posted by: Aaron at November 12, 2009 01:03 PM

Aaron -
Thanks for the paper and the code - I got the pointer from Peter Mucha here at UNC. I used it to test bank sizes and holdings of credit derivatives (they don't follow a power law).

Jesse

Posted by: Jesse Blocher at November 12, 2009 05:39 PM