
June 08, 2007

Power laws and all that jazz

With apologies to Tolkien:

Three Power Laws for the Physicists, mathematics in thrall,
Four for the biologists, species and all,
Eighteen behavioral, our will carved in stone,
One for the Dark Lord on his dark throne.

In the Land of Science where Power Laws lie,
One Paper to rule them all, One Paper to find them,
One Paper to bring them all and in their moments bind them,
In the Land of Science, where Power Laws lie.

From an interest that grew directly out of my work characterizing the frequency of severe terrorist attacks, I'm happy to say that the review article I've been working on with Cosma Shalizi and Mark Newman -- on accurately characterizing power-law distributions in empirical data -- is finally finished. The paper covers all aspects of the process, from fitting the distribution to testing the hypothesis that the data are distributed according to a power law. To make it easy for folks in the community to use the methods we recommend, we've also made our code available.
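To give a flavor of what "fitting the distribution" can look like, here is a minimal sketch of the textbook maximum-likelihood estimate of the exponent of a continuous power law. It is not the released code; the function names, the synthetic data, and the assumption that the lower cutoff x_min is already known are all illustrative choices.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, x_min):
    """Maximum-likelihood exponent for a continuous power law with known x_min."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    # Asymptotic standard error of the estimate.
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


# Synthetic data with a known exponent, so the estimate can be checked.
rng = random.Random(42)
data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, std_err = fit_alpha(data, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
```

In real data x_min is usually unknown and has to be estimated as well, and a good-looking fit says nothing by itself about whether the power-law hypothesis is plausible; this sketch covers only the exponent.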

So, rejoice, rejoice all ye people of Science! Go forth, fit and validate your power laws!

For those still reading, I have a few thoughts about this paper now that it's been released into the wild. First, I naturally hope that people read the paper and find it interesting and useful. I also hope that we as a community start asking ourselves what exactly we mean when we say that such-and-such a quantity is "power-law distributed," and whether our meaning would be better served at times by using less precise terms such as "heavy-tailed" or simply "heterogeneous." For instance, we might simply mean that visually it looks roughly straight on a log-log plot. To which I might reply (a) power-law distributions are not the only thing that can do this, (b) we haven't said what we mean by roughly straight, and (c) we haven't been clear about why we might prefer a priori such a form over alternatives.

The paper goes into the first two points in some detail, so I'll put those aside. The last point, though, seems like one that's gone unaddressed in the literature for some time now. In some cases, there are probably legitimate reasons to prefer an explanation that assumes large events (and especially those larger than we've observed so far) are distributed according to a power law -- for example, cases where we have some convincing theoretical explanations that match the microscopic details of the system, are reasonably well motivated, and whose predictions have held up under some additional tests. But I don't think most places where power-law distributions have been "observed" have this degree of support for the power-law hypothesis. (In fact, most simply fit a power-law model and assume that it's correct!) We also rarely ask why a system necessarily needs to exhibit a power-law distribution in the first place. That is, would the system behave fundamentally differently, perhaps from a functional perspective, if it instead exhibited a log-normal distribution in the upper tail?

Update 15 June: Cosma also blogs about the paper, making many excellent points about the methods we describe for dealing with data, as well as making several very constructive points about the general affair of power-law research. Well worth the time to read.

posted June 8, 2007 10:00 AM in Complex Systems

Comments

I think I'm cautious of power-law distributions partly because I've seen so many papers whose authors have no clue about statistics (and who should be forced to read your paper for all eternity), but mainly because people then go on to say that their "just so" story is correct, when they've given no consideration to any of the many processes that could have led to a power-law or power-law-like distribution... which gets back to your final question.

This seems relevant and rather amusing.

Posted by: Matthew Berryman at June 14, 2007 06:11 AM

Another motivation for using a power law is that it has only two parameters. In the sense that physical scientists often seek minimal characterizations of the data, this form is more attractive than some competing ones such as the log-normal. It is kind of a pity that the statistics of fitting a power law are so much more complicated than linear regression, which leads to so many bad inferences. Of course, if the approach is simply to find a minimum number of parameters to describe the data, one cannot infer anything about what led to the 'power law' behavior. However, it is still possible to compare related distributions and use the fits to characterize differences, provided, of course, that we have some measure of 'how straight'. Unfortunately, the negative press about power laws seems to be restricting even this, as a recent reviewer of mine made clear: 'The decreasing power formulas F = AN^(-K) are useless. These empirical approximations have been much criticized in the literature. Any two-parameter decreasing function will fit the data as well...' Thanks very much for your article.

Posted by: Suzanne Kiihne at June 17, 2007 04:15 AM

Suzanne, the simple description of data that a power-law fit gives is certainly an attractive quality of the distribution, and perhaps one reason physicists favor it a priori. On the other hand, an exponential tail has just as few parameters, and is often easier to explain than a power-law tail. So, the question of which one to favor becomes one of explanatory power. A deeper question, though, is whether we want simply to approximately describe the data -- that is, to give a rough summary of what the data looks like, in which case the power-law fit is one such concise description but certainly not the only one -- or to explain the data -- that is, to establish objective criteria for how good we believe our summary is, and then eliminate alternative descriptions as being inaccurate. The latter is what many of the theorists who fit power laws to data would like to do, but their methods in fact only do the former.

Unfortunately, in many biological systems, there are simply not enough data in the histogram to do the latter; that is, it's very hard to distinguish between alternative fits. This is probably what the reviewer was trying to say, although the apparently reflexive dislike of any "two-parameter decreasing function" sounds more like a dogmatic statement than a scientific one.
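One standard way to make "eliminating alternative descriptions" concrete is to compare the maximized log-likelihoods of two candidate tails. What follows is only a rough sketch under stated assumptions: both models are fit above the same known cutoff x_min, the data are synthetic, and all names are illustrative. A positive ratio favors the power law and a negative one favors the exponential.

```python
import math
import random


def loglik_power_law(tail, x_min):
    """Maximized log-likelihood of a continuous power law on x >= x_min."""
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum  # maximum-likelihood exponent
    # ln p(x) = ln((alpha - 1) / x_min) - alpha * ln(x / x_min)
    return n * math.log((alpha - 1.0) / x_min) - alpha * log_sum


def loglik_exponential(tail, x_min):
    """Maximized log-likelihood of a shifted exponential on x >= x_min."""
    n = len(tail)
    excess = sum(x - x_min for x in tail)
    lam = n / excess  # maximum-likelihood rate
    # ln p(x) = ln(lam) - lam * (x - x_min)
    return n * math.log(lam) - lam * excess


def log_likelihood_ratio(data, x_min):
    """Positive values favor the power law, negative the exponential."""
    tail = [x for x in data if x >= x_min]
    return loglik_power_law(tail, x_min) - loglik_exponential(tail, x_min)


rng = random.Random(7)
x_min = 1.0
# Synthetic power-law tail (exponent 2.5) and synthetic exponential tail (rate 1).
pl_data = [x_min * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
exp_data = [x_min + rng.expovariate(1.0) for _ in range(5000)]

ratio_pl = log_likelihood_ratio(pl_data, x_min)
ratio_exp = log_likelihood_ratio(exp_data, x_min)
print("power-law data:  ", round(ratio_pl, 1))
print("exponential data:", round(ratio_exp, 1))
```

With thousands of points the sign of the ratio is a reliable guide. With only a few dozen it can fluctuate enough to be uninformative, which is the small-sample problem described above; a proper comparison would also test whether the ratio differs significantly from zero.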

Posted by: Aaron at June 17, 2007 10:12 AM