
March 15, 2012


Today is a milestone. About a year ago, I blogged about the meteoric rate at which my paper with Cosma Shalizi and Mark Newman, on power-law distributions in empirical data, was collecting citations. On that day, the paper had just crossed 500 Google Scholar citations, and I used that milestone as an excuse to ask when it might cross the mind-boggling 1000 citations. [1] Since we know a thing or two about citation counts, I decided to apply a little model-based statistical forecasting to come up with a principled guess.

This produced a probability distribution of answers, with the modal crossing date among all the bootstrap models being 21 April 2012 (the 90% bootstrap confidence interval ran from 11 Jan. 2012 to 29 Nov. 2013, but the bulk of the distribution was centered on Spring 2012). And, to my surprise and great amusement, the actual crossing date was 15 March 2012, only about a month off from the prediction. Here's what the forecasts from a year ago looked like, along with the actual citation data overlaid. I've also marked where on the forecast distribution the actual crossing date landed.
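The post doesn't spell out the forecasting model, so what follows is only a minimal sketch of the general idea. It assumes a quadratic trend in cumulative citations and a residual bootstrap; the function `forecast_crossing` and its parameters are hypothetical. Each bootstrap replicate adds resampled residuals to the fitted trend, refits, extrapolates day by day, and records the first day the curve reaches the target. The modal day and the 5th/95th percentiles of those days give a point forecast and a 90% interval of the kind quoted above.

```python
import numpy as np

def forecast_crossing(days, counts, target, deg=2, horizon=2000, n_boot=1000, seed=0):
    """Residual-bootstrap forecast of the first day a citation curve reaches `target`.

    Hypothetical model (an assumption, not the post's actual model): cumulative
    counts follow a degree-`deg` polynomial trend in time, with exchangeable
    residuals. Returns (crossing days, modal day, (5th, 95th) percentiles).
    """
    rng = np.random.default_rng(seed)
    days = np.asarray(days, dtype=float)
    counts = np.asarray(counts, dtype=float)

    # Fit the trend once to the observed series.
    fitted = np.polyval(np.polyfit(days, counts, deg), days)
    resid = counts - fitted

    # Candidate future days, one per day after the last observation.
    future = np.arange(days[-1] + 1, days[-1] + 1 + horizon)

    crossings = []
    for _ in range(n_boot):
        # Synthetic series: fitted trend plus residuals resampled with replacement.
        synth = fitted + rng.choice(resid, size=resid.size, replace=True)
        pred = np.polyval(np.polyfit(days, synth, deg), future)
        hit = np.flatnonzero(pred >= target)
        if hit.size:  # skip replicates that never reach the target within the horizon
            crossings.append(future[hit[0]])

    if not crossings:
        raise ValueError("no bootstrap replicate reached the target")
    crossings = np.array(crossings)

    # Modal crossing day and a 90% percentile interval.
    values, freq = np.unique(crossings, return_counts=True)
    mode = values[np.argmax(freq)]
    lo, hi = np.percentile(crossings, [5, 95])
    return crossings, mode, (lo, hi)
```

Resampling residuals independently ignores the day-to-day autocorrelation of cumulative counts (and Google Scholar's sudden jumps), so intervals from this sketch are likely too narrow. A block bootstrap would be one way to address that.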

In the new citation data, we again see strange drops and jumps in the citation count. These presumably come from the Google Scholar team tinkering with their algorithms. In fact, today's crossing was caused by the sudden appearance of 44 new citations in the past 5 days, well above the normal accumulation rate. But because this may have been a change to the algorithm that restored the citations lost in the large drop of late 2011, it seems reasonable to treat this as a real event. Either way, the closeness of the true crossing date to the forecasted one is a little eerie.
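To put the jump in perspective, here is a back-of-the-envelope comparison. It assumes, loosely, that the baseline rate is the climb from 500 to 1000 citations over about 365 days. The helper `rate_ratio` is hypothetical. Forty-four citations in 5 days is 8.8 per day, versus roughly 1.4 per day on average, so the burst arrived about 6.4 times faster than usual.

```python
def rate_ratio(burst, burst_days, total, total_days):
    """Ratio of a burst's daily citation rate to a baseline daily rate."""
    burst_rate = burst / burst_days      # citations per day during the burst
    baseline_rate = total / total_days   # average citations per day over the baseline window
    return burst_rate / baseline_rate

# Assumed baseline: roughly 500 citations over roughly 365 days (the 500 -> 1000 climb).
print(round(rate_ratio(44, 5, 500, 365), 2))  # 6.42
```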

So, there you have it. A milestone. Huzzah. Perhaps I'll buy a mug to commemorate the event.


[1] It is worth saying that the popularity of this paper has been both pleasantly surprising and gratifying, and I am immensely grateful for what great collaborators Cosma and Mark were on the paper.

posted March 15, 2012 05:17 PM in Self Referential


You should definitely buy a mug to commemorate the event (not that I'm biased). :)

I noticed a citation jump as well (and also noticed a return of some but not all of my missing citations), though with my smaller numbers I think the biggest citation jump I had in any paper was 10.

Thanks for linking to my Science piece with respect to the reaction to your paper, though given that we have discussed such things before, there are certain correlations when I cite your work that aren't there for a random citation.

Posted by: Mason Porter at March 15, 2012 11:13 PM

Nice job! It's great you've had so many citations, but it's even better to see the method work so well.

Posted by: Abraham Flaxman at March 19, 2012 03:14 PM

I have been reading Clauset et al. and Stumpf & Porter and am finding them most helpful in sorting out what these power laws are (and aren't). I left the practice of science -- geomorphology -- 25 years ago to follow a different path. Recently I have begun reading in neurobiology, and "power laws" are everywhere! Well, at first this was comforting, because if there is one thing geomorphologists see a lot, it is straight lines on log-log paper. But I think I have come to see that a power function is different from a power-law distribution. Some of the things my geomorphology professors were talking about may actually be power laws, e.g., allometric-like growth of drainage channel networks; other things probably weren't, e.g., how velocity, depth and width change with discharge in a river.
Anyway, this long preface is to ask if anyone here would be so kind as to point me to a piece of writing that expands on the following sentence from Stumpf & Porter: "A subtlety to note is that this list includes two different types of reported power laws: bivariate power laws ... and power-law probability distributions ...."

Posted by: Paul Hirsch at March 28, 2012 09:31 AM

Power-law functions like allometry are very different from power-law distributions. The former is a relationship between two things you measure, like the strength of gravity as a function of distance. The latter is a probability distribution, like how often you see something with a given size. In the jargon, the former are often "bivariate" power laws, meaning two variables, while the latter are "univariate" power laws, meaning one variable.
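To make the distinction concrete, here is a minimal Python sketch; all function names are hypothetical. A bivariate power law y = c·x^β relates two measured variables, so a straight-line fit on log-log axes recovers c and β. A univariate power law is a probability density p(x) ∝ x^(-α) for x ≥ x_min. The sketch draws samples from it by inverse-transform sampling and estimates α with the standard continuous maximum-likelihood estimator α̂ = 1 + n / Σ ln(x_i / x_min), assuming x_min is known.

```python
import numpy as np

def fit_power_function(x, y):
    """Bivariate power law y = c * x**beta: straight-line fit on log-log axes."""
    beta, log_c = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(log_c), beta

def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from the continuous density p(x) ∝ x**(-alpha), x >= xmin,
    by inverse transform: x = xmin * (1 - u)**(-1 / (alpha - 1)), u ~ Uniform[0, 1)."""
    u = rng.random(n)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def mle_alpha(x, xmin):
    """Continuous maximum-likelihood exponent for data at or above a known xmin:
    alpha_hat = 1 + n / sum(ln(x / xmin))."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + x.size / np.sum(np.log(x / xmin))

rng = np.random.default_rng(0)
x = np.linspace(1, 100, 50)
print(fit_power_function(x, 3.0 * x ** 0.75))  # approximately (3.0, 0.75)
data = sample_power_law(100_000, alpha=2.5, xmin=1.0, rng=rng)
print(mle_alpha(data, xmin=1.0))  # close to 2.5
```

In practice x_min is usually unknown and must also be estimated, which this sketch does not attempt.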

My paper with Cosma and Mark is a good starting place to learn about the distributions. This paper is the one I recommend for learning about the functions.

Posted by: Aaron at March 28, 2012 09:20 PM