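```perl
use strict;
use warnings;

my $xrange      = 100000;   # largest size that can be drawn
my $sampleCount = 10000;    # number of samples to print

# Unnormalized weights: P[S = size] is proportional to size^(-2).
my @weight = map { $_ ** -2 } 1 .. $xrange;
my $total  = 0;
$total += $_ for @weight;

for my $i (1 .. $sampleCount) {
    # Pick a point uniformly in [0, $total) and walk along the weights
    # until the point falls inside one of them.
    my $sample = rand() * $total;
    for my $size (1 .. $xrange) {
        # The $size == $xrange check guards against floating-point round-off
        # leaving $sample just past the last weight, which would print nothing.
        if ($sample < $weight[$size - 1] || $size == $xrange) {
            print "$size\n";
            last;
        }
        $sample -= $weight[$size - 1];
    }
}
```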
This draws each size from 1 to $xrange with probability P[S = size] = size^(-2) / $total, i.e. proportional to size^(-2), and gives us the following histogram. The x-axis is size, and the y-axis is P[S = size].
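The sampler above walks through up to 100000 weights for every sample. A common alternative is inverse-transform sampling with a precomputed cumulative sum and a binary search, which draws from the same distribution in O(log $xrange) per sample. This is a minimal sketch, not the original code; `build_cumulative` and `sample_size` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: cumulative weights, $cum->[k] = sum of s^(-2) for s = 1 .. k+1.
sub build_cumulative {
    my ($xrange) = @_;
    my @cum;
    my $running = 0;
    for my $size (1 .. $xrange) {
        $running += 1 / ($size * $size);
        push @cum, $running;
    }
    return \@cum;
}

# Hypothetical helper: smallest size whose cumulative weight exceeds $u * total,
# found by binary search. $u should be in [0, 1).
sub sample_size {
    my ($cum, $u) = @_;
    my $target = $u * $cum->[-1];
    my ($lo, $hi) = (0, $#$cum);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($cum->[$mid] > $target) { $hi = $mid; }
        else                        { $lo = $mid + 1; }
    }
    return $lo + 1;    # index 0 corresponds to size 1
}

my $cum = build_cumulative(100000);
print sample_size($cum, rand()), "\n" for 1 .. 10;
```

If round-off leaves the target at or above the last cumulative value, the search stops at the last index, so the result is clamped to $xrange rather than lost.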
To generate Lognormal data, I used the following code:
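```perl
use strict;
use warnings;

for my $i (1 .. 10000) {
    print Lognormal(), "\n";
}

# Multiply 100 independent random factors exp(U - 0.5), with U ~ Uniform[0, 1).
# The log of the product is a sum of 100 i.i.d. terms, so by the central limit
# theorem it is approximately normal, which makes the product approximately lognormal.
sub Lognormal {
    my $seed = 1;
    for my $i (1 .. 100) {
        my $multiplier = exp(rand() - 0.5);
        $seed *= $multiplier;
    }
    return $seed;
}
```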
This produces an ordinary histogram that looks strikingly similar to the previous one. Again, the x-axis is size, and the y-axis is P[S = size].
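A quick sanity check on the Lognormal generator: the log of each factor, rand() - 0.5, is uniform on [-0.5, 0.5) with mean 0 and variance 1/12, so log(S) is a sum of 100 such terms and should be roughly normal with mean 0 and variance 100/12 ≈ 8.33. The sketch below repeats the generator and compares the sample mean and variance of the logs against those values; `mean_var` is a hypothetical helper.

```perl
use strict;
use warnings;

# Same construction as the generator above: product of 100 factors exp(U - 0.5).
sub Lognormal {
    my $seed = 1;
    $seed *= exp(rand() - 0.5) for 1 .. 100;
    return $seed;
}

# Hypothetical helper: sample mean and population variance of a list.
sub mean_var {
    my @x = @_;
    my $n = scalar @x;
    my $mean = 0;
    $mean += $_ for @x;
    $mean /= $n;
    my $var = 0;
    $var += ($_ - $mean) ** 2 for @x;
    $var /= $n;
    return ($mean, $var);
}

my @logs = map { log(Lognormal()) } 1 .. 10000;
my ($mean, $var) = mean_var(@logs);
printf "mean of log = %.3f (expected 0), variance of log = %.3f (expected %.3f)\n",
    $mean, $var, 100 / 12;
```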
The histograms start to look different when we histogram the logs of the data instead, keeping everything else the same. For the Power Law data we then get the following histogram (x-axis is log(size), y-axis is P[log(S) = log(size)]).
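The Lognormal distribution, on the other hand, comes into its own: since the log of a Lognormal variable is normally distributed, its log-histogram is bell-shaped (x-axis is log(size), y-axis is P[log(S) = log(size)]).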
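Histogramming the logs amounts to putting each value into a bin according to log(value) instead of value. A minimal sketch, assuming natural logs (Perl's `log`) and an arbitrary bin width of 0.5; the post does not say which base or bin width was used, and `log_histogram` is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: bin the natural logs of positive values into bins of
# width $bin_width. Returns a hash ref mapping bin index k to the fraction of
# samples with k * $bin_width <= log(value) < (k + 1) * $bin_width.
sub log_histogram {
    my ($bin_width, @values) = @_;
    my %fraction;
    for my $v (@values) {
        my $k = floor(log($v) / $bin_width);
        $fraction{$k}++;
    }
    my $n = scalar @values;
    $_ /= $n for values %fraction;    # values() aliases the hash entries
    return \%fraction;
}

# Made-up example values; in practice these would be the generated sizes.
my $hist = log_histogram(0.5, 1, 2, 3, 10, 100);
for my $k (sort { $a <=> $b } keys %$hist) {
    printf "log(size) in [%.1f, %.1f): %.2f\n", $k * 0.5, ($k + 1) * 0.5, $hist->{$k};
}
```

Using integer bin indices as hash keys avoids looking up floating-point bin edges as strings.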
Now let's make the y-axis cumulative for both plots, showing the fraction of samples larger than each size. For the Power Law data, we now get this (x-axis is log(size), y-axis is P[S > size]).
And in the Lognormal case, we get this (x-axis is log(size), y-axis is P[S > size]).
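P[S > size] can be computed directly from the samples: sort them, and for each distinct size count the fraction of samples strictly larger. A minimal sketch; `empirical_ccdf` is a hypothetical helper name and the input list is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: empirical complementary CDF. For each distinct value s,
# returns [s, P[S > s]], the fraction of samples strictly greater than s.
sub empirical_ccdf {
    my @sorted = sort { $a <=> $b } @_;
    my $n = scalar @sorted;
    my @points;
    for my $i (0 .. $n - 1) {
        # Emit one point per distinct value, at its last occurrence.
        next if $i < $n - 1 && $sorted[$i + 1] == $sorted[$i];
        push @points, [ $sorted[$i], ($n - 1 - $i) / $n ];
    }
    return @points;
}

for my $p (empirical_ccdf(1, 1, 2, 3, 3, 3, 10)) {
    printf "log(size) = %.3f  P[S > size] = %.3f\n", log($p->[0]), $p->[1];
}
```

The largest value always gets P[S > size] = 0, so that point has to be dropped once the y-axis is log-scaled.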
Setting the y-axis to a log scale gives the classic straight line of a power law for the Power Law data (x-axis is log(size), y-axis is P[S > size], now log-scaled).
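Why the line is straight: assuming the sampler's weights size^(-2) / $total, with N = $xrange = 100000 and Z = $total, the tail sum can be bounded by integrals of x^(-2). For 1 ≪ s ≪ N this gives P[S > s] ≈ 1/(Zs), a line of slope -1 on log-log axes:

```latex
P[S > s] = \frac{1}{Z}\sum_{k=s+1}^{N} k^{-2},
\qquad Z = \sum_{k=1}^{N} k^{-2} \approx \frac{\pi^2}{6}

\frac{1}{s+1} - \frac{1}{N} \;<\; \sum_{k=s+1}^{N} k^{-2} \;<\; \frac{1}{s}

\log P[S > s] \approx -\log s - \log Z
```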
If you do the same for the Lognormal data, you get something a bit different:
However, it's worth noting that if you cut off the data at x = 0 (that is, kept only sizes above 1) and squinted a bit, you could dishonestly make it look as though this data actually follows a power law:
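For a Lognormal with log(S) ~ N(0, σ²) (σ² ≈ 100/12 for the generator above), P[S > s] = 1 - Φ(ln s / σ), and the log of a Gaussian tail is concave in ln s, so the log-log curve keeps steepening and has no single slope. Over a narrow window such as x > 0 it can still be close to straight. "Squinting" amounts to fitting a line to the log-log CCDF above a cutoff and accepting a high R². A minimal sketch of that fit; `fit_line`, `tail_slope`, and the `$cutoff` parameter are hypothetical, and a good fit on truncated data does not by itself establish a power law.

```perl
use strict;
use warnings;

# Hypothetical helper: ordinary least-squares fit y = intercept + slope * x.
# Returns (intercept, slope, r_squared).
sub fit_line {
    my ($xs, $ys) = @_;
    my $n = scalar @$xs;
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$xs;
    $sum_y += $_ for @$ys;
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);
    my ($sxx, $sxy, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $xs->[$i] - $mean_x;
        my $dy = $ys->[$i] - $mean_y;
        $sxx += $dx * $dx;
        $sxy += $dx * $dy;
        $syy += $dy * $dy;
    }
    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    my $r2        = $syy > 0 ? ($sxy * $sxy) / ($sxx * $syy) : 1;
    return ($intercept, $slope, $r2);
}

# Hypothetical helper: fit log P[S > s] against log s using only points with
# log s >= $cutoff, skipping points where P[S > s] is 0 (log undefined).
# @points holds [s, P[S > s]] pairs, such as an empirical CCDF produces.
sub tail_slope {
    my ($cutoff, @points) = @_;
    my (@x, @y);
    for my $p (@points) {
        next unless $p->[0] > 0 && $p->[1] > 0;
        next if log($p->[0]) < $cutoff;
        push @x, log($p->[0]);
        push @y, log($p->[1]);
    }
    die "need at least two points above the cutoff\n" if @x < 2;
    return fit_line(\@x, \@y);
}

# Illustration on exact points of P[S > s] = 1/s, where the slope is -1.
my @points = map { [ $_, 1 / $_ ] } 1 .. 50;
my ($intercept, $slope, $r2) = tail_slope(0, @points);
printf "slope = %.3f, R^2 = %.3f\n", $slope, $r2;    # prints slope = -1.000, R^2 = 1.000
```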
I get the feeling that people sometimes do this in their papers, and thereby misidentify the model that generated their data.