Plagues and Hash Tables
"Mathematics is the queen of the sciences and Number Theory is the
Queen of Mathematics" - Gauss
In this section, I discuss first an interesting application, hash
tables, that uses basic number theory and then a more complex
application, public key encryption, that uses fairly sophisticated
number theory and abstract algebra. Both of these applications are
used extensively today. Hash tables are used in computer operating
systems(i.e. dos and windows), databases, word processors, spread
sheets, games, graphic systems and many other programs. Public Key
Encryption is gaining increased popularity with the proliferation of
the net both to ensure protection, privacy and to authenticate monetary
transactions.
But first, let's discuss plagues. Occasionally the natural world
"borrows" ideas from mathematics in a way that shows the profundity
and independence of that discipline. One such example is that the
dormant period of plagues, locusts and similar parasites is almost
always a prime number(e.g. locusts appear every 17 years, there is a 7
year and 13 year plague, etc.). The reason for this first requires
some understanding of biology. A parasite has a strong dependence on
its host population because if the host dies, so does the parasite.
Parasites compete for hosts as most animals compete for food so that
in general it is adventageous for two parasite populations to be
active at the same time as infrequently as possible. When two
populations are active during the same year, we say that they "collide".
Now it is easy to show that for dormant periods of about the same
length, the populations collide less frequently when the period is a
prime number. Let a be the dormant period of the parasite A and b be
the dormant period of parasite B. Let c be the number of years
between collision of A and B. Note that period between collisions is
always the same since the "state" of the system is exactly the same
after each collision. We assume A and B eventually collide(if there
are few parasites, they may not but in a large population of
parasites, some must eventually collide) and for convenience we call
the first time they collide year 0.
Now it is a fundamental theorem of number theory that any number can
be factored uniquely into primes(e.g. 18=3*2*2 28=2*7*2) so let
a=p1p2...pn and b=p1'p2'...pm' where the p's are all prime numbers.
At year c, both B and A are active hence a divides c and b divides c.
Which means c contains those primes which occur in either a or b. If
there is no prime which is in both a and b, c=a*b. If there is some
shared prime, c is less than a*b. Now it should be easy to see that
if a and b are different prime numbers, it is certain that c=a*b and
thus c is maximized.
Here are two examples:
a=18 and b=6
a=3*3*2 b=3*2 hence c=3*3*2=18
a=19 and b=3 means c=19*3=57
Although the dormant periods are about the same length, the time
between collisions is much larger in the second case since both a and
b are prime. The fact that prime numbers lead to infrequent
collisions will be used in the application described in the next section.
The Dictionary Problem
A problem that comes up again and again in computer applications is
the dictionary problem which can be described intuitively in the
following way: Imagine you're keeping a dictionary of slang on the
computer. You want to be able add and delete new words quickly and
given a word in the dictionary, you want to be able to find its
definition quickly. The question is, what is the fastest way of doing
this?
First a side note on the way computer theorists measure speed. In the
equation giving the speed of an algorithm as a function of the size of
the input, all constants, coefficients and lower order terms are
dropped. So 45n^2 + 100n becomes n^2. In this sense 1000n+logn (=n)
is faster than .01n^2 (=n^2). This is justified because after the
input becomes larger than some constant size, 1000n+logn is always
faster than .01n^2.
Now a common way to solve the dictionary problem is to keep the words
in some sort of tree and in order to find a word, compare it with a
succession of words to find where it fits. This is similar to the
technique humans use when looking up a word in a dictionary. First
they may open the dictionary halfway, see if the word they're looking
for is to the right or left of that page and continue in this manner.
It can be proven that any method using comparisons can run no faster
than logn time. We should not be too upset by this result but rather
use it to explore better ways of solving the problem. Namely we
should explore methods which do not use comparisons. Such a method is
the hash table which can do inserts, deletes and finds all in time
which is constant on average. (In the average case, the hash table
beats a tree but in the worst case it may not).
A hash table is an array of lists. It takes a word, converts it to a
number and then finds the remainder when that number is divided by the
size of the array. The word(and its definition) is then placed in the
cell identified by this remainder. What we hope is that most cells in
the array will only have 1 or 2 words in them so that the lists can be
accessed quickly(through randomized techniques beyond the scope of
this text, this can often be guaranteed). In other words we want to
minimize collisions.
It should be no surprise to the astute reader that hash tables are
most effective when their size is a prime number. This helps to
minimize collisions of commonly occuring patterns. For example, maybe
all words ending in "s" will be converted to numbers divisible by 10
then we can think of all words ending in "s" as being a plague with a
set dormant period. If the hash table were of size 100, many of these
words would collide in the cells 1, 10, 20, etc. But if the table is
of size 101(a prime), there will be much less crowding e.g. 10 and 110
collide in cell 10 in example one but in example two 10 falls in cell
10 but 110 falls in cell 9.
Public Key Encryption
Public Key Encryption depends on the computational difficulty of
factoring large numbers. It depends on several theorems in Number
Theory which I present here without proof. The first of these
theorems is known as Fermat's Little Theorem which states that