Theoretical Computer Science and You

Plagues and Hash Tables

"Mathematics is the queen of the sciences and Number Theory is the Queen of Mathematics" - Gauss

In this section, I discuss first an interesting application, hash tables, that uses basic number theory and then a more complex application, public key encryption, that uses fairly sophisticated number theory and abstract algebra. Both of these applications are used extensively today. Hash tables are used in computer operating systems(i.e. dos and windows), databases, word processors, spread sheets, games, graphic systems and many other programs. Public Key Encryption is gaining increased popularity with the proliferation of the net both to ensure protection, privacy and to authenticate monetary transactions.

But first, let's discuss plagues. Occasionally the natural world "borrows" ideas from mathematics in a way that shows the profundity and independence of that discipline. One such example is that the dormant period of plagues, locusts and similar parasites is almost always a prime number(e.g. locusts appear every 17 years, there is a 7 year and 13 year plague, etc.). The reason for this first requires some understanding of biology. A parasite has a strong dependence on its host population because if the host dies, so does the parasite. Parasites compete for hosts as most animals compete for food so that in general it is adventageous for two parasite populations to be active at the same time as infrequently as possible. When two populations are active during the same year, we say that they "collide".

Now it is easy to show that for dormant periods of about the same length, the populations collide less frequently when the period is a prime number. Let a be the dormant period of the parasite A and b be the dormant period of parasite B. Let c be the number of years between collision of A and B. Note that period between collisions is always the same since the "state" of the system is exactly the same after each collision. We assume A and B eventually collide(if there are few parasites, they may not but in a large population of parasites, some must eventually collide) and for convenience we call the first time they collide year 0.

Now it is a fundamental theorem of number theory that any number can be factored uniquely into primes(e.g. 18=3*2*2 28=2*7*2) so let a=p1p2...pn and b=p1'p2'...pm' where the p's are all prime numbers.

At year c, both B and A are active hence a divides c and b divides c. Which means c contains those primes which occur in either a or b. If there is no prime which is in both a and b, c=a*b. If there is some shared prime, c is less than a*b. Now it should be easy to see that if a and b are different prime numbers, it is certain that c=a*b and thus c is maximized.

Here are two examples:

a=18 and b=6

a=3*3*2 b=3*2 hence c=3*3*2=18

a=19 and b=3 means c=19*3=57

Although the dormant periods are about the same length, the time between collisions is much larger in the second case since both a and b are prime. The fact that prime numbers lead to infrequent collisions will be used in the application described in the next section.

The Dictionary Problem



A problem that comes up again and again in computer applications is the dictionary problem which can be described intuitively in the following way: Imagine you're keeping a dictionary of slang on the computer. You want to be able add and delete new words quickly and given a word in the dictionary, you want to be able to find its definition quickly. The question is, what is the fastest way of doing this?

First a side note on the way computer theorists measure speed. In the equation giving the speed of an algorithm as a function of the size of the input, all constants, coefficients and lower order terms are dropped. So 45n^2 + 100n becomes n^2. In this sense 1000n+logn (=n) is faster than .01n^2 (=n^2). This is justified because after the input becomes larger than some constant size, 1000n+logn is always faster than .01n^2.

Now a common way to solve the dictionary problem is to keep the words in some sort of tree and in order to find a word, compare it with a succession of words to find where it fits. This is similar to the technique humans use when looking up a word in a dictionary. First they may open the dictionary halfway, see if the word they're looking for is to the right or left of that page and continue in this manner.

It can be proven that any method using comparisons can run no faster than logn time. We should not be too upset by this result but rather use it to explore better ways of solving the problem. Namely we should explore methods which do not use comparisons. Such a method is the hash table which can do inserts, deletes and finds all in time which is constant on average. (In the average case, the hash table beats a tree but in the worst case it may not).

A hash table is an array of lists. It takes a word, converts it to a number and then finds the remainder when that number is divided by the size of the array. The word(and its definition) is then placed in the cell identified by this remainder. What we hope is that most cells in the array will only have 1 or 2 words in them so that the lists can be accessed quickly(through randomized techniques beyond the scope of this text, this can often be guaranteed). In other words we want to minimize collisions.

It should be no surprise to the astute reader that hash tables are most effective when their size is a prime number. This helps to minimize collisions of commonly occuring patterns. For example, maybe all words ending in "s" will be converted to numbers divisible by 10 then we can think of all words ending in "s" as being a plague with a set dormant period. If the hash table were of size 100, many of these words would collide in the cells 1, 10, 20, etc. But if the table is of size 101(a prime), there will be much less crowding e.g. 10 and 110 collide in cell 10 in example one but in example two 10 falls in cell 10 but 110 falls in cell 9.

Public Key Encryption



Public Key Encryption depends on the computational difficulty of factoring large numbers. It depends on several theorems in Number Theory which I present here without proof. The first of these theorems is known as Fermat's Little Theorem which states that