About the course


What about this course? This is the second offering of this special topics course. Randomized algorithms are worth studying by themselves: they involve different analysis techniques, tend to promote a different view of the world, and are generally much simpler to design than conventional (deterministic) algorithms. Indeed, you should take this course to see how amazingly effective randomization can be in simplifying your work: no fancy data structures, no complex algorithms here, but very simple, sometimes even simplistic ideas that actually work. (Proving that they work through an analysis is often the challenging part of the problem.)

What is a randomized algorithm? In brief, it is an algorithm that uses an internal random number generator to help it make decisions. In a way, the algorithm uses coin tosses instead of solving difficult problems. The technique greatly simplifies algorithm design, produces algorithms that are easy to implement with low overhead, and has the added advantage of making these algorithms data-insensitive, in the sense that their behavior depends more on the internal sequence of random values than on the external sequence of input values, so that these algorithms cannot consistently be fooled or made to display poor behavior.

A simple example is a trivial modification to the well-known quicksort algorithm; as you know, standard quicksort is highly data-sensitive -- it can take quadratic time on certain data inputs, although it generally runs very fast. The problem lies in choosing a good threshold value for the partitioning phase -- ideally, it should be the median. The standard solution in practice has been to pick a small sample of values (typically just three) and use the median of these values, hoping that it is close to the overall median. The paper-and-pencil, asymptotic solution is to run the linear-time median-finding algorithm to find the true median and use it (utterly impractical). The randomized solution is to pick one list element at random to be the partitioning element. The result is that we can prove that, on any data input whatsoever, the randomized version will terminate in n log n steps with very high probability.

This strategy (when the choice is crucial, yet hard, choose at random) is surprisingly effective. The resulting algorithms have all the hard work taken out, almost no overhead, and do great in practice.

Am I prepared to take this course? You need to have had a standard undergraduate algorithms course (equivalent to our CS 461). It helps to have had reasonable exposure to discrete probability, although all of the required math will be introduced in the course. Your appreciation of the benefits of randomization will be directly proportional to the number of advanced algorithms courses you have taken, since we will cover algorithms in graphs, geometry, numerical computing, string matching and computational biology, and other areas.

How will the course be run? This is a fairly traditional, lecture-oriented class. No programming required, although I would encourage everyone (not just in the course!) to test their random number generators -- it can be quite a learning experience! I never give in-class tests in graduate courses. In most of them, I don't even really give tests, but rather a collection of graded homework assignments. With around 8 assignments per semester, I have no problem if you need to drop one or two of them for health-, work-, or family-related reasons. Since you have 7-10 days to complete each assignment, you can turn each into a learning experience; I grade each of your assignments and put on the Web a detailed solution sheet.

The goal is to become comfortable with the notion of randomization, perhaps the single most important development in computing over the last 15-20 years. (I am not exaggerating: for instance, we would not have RSA encryption without randomized algorithms; we would not have reasonably efficient Internet routing; we could not do any data mining; we would still have to back up systems by dumping the entire contents of the disks over the network; etc.) That takes exposure and practice. I expect you to be able to design or rederive fairly simple randomized algorithms by the end of the semester and to be able to understand, analyze, and appreciate more complex ones. Incidentally, the tail estimates that form the principal analysis tool for randomized algorithms find uses almost everywhere -- any form of statistical analysis can be improved by using them.