CS 591-06 "Algorithms in the Real World" Fall 2002

Instructor:
 Jared Saia
 Office: FEC 337, phone: 277-3149
 Office Hours: Tuesdays and Thursdays 3:15-4:15, Weds 2-3pm (or by appointment)
Meeting Times:
 Tuesdays and Thursdays 2:00-3:15pm in Mechanical Engineering, Room #208
cs591-06 Email List:
 Directions for subscribing to the class email list are here. The
above link also contains an archive of the mailing list.
Assignments:

Homework 1

Homework 2,
The solution to the last problem is given in the manuscript "Tossing a
Biased Coin" by Michael Mitzenmacher which is here.
Reading:
week 1-2: "On Algorithms for Efficient Data Migration" by Joe Hall,
Jason Hartline, Anna Karlin, Jared Saia and John Wilkes.
Symposium on Discrete Algorithms 2001. (ps, pdf).
week 3-5: "Censorship Resistant Peer-To-Peer Content Addressable
Networks" by Amos Fiat and Jared Saia. Journal of Algorithms
2002 (ps, pdf).
Slides from Lectures:
Syllabus
Course Description:
In the past several years, the algorithms research community has had
several major breakthroughs in designing algorithms which have both
strong theoretical and strong empirical properties. For example, the
very successful companies Google and Akamai were both built on
provably good algorithms first described at major research conferences
(WWW '98 and STOC '97 respectively). Additionally, the company Celera
attributes much of its success in genome sequencing to its algorithm
engineers.
In this course, we will study several algorithms which have been
successful in the real world. We will learn some new algorithmic
tools that have proven successful for real world problems including:
ways to create approximation algorithms for NP-Hard problems, ways to
exploit the power of randomness, and ways to create tractable abstract
problems from messy real-world problems. We will also study some
important open algorithmic problems whose solutions would have a big
"real-world" impact.
Texts:
There is no required text but the following texts are recommended for
reference: "Approximation Algorithms for NP-Hard Problems" edited by
Dorit S. Hochbaum and "Randomized Algorithms" by Motwani and Raghavan.
Topics
Topics will include:

Advanced graph theory

Hall's Theorem and Data Migration.
"On Algorithms for Efficient Data Migration" by Joe Hall, Jason
Hartline, Anna Karlin, Jared Saia and John Wilkes. Symposium on
Discrete Algorithms 2001. (ps, pdf).

"An Experimental Study of Data Migration Algorithms" by Eric
Anderson, Joe Hall, Jason Hartline, Michael Hobbes, Anna Karlin, Jared
Saia, Ram Swaminathan and John Wilkes. Workshop on Algorithm
Engineering 2001. (ps, pdf).

Expander graphs and Attack-Resistant P2P Networks.
"Censorship Resistant Peer-To-Peer Content Addressable Networks" by
Amos Fiat and Jared Saia. Journal of Algorithms 2002 (ps, pdf).

Szemeredi's Regularity Lemma ???

Randomized Algorithms

Random Construction of Expander Graphs
"Censorship Resistant Peer-To-Peer Content Addressable Networks" by
Amos Fiat and Jared Saia. Journal of Algorithms 2002 (ps, pdf).

Web Caching with Consistent Hashing (the Akamai problem)
"Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on
the World Wide Web" by David Karger, Eric Lehman, Tom Leighton,
Matthew Levine, Daniel Lewin and Rina Panigrahy. STOC 1997. (ps). See
also the simpler paper in WWW8, located here.

Pattern Matching and Fingerprinting ???
"Some applications of Rabin's fingerprinting method" by Andrei Z. Broder
in Renato Capocelli, Alfredo De Santis, and Ugo Vaccaro, editors,
Sequences II: Methods in Communications, Security, and Computer
Science, pages 143-152. Springer-Verlag (.ps)

Spectral Analysis and Search Engines

Clever Paper: "Authoritative sources in a hyperlinked environment" by
Jon Kleinberg in Symposium on Discrete Algorithms, 1998 (.ps)

Google Paper: "The Anatomy of a Large-Scale Hypertextual Web Search
Engine" by Sergey Brin and Lawrence Page in WWW7 (html)

"Spectral Analysis of Data" by Yossi Azar, Amos Fiat, Anna Karlin,
Frank McSherry and Jared Saia. Symposium on Theory of Computing
2001. (ps, pdf).

Online Algorithms

Randomized Caching Algorithms
Pages 374-377 in the "Randomized Algorithms" textbook

Online Learning Algorithms
Section 3.2 (The "Winnow Algorithm")
of the survey paper "Online Algorithms in Machine Learning" by Avrim Blum.
(.ps)
See also "Empirical Support for Winnow and Weighted-Majority
Algorithms: Results on a Calendar Scheduling Domain" by Avrim Blum in
12th International Conference on Machine Learning (.ps)

Online Auctions ???

Miscellaneous Algorithmic Tricks

The String Alignment Algorithm for Computational Biology
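
As a small illustration of the last topic above, here is a minimal sketch of
global string alignment scoring by dynamic programming (in the style of
Needleman-Wunsch). It is not taken from the course materials: the function
name alignment_score and the scoring values (match=1, mismatch=-1, gap=-1)
are assumptions chosen only for this example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b.

    The scoring parameters are illustrative assumptions, not values
    prescribed by the course.
    """
    # prev[j] is the best score for aligning the previous prefix of a
    # with b[:j]; initially a's prefix is empty, so b[:j] is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Pair a[i-1] with a gap.
            up = prev[j] + gap
            # Pair b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))
```

The sketch keeps only two rows of the table, so it uses memory proportional
to the length of the second string; recovering the alignment itself (rather
than just its score) would require keeping the full table or traceback
pointers.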
Text Books
There are no required text books but the following books are
recommended reading material.

Randomized Algorithms by Rajeev Motwani and Prabhakar Raghavan

Approximation Algorithms for NP-Hard Problems Edited by Dorit
S. Hochbaum
Tentative Grading Weights
 Homeworks (60%)
 Project (40%)
Grading Methodology
Homeworks
Your homework should have the following properties, which are the
criteria used to determine your homework grade.
 Clarity: Make sure all of your work and answers are clearly
legible and well separated from other problems. If we can't read it,
then we can't grade it. Likewise, if we can't immediately find all of
the relevant work for a problem, then we will be more likely to grade
only what we see at first.
 Completeness: Full credit for all problems is based on both
sufficient intermediate work (the lack of which often produces a
'justify' comment) and the final answer. There are many ways of doing
most problems, and we need to understand exactly how YOU chose to solve
each problem.
Here is a good rule of thumb for deciding how much detail is sufficient:
if you were to present your solution to the class and everyone understood
the steps, then you can assume it is sufficient.
 Succinctness: The work and solutions which you hand in should be
long enough to convey exactly why the answer you get is correct, yet
short enough to be easily digestible by someone with a basic knowledge
of this material.
If you find yourself doing more than half a page of dense algebra,
generating more than a dozen numeric values or using more than a page
or two of paper per problem for your solution, you're probably doing
too much work.
Don't turn in pages with scratch work or multiple answers; if you need to
do scratch work, do it on separate scratch paper. Clearly indicate your
final answer (circle, box, underline, etc.).
Note: It's usually best to rewrite your solution to a problem before you
hand it in. If you do this, you'll find you can usually make the
solution much more succinct.
Project
A significant part of this class is the class project. In this
project, you will apply mathematical tools learned in this class to
solve an algorithmic problem. The project must have a significant
analytical component to it where you demonstrate mastery of
mathematical tools learned in this class. The project should also
contain an empirical component where you do empirical tests which
support or complement your analytical results.
The main deliverable for the class project is a paper no more than
twelve pages in length (not including bibliography and appendix).
This paper should be structured as a standard research paper in that
it should have an abstract, an introduction, a related work section, a
body (containing a section on algorithms and a separate section on
analysis), and a conclusion and future work section. Your project
grade will depend substantially on this paper.
Prerequisites
A necessary prerequisite for this class is a standard undergraduate
algorithms course (equivalent to our CS 461). Some basic familiarity
with discrete probability is also helpful although not required.
Policies
Assignment deadlines are strict: late homework will automatically
receive a grade of zero, unless reasonable cause can be shown (which
is easy for one, possible for two, and very hard for three or more!);
no makeup.
Collaboration is encouraged on all of the homeworks although the
solutions should be written up individually unless stated otherwise.
The usual university policies for withdrawals, incompletes and academic
honesty apply.