UNM Computer Science

Theory Part - Ph.D. Comprehensive Examinations, Fall 2001



 

Part A: Do all three problems

Problem 1. In various areas of theory, one often needs an encoding (into the natural numbers) of various objects. Consider encoding finite sorted lists of natural numbers. We want, naturally, an encoding that is injective, reasonably easy to construct (and to decode later), and such that nonsense codes, if any (there will be some if the encoding function is not surjective), are easily detected. Someone proposed the following encoding scheme, unfortunately defined "in reverse."

The function f maps natural numbers into finite sorted lists of natural numbers (sorted in decreasing order) and is defined recursively as follows

Discuss the suitability of this proposal. This entails checking (proofs or counterexamples, please!) whether or not all of the conditions given above are met. If the encoding is suitable, you must prove that it is injective and demonstrate efficient algorithms for encoding, decoding, and recognizing nonsense codes; if it is not, you must provide counterexamples and propose possible corrections.



Problem 2. You are given a query routine which, when given the parameters i and j (two distinct positive integers), returns the result of the game played between players i and j. (Every one of the possible (n choose 2) games has been played.)  Design and analyze an algorithm that makes Θ(n log n) queries and orders the players in such a way that the ith player in the resulting ordering has defeated the (i+1)st, for each i, 1 ≤ i < n.

Problem 3. Analyze the time and space complexity of the following algorithm, intended to compute the maximum and minimum of a set by the divide-and-conquer method.

  function MaxMin(S, k)
  if ( |S| <= k )
     then return (max(S), min(S))
     else begin
             {S_1, S_2, ..., S_k} <- partition(S, k)
             for i = 1 to k do
                (mx_i, mn_i) <- MaxMin(S_i, k)
             return (max(mx_1, mx_2, ..., mx_k), min(mn_1, mn_2, ..., mn_k))
          end
  end

The functions max (which returns the maximum of a set of no more than k elements), min (which returns the minimum of such a set), and partition (which divides a set into k subsets, as nearly equal in size as possible) are predefined.
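For concreteness, the pseudocode above can be rendered directly in Python. This is only a sketch: the name `maxmin` and the striding partition are our choices (any partition into k nearly equal subsets works), and we assume k ≥ 2 so the recursion terminates.

```python
def maxmin(S, k):
    """Divide-and-conquer max/min, following the pseudocode above.
    Assumes k >= 2; the partition rule (striding) is one arbitrary choice."""
    if len(S) <= k:
        return (max(S), min(S))
    # partition S into k subsets, as nearly equal in size as possible
    parts = [S[i::k] for i in range(k)]
    # recurse on each subset, then combine the k maxima and k minima
    results = [maxmin(p, k) for p in parts]
    return (max(r[0] for r in results), min(r[1] for r in results))
```

Tracing this version makes the recurrence for the number of max/min calls easy to read off: one pair of calls at each leaf of the recursion, plus one pair per internal node to combine.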

Specifically, you are to find, as functions of |S| and k:

  1. the number of calls to max and min;
  2. the number of calls to partition;
  3. the space used, assuming that each subset Si requires |Si| space and that all local storage is kept until return from the call.
Finally, use your answers to the first two counts (the calls to max/min and to partition) to analyze the trade-off involved in increasing k when the cost of performing the functions max, min, and partition grows as a function of k; in particular, at what growth rate (as a function of k) do you reach a point of diminishing returns?

 

Part B: Answer all subquestions

Consider the following generic problem. Some process (details later) has created an unrooted binary tree (in which every node has degree 1 or 3) with every node (internal nodes and leaves) labelled. The space of labels is a metric space; that is, there exists a distance measure on the labels that satisfies the properties of a metric (non-negativity, symmetry, the triangle inequality, etc.).

Unfortunately, you have not been given the labels of the internal nodes; indeed, you have not even been given the tree: you only know the labels of the leaves and the distance metric. From this information, you are to reconstruct the tree (and, as a bonus, the labels of the internal nodes). This is a common problem in evolutionary biology, linguistics, and other areas: reconstruct from modern characteristics the evolutionary history of a collection of species, languages, etc.

In most cases, you will also be given a specification of the stochastic process by which new labels are created from existing ones, i.e., of the process by which the label at one end of a tree edge is transformed into the label at the other end of the tree edge. If the data are DNA sequences, for instance, the process is basically the same as in string editing: insertion, deletion, and substitution are the three mechanisms.

Simplify the problem by assuming that all labels are strings over a small alphabet (as with DNA, a 4-character alphabet) and that all strings have exactly the same length. Assume that we want to produce the tree with the smallest possible sum of edge lengths.

  1. Formally define that reconstruction problem.
  2. Carefully compare that problem with the minimum spanning tree problem (also a smallest sum of edge lengths) and with the Huffman code problem (also construct a binary tree only knowing something about its leaves).
  3. Consider the following greedy strategy, derived from the Huffman code algorithm: look only at the distances between the leaves. Pick the closest pair, merge them (i.e., give them an internal node as a common parent), remove the two from the distance matrix, and replace them by a single entry (representing the subtree) after computing new distances from this entry to the remaining n-2 leaves. Repeat until only three entries remain, at which point they are joined into a star.

    Propose a method for computing plausible distances from the new entry to the other n-2 entries. Does your method provide a way to assign a label to the new internal node?  Analyze the running time of the entire algorithm. Finally, provide an example showing that this greedy algorithm, unlike Huffman's algorithm from which it is inspired, is not optimal.
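The greedy strategy of part 3 can be sketched in a few lines of Python. The averaging rule used to compute the new entry's distances below is our assumption (one plausible choice, in the spirit of WPGMA-style clustering), not part of the problem statement, and the merged pair itself serves as the identifier of the new internal entry.

```python
def greedy_merge(points, d):
    """Greedy agglomeration over leaf distances.
    points: list of hashable leaf names; d: symmetric distance on leaves.
    Repeatedly merges the closest pair until three entries remain.
    New-entry distances use the average of the two merged entries'
    distances -- an assumed update rule, not the only possibility."""
    entries = list(points)
    dist = {(a, b): d(a, b) for i, a in enumerate(entries)
            for b in entries[i + 1:]}
    merges = []
    while len(entries) > 3:
        a, b = min(dist, key=dist.get)      # closest remaining pair
        node = (a, b)                       # new internal entry (its "label")
        merges.append(node)
        entries.remove(a)
        entries.remove(b)
        new = {}
        for c in entries:
            dac = dist.get((a, c), dist.get((c, a)))
            dbc = dist.get((b, c), dist.get((c, b)))
            new[(node, c)] = (dac + dbc) / 2  # assumed averaging rule
        # drop all pairs involving the two merged entries
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        dist.update(new)
        entries.append(node)
    return entries, merges
```

With a distance matrix stored this way, each merge scans all remaining pairs, which is the starting point for the running-time analysis the problem asks for.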

 

Part C: Do just one question (but do it in detail!)

Problem 1 (algorithm design). You are given a rooted (but not necessarily binary) tree of items, where each item has a positive weight. (For instance, the tree can be specified by a collection of pairs of the form (i,p(i)), where i is an item and p(i) its parent in the tree.)  Devise (and analyze and prove the correctness of) an efficient algorithm to produce a total ordering of the items that

  1. respects the partial order, in that every item appears in the total order after all of its children; and
  2. minimizes the sum ∑_{i=1}^{n} order(i)·weight(i), where order(i) denotes the position of item i in the total ordering.
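To make the objective concrete, here is a checker for the two conditions; the names and input conventions (the (i, p(i)) pairs given as a dict) are ours.

```python
def cost(order, weight):
    """Objective sum over positions 1..n: sum of order(i) * weight(i).
    order: list of items in total order; weight: dict item -> positive weight."""
    return sum((pos + 1) * weight[item] for pos, item in enumerate(order))

def respects(order, parent):
    """True iff every item appears after all of its children.
    parent: dict mapping each non-root item i to p(i)."""
    pos = {item: i for i, item in enumerate(order)}
    return all(pos[i] < pos[p] for i, p in parent.items())
```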



Problem 2 (computational geometry). Devise, prove the correctness of, and analyze a fast algorithm to compute exactly (i.e., not through probabilistic or approximate methods) the area of a convex polygon. The polygon is given as a list of vertices, where each vertex is a pair of coordinates, and where the list follows a counterclockwise tour of the perimeter. Assume that an unbounded precision rational arithmetic package is available. If the input is a collection of one-word integers, how many words might you need to represent the answer?
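For reference, exact evaluation via the classical shoelace sum is one standard route (a sketch under our naming, not necessarily the intended solution); Python's `fractions` stands in for the unbounded-precision rational package.

```python
from fractions import Fraction

def polygon_area(verts):
    """Exact area of a polygon given as a CCW list of (x, y) vertices,
    via the shoelace sum: area = (1/2) * sum of (x_i*y_{i+1} - x_{i+1}*y_i)."""
    s = Fraction(0)
    n = len(verts)
    for i in range(n):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % n]
        s += Fraction(x1) * y2 - Fraction(x2) * y1
    return s / 2
```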

How does your answer change if the polygon is simple, but not necessarily convex?



Problem 3 (computational geometry). What is a kinetic data structure? How does it differ from a more conventional dynamic data structure? What purpose does it serve? What assumptions are usually made about its usage? What are the main tradeoffs to be considered in the design of a kinetic data structure? Since we can often dynamize typical geometric data structures, do you think we can kinetize them as well?

Write a few paragraphs, using various examples (which can be very artificial and need not be from the literature) to illustrate your answers.



Problem 4 (optimization algorithms). Consider the following problem. An n×n array of positive integers is wrapped around a cylinder (so that its nth row is followed by its 1st row). A path across the cylinder is a succession of n array elements such that the first element lies in the first column, the last element lies in the last column, and an element (i,j) can be followed only by one of the three adjacent elements in the next column, namely (i,j+1), (i-1,j+1), or (i+1,j+1) (where all row arithmetic is done modulo n). Finally, the cost of a path is the sum of the values of the elements on the path.
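To pin down the definitions, the following routine (our name) checks that a candidate path obeys the stated adjacency rule, with wraparound, and evaluates its cost; a path is given as the row index used in each column.

```python
def path_cost(A, rows):
    """Cost of a path across the cylinder.
    A: n x n array of positive integers; rows[j]: row used in column j.
    Consecutive rows must differ by -1, 0, or +1 modulo n."""
    n = len(A)
    for j in range(n - 1):
        if (rows[j + 1] - rows[j]) % n not in (0, 1, n - 1):
            raise ValueError("not a legal path")
    return sum(A[rows[j]][j] for j in range(n))
```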

Design an efficient (low-degree polynomial time) algorithm that finds a minimum-cost path across the cylinder. Carefully analyze your algorithm; in particular, find the total number of possible paths, the number of paths examined by your algorithm, and its running time.

Problem 5 (complexity). The most celebrated result in Complexity Theory in the last 25 years is the PCP theorem. This theorem states that every problem in NP has a certificate that can be checked probabilistically by looking only at a constant number of bits (chosen at random with the help of a logarithmic number of random bits) in the certificate. The main use of this theorem so far has been to show inapproximability results; that is, to show that some problem P cannot be approximated in polynomial time within a better ratio than 1+α (for some constant α > 0) unless we have P=NP. Explain the connection between the PCP theorem and the inapproximability results in your own words: no proof, of course, but discuss informally the path that links the two types of results.



Problem 6 (complexity). Prove that the following problem is NP-complete. Does your proof imply that the problem is NP-complete even for graphs of degree 3?  for planar graphs?

Instance: a graph, G=(V,E), and a positive integer, k ≤ |V|.

Question: is there a coloring of the vertices with two colors, say {r,b}, such that, for each vertex colored r, there is at least one adjacent vertex colored b, and where there are at least k vertices colored r?
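For small instances the question can be checked by brute force; the sketch below (names ours) returns the largest achievable number of r-vertices, so the instance is a yes-instance exactly when that number is at least k.

```python
from itertools import product

def max_red(V, E):
    """Largest number of vertices colored r over all {r, b}-colorings in
    which every r-vertex has at least one b-colored neighbour.
    Brute force over all 2^|V| colorings: small graphs only."""
    adj = {v: set() for v in V}
    for u, w in E:
        adj[u].add(w)
        adj[w].add(u)
    best = 0
    for colors in product('rb', repeat=len(V)):
        c = dict(zip(V, colors))
        # the coloring is valid iff each r-vertex sees some b-vertex
        if all(any(c[w] == 'b' for w in adj[v]) for v in V if c[v] == 'r'):
            best = max(best, sum(1 for v in V if c[v] == 'r'))
    return best
```

Note that an isolated vertex can never be colored r under this definition, since it has no neighbour at all.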



Problem 7 (computability). Define a real number to be computable if there exists a total function f such that: (i) f(0) is the integer part of the number; and (ii) for n > 0, f(n) is the nth decimal digit of the fractional part of the number.
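As an illustration of the definition, such a digit function f is easy to exhibit for any rational number; a sketch for p/q (names ours), using long division:

```python
def digits_of(p, q):
    """Digit function f for the rational p/q (p >= 0, q > 0):
    f(0) is the integer part; for n > 0, f(n) is the nth decimal digit
    of the fractional part, exactly as in the definition above."""
    def f(n):
        if n == 0:
            return p // q
        r = p % q
        # run n steps of the long-division algorithm
        for _ in range(n):
            r *= 10
            d, r = divmod(r, q)
        return d
    return f
```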



Problem 8 (computability). For each of the following sets and their complements, establish whether it is recursive, r.e., or non-r.e.

  1. The set of all monotone nondecreasing partial recursive functions.
  2. The set of partial recursive functions with infinite domains consisting exclusively of primes.
  3. The set of all RAM programs equivalent to at least one RAM program of no more than 10 lines.

Problem 9 (parallel algorithms). A standard randomized parallel algorithm for symmetry-breaking runs in constant time and linear total work on an EREW PRAM model as follows:

Input:
a directed cycle, specified in an array S (successor)
Output:
a set of vertices (labelled 1 while vertices not in the set are labelled 0) which forms an independent set of constant fractional size with high probability
Code:
for all vertices v in parallel do

  1. assign label(v) = 1 or 0 randomly with equal probability
  2. if label(v) = label(S(v)) = 1, then set label(v) = 0
It can be shown that there exists a constant 0 < α < 1/8 such that the independent set produced is smaller than αn with probability less than e^(-βn), where β is a fixed function of α.
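The parallel step is easy to simulate sequentially to check the independence claim (a sketch; the name `break_symmetry` is ours, and the labels are computed from the old labels, matching the synchronous parallel semantics):

```python
import random

def break_symmetry(S):
    """Sequential simulation of the randomized symmetry-breaking step.
    S maps each vertex to its successor on the directed cycle.
    Returns the final 0/1 labels; the 1-labelled vertices form an
    independent set, since a 1 whose successor is also 1 is reset to 0."""
    label = {v: random.randint(0, 1) for v in S}
    # all updates read the OLD labels, as in the parallel algorithm
    return {v: 0 if label[v] == 1 == label[S[v]] else label[v] for v in S}
```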

Use this algorithm to develop a list-ranking algorithm that runs in O(log log log n) time with high probability.