# The Theory of Computation

## ADDITIONAL EXERCISES FOR CHAPTER 8

We can considerably improve our classification of (k,l)-SAT variants by using the probabilistic method and, in particular, a result known as the Lovász local lemma. In a nutshell, the probabilistic method demonstrates the existence of an object by showing that the probability that such an object exists is nonzero -- in the case of SAT, we demonstrate that the instance is satisfiable by showing that the expected number of satisfying truth assignments is nonzero. The Lovász local lemma allows one to place a lower bound on the probability of the intersection of a collection of events that are mostly, but not completely, independent. In the version we need here, we can express the lemma as follows: given a collection of n events {E_i}, each with probability at most p and each dependent on at most d other events in the collection, if (d+1)p <= 1/e, then the probability that none of the events occurs (the probability of the intersection of the complements of the E_i) is nonzero. Apply the Lovász local lemma to prove that any instance of k-SAT in which no variable appears in more than 2^k/(ke) clauses in all is satisfiable. For large values of k, this is a very strong result.
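As a sanity check on the arithmetic behind this exercise (a sketch, not a proof: it assumes the bound 2^k/(ke) on variable occurrences and the generous estimate that a clause depends on at most k(m-1) others when each of its k variables appears in at most m clauses), the following verifies that the local lemma's condition (d+1)p <= 1/e is met:

```python
import math

def lll_condition_holds(k: int) -> bool:
    # Probability that a fixed clause of k distinct literals is falsified
    # by a uniformly random truth assignment.
    p = 2.0 ** (-k)
    # Assumed bound: no variable appears in more than m = 2^k / (k*e) clauses,
    # so a clause shares a variable with at most d = k*(m - 1) other clauses.
    m = 2.0 ** k / (k * math.e)
    d = k * (m - 1)
    # Symmetric Lovász local lemma condition.
    return (d + 1) * p <= 1 / math.e

# (d+1)*p = 1/e - (k-1)/2^k, which stays below 1/e for k >= 2.
print(all(lll_condition_holds(k) for k in range(2, 25)))
```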
Decision Tree. Examine the behavior of the following approximation algorithm for the Decision Tree problem: construct the tree node by node; at each internal node, pick whichever of the remaining tests most nearly splits the remaining possible identifications in half.
Chromatic Number. Prove that no constant-distance approximation algorithm exists for the chromatic number.
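The greedy splitting rule for Decision Tree can be sketched as follows (a sketch under assumed conventions: objects are hashable labels, and each test is modeled as the frozenset of objects that answer "yes" to it):

```python
def build_tree(objects, tests):
    """Greedily build a decision tree: at each internal node, pick the
    remaining test that most nearly splits the candidate set in half."""
    objects = set(objects)
    if len(objects) <= 1:
        return next(iter(objects), None)      # leaf: the identified object
    if not tests:
        return sorted(objects)                # unresolved leaf: out of tests
    # Choose the test whose "yes" side is closest to half the candidates.
    best = min(tests, key=lambda t: abs(len(objects & t) - len(objects) / 2))
    yes, no = objects & best, objects - best
    if not yes or not no:                     # test no longer discriminates
        return sorted(objects)
    rest = [t for t in tests if t is not best]
    return (best, build_tree(yes, rest), build_tree(no, rest))

# Example: two tests suffice to identify four objects.
tests = [frozenset({'a', 'b'}), frozenset({'a', 'c'})]
print(build_tree({'a', 'b', 'c', 'd'}, tests))
```

Note that the greedy choice looks only at the balance of the split, not at how the two halves will fare later on -- precisely the behavior the exercise asks you to examine.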
Disjoint Cycle Cover. This problem is posed on directed or undirected graphs and asks for the smallest collection of edge-disjoint or vertex-disjoint cycles such that each vertex of the graph is included in at least one cycle. All four versions of this problem are known to be NP-hard (see Additional Exercise 7.30). Prove that epsilon-approximation of any version of this problem is NP-hard for any epsilon > 0.
Clustering. A k-clustering of a set S is a partition of the set into k subsets that optimizes some clustering criterion. Typically, we are given a measure, d: S×S -> N, which quantifies the dissimilarity between two elements. In a well-defined clustering, all elements within a cluster are very similar and any two elements from distinct clusters are very dissimilar. This approach suggests two criteria:
• Minimize the intra-cluster dissimilitude
$$\sum_{i=1}^{k} \sum_{x,y \in C_i} d(x,y)$$
• Maximize the inter-cluster dissimilitude
$$\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{x \in C_i,\ y \in C_j} d(x,y)$$
The first is the k-Clustering problem of Theorem 7.14, the second is known as the k-MaxCut problem; both problems are NP-equivalent. In fact, an optimal solution to one problem is also an optimal solution to the other, yet the two problems have very different properties when it comes to approximation.
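The equivalence of optimal solutions rests on a simple identity (with pairs counted consistently on both sides): for any k-partition of S, the intra-cluster and inter-cluster sums add up to the sum of d over all pairs of elements, a quantity that does not depend on the partition at all:

$$\sum_{i=1}^{k} \sum_{x,y \in C_i} d(x,y) \;+\; \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{x \in C_i,\ y \in C_j} d(x,y) \;=\; \sum_{\{x,y\} \subseteq S} d(x,y)$$

Hence minimizing the first term is the same as maximizing the second.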
• Show that both problems are NP-equivalent.
• Show that the two problems always have the same optimal solutions.
• Show that any epsilon-approximation for k-Clustering is NP-hard. (Hint: use graph colorability.)
• Show that k-MaxCut has a simple 1/k approximation algorithm. (Hint: the approximation algorithm, which runs in O(n^2) time, starts by putting one arbitrarily chosen vertex in each cluster, then proceeds to add each remaining vertex to an appropriate cluster -- i.e., according to some greedy measure; the measure itself is most interesting in view of the above.)
• Explain the difference between the approximation versions of the two problems.
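The greedy algorithm hinted at above can be sketched as follows (a sketch under an assumed interface: d(x, y) returns the dissimilarity between two elements; the greedy measure chosen here -- placing each vertex where its intra-cluster cost is smallest -- is one natural candidate, not necessarily the book's):

```python
def greedy_max_k_cut(vertices, d, k):
    """Greedy approximation sketch for k-MaxCut: seed each of the k
    clusters with one arbitrary vertex, then place every remaining
    vertex in the cluster to which its total dissimilarity is smallest,
    thereby maximizing the inter-cluster dissimilarity it contributes."""
    vertices = list(vertices)
    clusters = [[v] for v in vertices[:k]]          # one seed per cluster
    for v in vertices[k:]:
        # Adding v to cluster C removes sum(d(v,u) for u in C) from the
        # cut value, so the greedy choice minimizes that intra-cluster cost.
        best = min(clusters, key=lambda C: sum(d(v, u) for u in C))
        best.append(v)
    return clusters

# Example with d(x, y) = |x - y| on the integers 0..5 and k = 2.
print(greedy_max_k_cut(range(6), lambda x, y: abs(x - y), 2))
```

Each vertex is compared against every already-placed vertex exactly once, which gives the O(n^2) running time mentioned in the hint.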
The Good Approximation Theorem. Assume the following result (due to Berman and here couched in terms of the Clique problem, but trivially generalizable to any NP-complete problem):

Theorem. If there exist a language L, a function f: N -> N, and a transformation t such that:

• x is a yes-instance of Clique if and only if t(x) is in L. (That is, t is a standard many-one transformation from Clique to L.)
• t(x) can be computed in deterministic f(|x|) time.
• t is f-sparse, that is, the number of different strings produced by t when applied to strings of length at most n is bounded above by f(n).
Then the deterministic time complexity of Clique is bounded above by a polynomial or by a function of the form p(f(n)), for some polynomial p.

Use this theorem in answering the following questions.

• Prove that the existence of an NP-complete subset of 0* implies P=NP (our alphabet is of course just {0,1}).
• Let L be an NP-complete problem. Prove that, if there exist a polynomial p and a (deterministic) solution algorithm A for L such that the number f(n) of instances of size no larger than n on which A does not run in time p(n) grows subexponentially (slower than any exponential function of n), then the deterministic time complexity of any NP-complete problem is polynomially related to max{f(n), n}.
Such a solution algorithm would be a pretty good approximation algorithm, since it would always solve the problem and run in polynomial time in the majority of cases. In fact, this result is sometimes known as "the good approximation theorem."
• What exactly would be the consequences of the existence of a "good approximation" algorithm for an NP-complete problem? Relate the "good approximation theorem" to your discussion of the complexity class SUBEXP (Exercise 6.24).