\documentclass[11pt]{article}
\usepackage{epsfig,times,amsmath}
\advance\textwidth by1.5in\advance\oddsidemargin by-0.75in
\advance\textheight by1.5in\headheight 0pt\topskip -10pt\headsep 0pt\topmargin 0pt
\begin{document}
\thispagestyle{empty}
\begin{center}
  \large\bf CS 561\ \ Data Structures and Algorithms\\
  \bigskip

  \large\it Test II: Solutions
\end{center}

\bigskip\noindent
\emph{Problem 1.}\\ 
Given a directed graph $G=(V,A)$, consider graphs $G'=(V,A')$, for
$A'\subseteq A$, such that the transitive closure of $G'$ equals that of~$G$.
(Recall that the transitive closure of $G$ is simply the graph $G^*=(V,A^*)$
where $(u,v)\in A^*\leftrightarrow\exists \text{path from $u$ to $v$ in $G$}$.)
We are interested in such graphs $G'$ with the smallest possible subset
$A'$ of arcs.

Prove that, if $G$ is a DAG, then the minimum subset $A'$ is unique.

Suppose, for a proof by contradiction, that we have two minimum subsets,
say $A_1$ and $A_2$.  Since the two have the same transitive closure,
if there is a path from some vertex $u$ to some vertex $v$ in one,
there is a path from $u$ to $v$ in the other---although not necessarily
passing through the same sequence of vertices.  Since $A_1$ and $A_2$ are
different, there must be at least one arc, say $(a,b)$, that is present
in one subset, but not the other; let us assume it is present in $A_1$, but
not in $A_2$.  Now, in $A_1$, there is a path from $a$ to $b$ (just the
one arc), so there must also exist such a path in $A_2$, but it must pass
through at least one intermediate vertex, call it $c$: that is,
$A_2$ has arc $(a,c)$ and a path from $c$ to $b$.  But then of course
$A_1$ has a path from $a$ to $c$ and one from $c$ to $b$; but this
makes the arc $(a,b)$ redundant in $A_1$, since the $a\rightarrow b$
connectivity can then be established through the paths from $a$ to $c$
and then from $c$ to $b$.  Hence $A_1$ is not minimum: $A_1-\{(a,b)\}$
has the same transitive closure and has one fewer arc; this is the desired
contradiction.

So where did we use the fact that the graph was a DAG?\ \ Perhaps the best
way to see it is to begin by devising a counterexample---a directed
graph with cycles where we have two distinct minimum subsets.
A trivial example is a graph on 3 vertices with every possible arc
(6 arcs in all, since there is one in each direction between every pair):
a minimum subset is simply a cycle of 3 arcs, but we have two of these,
one in each direction---so we would have, e.g., $A_1=\{(a,b),(b,c),(c,a)\}$
and $A_2=\{(a,c),(c,b),(b,a)\}$.  If we look at $(a,b)$, an arc in $A_1$,
but not in $A_2$, we see that the $a\rightarrow b$ path in $A_2$
is $(a,c)$ and $(c,b)$; so now we look for the paths from $a$ to $c$
and from $c$ to $b$ in $A_1$: the first is $(a,b)$ and $(b,c)$ and the
second is $(c,a)$ and $(a,b)$---note that each of these two paths
reuses the arc $(a,b)$, so we cannot remove $(a,b)$ from $A_1$ and
preserve the connectivity.  In other words, we used the DAG property
earlier when we implicitly assumed that the arc $(a,b)$ of $A_1$
would not be present in the $A_1$ paths from $a$ to $c$ and from $c$
to $b$.  Verify that this is correct: if arc $(a,b)$ is present on
the path from $a$ to $c$ (and then to $b$) in $A_1$, we have a cycle
from $b$ back to $b$; and if arc $(a,b)$ is present on the path
from $c$ to $b$, then the path from $a$ to $c$ and then $c$ to $b$
has a cycle from $a$ back to $a$.  In either case, the presence of $(a,b)$
on the path implies the existence of a cycle and contradicts our
hypothesis that the graph is a DAG.
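
As a small illustration (the graph below is our own example, not part of
the problem statement), take the DAG on vertices $a,b,c$ with
  $$A=\{(a,b),\,(b,c),\,(a,c)\}$$
Its transitive closure is $A^*=A$.  Arc $(a,c)$ is redundant, since the path
$(a,b),(b,c)$ already connects $a$ to $c$, whereas removing $(a,b)$ destroys
the only path from $a$ to $b$ and removing $(b,c)$ destroys the only path
from $b$ to $c$.  Hence the only candidate is
  $$A'=\{(a,b),\,(b,c)\}$$
and it is indeed the unique minimum subset, as the proof predicts.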

\bigskip\noindent
\emph{Problem 2.}\\ 
Devise a linear-time algorithm to index the vertices of a DAG $G=(V,A)$
so that, given any pair of vertices indexed $i$ and $j$ with $i<j$, we have
$(j,i)\notin A$.

Turning the definition upside down: the presence of an arc $(u,v)$
forces us to assign a lower index to $u$ than to $v$.  The requirement is
transitive: if we have arcs $(u,v)$ and $(v,w)$, we must assign a lower
index to $u$ than to $v$ and a lower index to $v$ than to $w$, and hence
a lower index to $u$ than to $w$---as if we also had an arc $(u,w)$.
This is the definition of topological sorting; and, as we have seen,
topological sorting can be accomplished in linear time (text, pp. 199--200).

More concisely: topological sort guarantees that the following holds
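  $$(u,v)\in A \Longrightarrow \mathrm{ind}(u) < \mathrm{ind}(v)$$
Taking the contrapositive, we get
  $$\mathrm{ind}(u) \geq \mathrm{ind}(v) \Longrightarrow (u,v)\notin A$$
and, since distinct vertices receive distinct indices, this applies in
particular whenever $\mathrm{ind}(u) > \mathrm{ind}(v)$: taking $u$ to be the
vertex indexed $j$ and $v$ the vertex indexed $i$, with $i<j$, gives
$(j,i)\notin A$, our desired property.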
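
For a concrete check (a hypothetical four-vertex DAG chosen only for
illustration), let
  $$A=\{(a,b),\,(a,c),\,(b,d),\,(c,d)\}$$
One topological order is $a,b,c,d$, which assigns the indices $1,2,3,4$ to
$a,b,c,d$ respectively.  Every arc then goes from a lower index to a higher
one ($1<2$, $1<3$, $2<4$, $3<4$), so no pair $i<j$ has an arc from the vertex
indexed $j$ to the vertex indexed $i$.  The order $a,c,b,d$ would serve
equally well: the indexing is not unique in general.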
  
\bigskip\noindent
\emph{Problem 3.}\\ 
You are given a collection of $n$ disjoint circles in the plane (note
that two circles could be disjoint, yet nested---in fact any type of nesting,
with many levels, is possible) and a collection of $n$ points.
Devise an algorithm to report in $O(n\log n)$ time all points from the
collection that lie (strictly) outside every circle.  (Hint: use a sweep
line.)

If we use a normal sweep line, then it is entirely possible
that all circles are active in the same strip, in which case
every point in that strip may have to be tested against every
circle---too expensive.  We once again need to run a test
that only checks each point against a constant number of circles;
hence, if circles are nested, we should only test against the nearest
two or against the outermost one.   If we take inspiration from the
solution to Homework \#5, we can partition each circle into four
quarter circles and easily maintain a vertical ordering among these quarter
circles.  Now, when a point is encountered in the sweep (as opposed to
the beginning or end of a quarter circle), we can locate it in the
vertical ordering of active quarter circles in logarithmic time.

However, locating it is not enough by itself to decide whether it
sits inside some circle: we need an indication of nesting of the
various circles.

We can do that in one of two ways.  Simplest is to eliminate nested circles
as they come in, retaining only the outermost circle: we can do that because
we know that circles do not intersect---thus, as a new circle starts, we can
test whether it is inside another existing circle (by testing whether
its above and below neighbors are in fact up and down pieces of the same
circle).  Once that is done, nesting is no longer an issue: all circles
left in the data structure are pairwise disjoint as disks as well, so that
testing for a point being within a circle is just looking at the arcs
immediately above and below the point: if they belong to the same circle,
the point lies within the circle, otherwise not.

We can also explicitly maintain the nesting information and handle
all circles.  Our nested circles are exactly like nested parentheses
and we only need a count of ``closing'' and ``opening'' arcs below
each interval in the vertical order: if these two values are equal,
then the interval is outside any circle, otherwise not.  So we maintain
two ordering trees: one for bottom arcs and one for top arcs; in a
binary search tree, it is a trivial matter to maintain at each internal
node a count of the number of nodes in the subtree rooted at that internal
node.  (We do this during any insertion or deletion and it only adds at
most one operation per node along the insertion or deletion path, hence
still runs in logarithmic time.)
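
To illustrate the counting rule (with a hypothetical configuration of two
circles), suppose circle $C_2$ is nested inside circle $C_1$ and the sweep
line crosses both.  From bottom to top, the sweep line meets
  $$\text{bottom}(C_1),\ \text{bottom}(C_2),\ \text{top}(C_2),\ \text{top}(C_1)$$
Writing $(o,c)$ for the numbers of opening (bottom) and closing (top) arcs
below an interval, the five intervals from bottom to top have counts
  $$(0,0),\ (1,0),\ (2,0),\ (2,1),\ (2,2)$$
Only the first and last intervals have $o=c$, and these are exactly the
intervals that lie outside both circles.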

\bigskip\noindent
\emph{Problem 4.}\\ 
You are given a graph $G=(V,E)$ where each vertex has degree at least~$1$.
You are to find the smallest subset $E'\subseteq E$ such that
  $$\forall v\in V,\,\exists e\in E',\,v\text{ is an endpoint of }e$$
(Hint: use matching.)

Denote a minimum subset with these attributes as $E^*$.
$E^*$ is not necessarily a matching: it may have multiple edges sharing
an endpoint---it covers that endpoint multiple times, but that may
be required in order to cover other vertices.  We provide an algorithm,
verify its correctness, and show that it produces a solution of size
equal to $|V|-|M^*|$, where $M^*$ is a maximum matching.

Clearly, if the graph has a perfect matching (a matching in which all
vertices are matched), then the matching itself is a suitable $E^*$
and is clearly optimal, as every new edge accounts for two new vertices,
the most that any edge can contribute.
Such a graph must have $n=2k$ vertices (an even number) and its matching
has $n/2=k$ edges, as does (since it is the same set) its $E^*$---so indeed
the sum of the number of edges in $M^*$ and in $E^*$ is $k+k=2k=n=|V|$.

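Otherwise, we build $E^*$ by including the edges of $M^*$, each of which
accounts for two vertices, plus one incident edge for each unmatched vertex
(such an edge exists because every vertex has degree at least~$1$).
Each of these additional edges joins an unmatched vertex to a matched one,
since an edge between two unmatched vertices could be added to $M^*$,
contradicting its maximality.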
Since we have maximized the number of edges that contribute two vertices
(and thus also minimized the number of edges that contribute only one vertex),
we have a minimum solution.  To see that, first note that selected edges
must contribute either two new vertices or one new vertex: there is no
point in including an edge that does not contribute any new vertex,
and no edge can contribute more than two vertices.  If each edge contributes
either 1 or 2 vertices and we need to touch all $|V|$ vertices, then the
smallest collection of edges is that which has the largest number of edges
contributing 2 vertices each: but that is precisely a maximum matching.

The number of edges in that solution is
the number of edges in the matching, $|M^*|$, plus the number of unmatched
vertices, call it $U$, that is, we have
  $$|E^*|=|M^*|+U$$
Together, these edges account for all vertices, so we can write
  $$2\cdot |M^*| + 1\cdot U = |V|$$
or
  $$|M^*| + 1\cdot U = |V| - |M^*|$$
Substituting, we get, as desired
  $$|E^*|=|V|-|M^*|$$
The construction of $E^*$ is thus just the construction of a maximum
matching, plus the addition of one edge for each unmatched vertex;
the latter takes only linear time, so the former dominates the running time.
(The actual time it takes is the time of the best matching algorithm for
the kind of graphs one deals with and so varies somewhat.)
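
As a sanity check on the formula (using two small graphs of our own choosing),
consider first a star with center $c$ and leaves $x,y,z$: here $|V|=4$,
a maximum matching has $|M^*|=1$, and $U=2$ leaves remain unmatched, so
  $$|E^*|=|M^*|+U=1+2=3=|V|-|M^*|$$
which is right, since all three edges are needed to cover the three leaves.
For a path on four vertices $w,x,y,z$ with edges $\{w,x\}$, $\{x,y\}$,
$\{y,z\}$, we have $M^*=\{\{w,x\},\{y,z\}\}$, so $|M^*|=2$ and $U=0$, giving
  $$|E^*|=2+0=2=4-2$$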

We cannot do better than that: the two problems are in fact the same.
If we have an algorithm to produce such a smallest collection of edges,
we can use it to produce a maximum matching very simply in linear time:
for any vertex that has degree larger than one in the subset, we arbitrarily
remove all incident edges except one.  What is left is obviously a matching;
it is maximum, because any smallest subset of edges contains no path of
length larger than 2 (if it had a longer path, we could remove an interior
edge of that path and still cover every vertex) and so is really composed of stars (trees with
a diameter of at most 2), so that vertices of degree larger than 1 are
not connected to other such vertices and our arbitrary removal is justified.
Since it takes only linear time to remove the excess edges, the running
time of this matching algorithm is asymptotically equal to that of the
minimum edge subset algorithm---and so the two problems are equivalent in
complexity.

\bigskip\noindent\rm
\emph{Problem 5 (bonus only: a substitute for \#3).}\\ 
You are given a collection of $n$ disjoint triangles in the plane
without any nesting.  Design an algorithm that will run in $O(n\log n)$
time and connect the triangles by line segments.  The requirements
are: (i) none of the line segments can intersect any other;
(ii) each line segment must lie entirely outside the triangles
(touching only one point on the perimeter of a triangle at each end---note
that the point touched need not be a vertex of the triangle, but can be
along an edge); and (iii) the segments establish connectivity among all
triangles---that is, if we replaced each triangle by a single graph vertex
and connected the graph vertices as the triangles are connected, the
result would be a connected graph.

In the sweep approach, we defined both a left-to-right ordering (to create
vertical strips) and a bottom-to-top ordering (within each strip);
if we can somehow do the same with the triangles, it will then suffice
to connect each triangle with a predecessor in the sweep (its upward or
downward neighbor in the vertical ordering).
So we need to maintain an ordering of the triangles along
both axes in the style of our sweep-line algorithm for segments.
For the horizontal ordering, we can consider the projection of the
triangle onto the $x$ axis, with a start and an end point, in normal
sweep-line fashion.  So what do we do when a new triangle begins and
when an existing one ends?
Note that, since the triangles are disjoint, we can represent
each triangle by one of its three edges, always choosing an edge that
spans the horizontal range of that triangle.  Then we are down to
just segments, which we already know how to handle: the vertical ordering
can be determined by the current ordinate on the chosen edge of the triangle.
That ordering is well defined because the triangles do not intersect.

It then only remains to connect the triangles.  The easiest way to do this
is to connect a triangle as soon as it appears, using its leftmost vertex.
The one problem we face is that the information on what are the upward
and downward neighbors may have disappeared if these two triangles do not
exist within the current strip (if the strip contains only the new triangle).
Thus we must do a little extra work: when a triangle comes to an end, if it
is the last triangle left in our tree, we must retain its identity so
as to be able to connect it to the new triangle that will arrive later.
Hence our connection can work as follows:
if the new triangle has a downward neighbor, connect the leftmost
vertex of the new triangle to the downward neighbor by a vertical
segment (computing the intersection with the neighbor takes constant
time, just checking each of its three edges); if there is no downward
neighbor, but there is an upward neighbor, connect to the upward neighbor
in the same way; otherwise, if the tree is empty, connect the leftmost
vertex of the new triangle to the rightmost vertex of the last triangle
to be removed (whose ID is stored in a holding variable).
The construction ensures that our connections cannot intersect each other
nor other triangles; and what we build is a tree of backpointers, with the
root being the leftmost triangle.
The extra work only takes constant time per triangle, so the running time
remains that of the sweep, $O(n\log n)$.

\end{document}
