edu.unm.cs.cs351.tdrl.f09.p1
Interface GraphAnalyzer<T>


public interface GraphAnalyzer<T>

Interface for an object that computes basic structural statistics of a graph. Objects of this type provide a number of methods for calculating statistics about graph topology, including shortest paths, connected components, diameter, degree distributions, etc.

Caching (optional). The intent is that objects of this type accept a Graph object in the constructor and then all operations are defined with respect to that Graph. If this object is made immutable, and the Graph is not changed during the existence of this object, then the GraphAnalyzer can cache intermediate results. For example, a number of operations require essentially the same loop over the graph; the GraphAnalyzer can arrange to perform that loop only once and to cache the results of the computation.

Regardless of whether implementations of this interface support caching, the constructor for any concrete implementations MUST run in O(1) time and consume only O(1) space. In particular, they MUST NOT pre-compute/pre-allocate any expensive values at construction time. They MUST wait until requested via one of the methods defined in this interface. This avoids potentially computing expensive results that will never be requested by the user.
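The O(1) constructor rule and the optional caching can be combined with a small lazy holder. The following is a minimal sketch, not part of this interface: the class name Lazy and its methods are hypothetical, and it assumes single-threaded use and a Graph that is never modified after the analyzer is built.

```java
import java.util.function.Supplier;

/** Hypothetical helper: computes a value on first request, then caches it. */
final class Lazy<R> {
    private final Supplier<R> compute; // deferred, possibly expensive computation
    private R value;                   // cached result, valid once computed is true
    private boolean computed;

    Lazy(Supplier<R> compute) {
        this.compute = compute; // O(1): stores a reference, runs nothing
    }

    R get() {
        if (!computed) {
            value = compute.get(); // the first call pays the full cost
            computed = true;
        }
        return value; // later calls return the cached result in O(1)
    }
}
```

An implementation could hold one such field per expensive result (for example the distance matrix) and create it in the constructor without triggering any computation.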

The documentation of all methods in this interface assumes a directed graph, G, containing V nodes and E edges. The notation x^p represents "x raised to the pth power", not "x xor p". The notation x_{ij} should be read "x subscripted with ij".

Note that none of the methods in this interface depend on the type of the Graph's nodes -- GraphAnalyzer only cares about the structure of the graph, not its contents. However, objects of this type MUST be Java generics because they will be initialized with Graphs of some specific type.

Efficiency. The runtime bounds given for a number of the operations in this interface are fairly loose. It is certainly possible to carry out several of these operations (such as all-pairs shortest paths and computation of strongly connected components) more efficiently than the bounds given here, but those bounds are achievable with simple implementations. Ambitious designers are encouraged to find more efficient implementations of these operations.

Version:
1.0
Author:
terran

Method Summary
 int[][] allPairsShortestPaths()
          Compute the all-pairs shortest paths distances between every pair of nodes in the graph.
 double avgInDegree()
          Computes the average INDEGREE for any node in the graph.
 double avgOutDegree()
          Computes the average OUTDEGREE for any node in the graph.
 double avgShortestPathDistance()
          Returns the average distance between all reachable pairs of nodes.
 int countSCCs()
          Returns the number of strongly connected components (SCCs) in the graph.
 int diameter()
          Computes the diameter of the reachable portion of the graph.
 double[] inDegreeDistribution()
          Calculate the probability mass function (PMF) of the indegrees of nodes in this graph.
 int maxInDegree()
          Computes the maximum INDEGREE for any node in the graph.
 int maxOutDegree()
          Computes the maximum OUTDEGREE for any node in the graph.
 int maxSCCSize()
          Return the size of the largest strongly connected component (SCC) in the graph.
 int minInDegree()
          Computes the minimum INDEGREE for any node in the graph.
 int minOutDegree()
          Computes the minimum OUTDEGREE for any node in the graph.
 double[] outDegreeDistribution()
          Calculate the probability mass function (PMF) of the outdegrees of nodes in this graph.
 

Method Detail

allPairsShortestPaths

int[][] allPairsShortestPaths()
Compute the all-pairs shortest paths distances between every pair of nodes in the graph. This uses an all-pairs shortest paths algorithm (such as Floyd-Warshall) to compute the shortest directed distance between every two nodes in the graph. Note: this does not return the shortest paths themselves; it merely computes the distances that those paths require.

The returned matrix has one row and one column for every node in the graph. The interpretation of this matrix is:

   dist[i][j]==d
 
means that the shortest distance starting at node i and ending at node j is d. (Here i and j are the node IDs returned by Graph.getNodeID(Object).)

The distance from node i to itself is 0 and if j is not reachable from i then dist[i][j]==Integer.MAX_VALUE (i.e., infinity).

If the graph is empty (contains no nodes or edges), then this returns a size-0 matrix. That is, it returns essentially new int[0][0].

This routine MUST run in O(V^3) time for a graph containing V nodes. It SHOULD require only O(V^2) space. It MAY use more than O(V^2) space, but not more than O(V^3) space.

Returns:
Inter-node distance matrix.
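As an illustration of allPairsShortestPaths(), here is a minimal Floyd-Warshall sketch. It makes several assumptions that are not in this interface: the graph is supplied as a boolean adjacency matrix rather than a Graph object, every edge has length 1, and the class and constant names are hypothetical. Note the explicit checks against Integer.MAX_VALUE, which keep the addition from overflowing.

```java
final class ApspSketch {
    static final int INF = Integer.MAX_VALUE; // marks "not reachable"

    /** adj[i][j] == true means a directed edge i -> j of length 1 (assumption). */
    static int[][] allPairsShortestPaths(boolean[][] adj) {
        int v = adj.length;
        int[][] dist = new int[v][v]; // O(V^2) space; size-0 matrix when v == 0
        for (int i = 0; i < v; i++) {
            for (int j = 0; j < v; j++) {
                dist[i][j] = (i == j) ? 0 : (adj[i][j] ? 1 : INF);
            }
        }
        // Allow node k as an intermediate stop, one k at a time: O(V^3) total.
        for (int k = 0; k < v; k++) {
            for (int i = 0; i < v; i++) {
                if (dist[i][k] == INF) continue; // INF + x would overflow
                for (int j = 0; j < v; j++) {
                    if (dist[k][j] == INF) continue;
                    int viaK = dist[i][k] + dist[k][j];
                    if (viaK < dist[i][j]) dist[i][j] = viaK;
                }
            }
        }
        return dist;
    }
}
```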

avgShortestPathDistance

double avgShortestPathDistance()
Returns the average distance between all reachable pairs of nodes. That is, this examines all pairs of nodes, i,j, such that dist[i][j]<Integer.MAX_VALUE and returns the average distance over those pairs.

An empty graph is defined to have an average distance of 0.

This method MUST run in time O(V^2) beyond that required by the allPairsShortestPaths() method. (That is, this method MAY execute allPairsShortestPaths() and then spend up to O(V^2) time beyond that.)

Returns:
Average shortest-path distance
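The averaging step of avgShortestPathDistance() can be sketched as a single O(V^2) pass over an already computed distance matrix. This sketch takes the condition dist[i][j]<Integer.MAX_VALUE literally, so the zero-length pairs with i == j are counted; that reading is an assumption, and the class name is hypothetical.

```java
final class AvgDistanceSketch {
    /** dist is a matrix in the format returned by allPairsShortestPaths(). */
    static double avgShortestPathDistance(int[][] dist) {
        long sum = 0;   // long avoids overflow when summing many distances
        long pairs = 0; // number of reachable (i, j) pairs seen so far
        for (int[] row : dist) {
            for (int d : row) {
                if (d < Integer.MAX_VALUE) {
                    sum += d;
                    pairs++;
                }
            }
        }
        // An empty graph has no pairs and is defined to average 0.
        return pairs == 0 ? 0.0 : (double) sum / pairs;
    }
}
```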

countSCCs

int countSCCs()
Returns the number of strongly connected components (SCCs) in the graph. This counts the strongly connected components in the graph, where an SCC (strongly connected component) is defined to be a maximal set of mutually reachable nodes. Note that a singleton node (a node with no outgoing or incoming edges) is, by definition, strongly connected, as it is automatically reachable from itself.

An empty graph is defined to have 0 strongly connected components.

This MUST run in time O(V^3) and require at most O(V^2) space. However, most of that time/space is consumed in computing all-pairs shortest-paths (in the most straightforward implementation). This MUST require no more than O(V^2) time beyond that required by the all-pairs shortest paths execution, and consume at most O(V) additional space.

Returns:
Number of strongly connected components.
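One way to meet the countSCCs() bounds is to read mutual reachability straight off the distance matrix: nodes i and j share a component when both dist[i][j] and dist[j][i] are finite. The sketch below assumes the matrix has already been computed and uses a hypothetical class name; it is one simple approach, not necessarily the intended one.

```java
final class CountSccSketch {
    static final int INF = Integer.MAX_VALUE;

    /** dist is a matrix in the format returned by allPairsShortestPaths(). */
    static int countSCCs(int[][] dist) {
        int v = dist.length;
        boolean[] assigned = new boolean[v]; // O(V) additional space
        int count = 0;
        for (int i = 0; i < v; i++) {
            if (assigned[i]) continue; // already part of an earlier component
            count++;                   // node i starts a new component
            // Mark every node mutually reachable with i. Starting at j == i is
            // enough: any earlier member would already have claimed i.
            for (int j = i; j < v; j++) {
                if (dist[i][j] < INF && dist[j][i] < INF) {
                    assigned[j] = true;
                }
            }
        }
        return count; // 0 for an empty graph
    }
}
```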

maxSCCSize

int maxSCCSize()
Return the size of the largest strongly connected component (SCC) in the graph. While countSCCs() finds how many SCCs there are, this identifies the largest such component and returns the size (number of nodes) of that component.

An empty graph is defined to have 0 SCCs, the largest of which is size 0.

This MUST run in time no more than O(S) beyond that required by countSCCs(), for a graph containing S SCCs.

Returns:
Number of nodes in the largest SCC in the graph.
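The O(S) bound for maxSCCSize() suggests that the component-finding pass records one size per component, leaving only a scan over S numbers. A minimal sketch under that assumption follows; the class and the sccSizes helper are hypothetical and not part of this interface.

```java
import java.util.ArrayList;
import java.util.List;

final class MaxSccSketch {
    /** One entry per SCC, found by mutual reachability in the distance matrix. */
    static List<Integer> sccSizes(int[][] dist) {
        int v = dist.length;
        boolean[] assigned = new boolean[v];
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < v; i++) {
            if (assigned[i]) continue;
            int size = 0;
            for (int j = i; j < v; j++) {
                if (dist[i][j] < Integer.MAX_VALUE && dist[j][i] < Integer.MAX_VALUE) {
                    assigned[j] = true;
                    size++;
                }
            }
            sizes.add(size);
        }
        return sizes;
    }

    /** O(S) scan over the recorded sizes; 0 when there are no components. */
    static int maxSCCSize(List<Integer> sizes) {
        int max = 0;
        for (int s : sizes) {
            max = Math.max(max, s);
        }
        return max;
    }
}
```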

diameter

int diameter()
Computes the diameter of the reachable portion of the graph. This method finds the maximum distance between any pair of nodes that are reachable (i.e., whose distance is not infinite). That is, it computes
 dmax=max_{i,j in V} ( d_{ij} )
 subject to:
   d_{ij}<Integer.MAX_VALUE
 

An empty graph is defined to have a diameter of 0.

This MUST run in time O(V^3).

Returns:
Diameter of reachable portion of the graph
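Given a distance matrix, the maximization in diameter() reduces to one pass that skips the infinite entries. This is a sketch with a hypothetical class name; it assumes the matrix comes from allPairsShortestPaths().

```java
final class DiameterSketch {
    /** Largest finite entry of dist, or 0 if the matrix is empty. */
    static int diameter(int[][] dist) {
        int max = 0;
        for (int[] row : dist) {
            for (int d : row) {
                // Ignore unreachable pairs, which are marked with MAX_VALUE.
                if (d < Integer.MAX_VALUE && d > max) {
                    max = d;
                }
            }
        }
        return max;
    }
}
```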

maxInDegree

int maxInDegree()
Computes the maximum INDEGREE for any node in the graph. This method finds the node with the maximum number of edges entering it, and returns that number of edges.

An empty graph is defined to have a max indegree of 0.

This method MUST run in O(V+E) time for a graph with V nodes and E edges.

Returns:
Maximum indegree over all nodes in the graph.
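The O(V+E) bound for maxInDegree() matches a single sweep over every node's outgoing edges, incrementing a counter at each edge's target. The sketch below assumes a hypothetical adjacency-list form, succ[u] holding the IDs of u's successors, in place of the real Graph object.

```java
final class InDegreeSketch {
    /** succ[u] lists the targets of u's outgoing edges (hypothetical layout). */
    static int[] inDegrees(int[][] succ) {
        int[] in = new int[succ.length];
        for (int[] targets : succ) {   // V iterations
            for (int t : targets) {    // E iterations in total
                in[t]++;               // a self-edge u -> u counts once
            }
        }
        return in;
    }

    static int maxInDegree(int[][] succ) {
        int max = 0; // an empty graph is defined to have max indegree 0
        for (int d : inDegrees(succ)) {
            max = Math.max(max, d);
        }
        return max;
    }
}
```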

minInDegree

int minInDegree()
Computes the minimum INDEGREE for any node in the graph. This method finds the node with the minimum number of edges entering it, and returns that number of edges. Note that a node with no parents has an INDEGREE of 0, although a node with a self-edge has an INDEGREE of 1.

An empty graph is defined to have a min indegree of 0.

This method MUST run in O(V+E) time for a graph with V nodes and E edges.

Returns:
Minimum indegree over all nodes in the graph.

avgInDegree

double avgInDegree()
Computes the average INDEGREE for any node in the graph. This method computes the average number of parents of any node in the graph. This can be as low as 0 for a completely disconnected graph (no node has any parents) or as high as V-1 for a completely connected graph (i.e., a clique -- a graph in which every node is a parent of every other node), rising to V if every node also has a self-edge.

An empty graph is defined to have an average indegree of 0.0.

This method MUST run in O(V+E) time for a graph with V nodes and E edges.

Returns:
Average indegree over all nodes in the graph.

inDegreeDistribution

double[] inDegreeDistribution()
Calculate the probability mass function (PMF) of the indegrees of nodes in this graph. This method computes the frequency of occurrence of each possible indegree over this graph. For a graph of n nodes, the result is an array of n+1 values. Entry j of the result gives the frequency (not count) of nodes that have an indegree of j. Note that the minimum possible indegree for any node is 0, represented by result[0], and the maximum is n, represented by result[n].

Note that this method is responsible for allocating space for the result array. It MUST NOT allocate more (or less) space than necessary.

If the graph is empty, this returns an empty distribution. That is, a distribution containing 0 elements.

This method MUST run in O(V+E) time for a graph with V nodes and E edges.

Returns:
Frequency distribution (i.e., PMF) of indegrees over all nodes in the target graph.
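To make the shape of the inDegreeDistribution() result concrete, here is a minimal sketch that tallies indegrees and then divides by n. It assumes a hypothetical adjacency-list form (succ[u] lists u's successors) and no parallel edges, so that no indegree exceeds n; the class name is also hypothetical.

```java
final class DegreePmfSketch {
    /** Returns n+1 frequencies, or an empty array for an empty graph. */
    static double[] inDegreeDistribution(int[][] succ) {
        int n = succ.length;
        if (n == 0) {
            return new double[0];
        }
        int[] in = new int[n];
        for (int[] targets : succ) {
            for (int t : targets) {
                in[t]++;
            }
        }
        int[] counts = new int[n + 1]; // counts[j] = number of nodes with indegree j
        for (int d : in) {
            counts[d]++;
        }
        double[] pmf = new double[n + 1];
        for (int j = 0; j <= n; j++) {
            pmf[j] = (double) counts[j] / n; // frequency, not count
        }
        return pmf;
    }
}
```

The entries sum to 1 because every node contributes to exactly one counts[j].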

maxOutDegree

int maxOutDegree()
Computes the maximum OUTDEGREE for any node in the graph. This method finds the node with the maximum number of edges leaving it, and returns that number of edges.

An empty graph is defined to have a max outdegree of 0.

This method MUST run in O(V) time for a graph with V nodes.

Returns:
Maximum outdegree over all nodes in the graph.

minOutDegree

int minOutDegree()
Computes the minimum OUTDEGREE for any node in the graph. This method finds the node with the smallest number of edges leaving it, and returns that number of edges. Note that a node with no neighbors has an OUTDEGREE of 0, although a node with a self-edge has an OUTDEGREE of 1.

An empty graph is defined to have a min outdegree of 0.

This method MUST run in O(V) time for a graph with V nodes.

Returns:
Minimum outdegree over all nodes in the graph.

avgOutDegree

double avgOutDegree()
Computes the average OUTDEGREE for any node in the graph. This method computes the average number of neighbors of any node in the graph. This can be as low as 0 for a completely disconnected graph (no node has any neighbors) or as high as V-1 for a completely connected graph (i.e., a clique -- a graph in which every node is a neighbor of every other node), rising to V if every node also has a self-edge.

An empty graph is defined to have an average outdegree of 0.0.

This method MUST run in O(V) time for a graph with V nodes.

Returns:
Average outdegree over all nodes in the graph.
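The outdegree methods are bounded by O(V) rather than O(V+E). One plausible reading is that each node's outdegree is available in constant time, as it would be with adjacency lists whose length is stored. The sketch below assumes such a layout (succ[u] lists u's successors); the layout and the class name are hypothetical, not taken from the Graph API.

```java
final class OutDegreeSketch {
    static int maxOutDegree(int[][] succ) {
        int max = 0; // defined as 0 for an empty graph
        for (int[] targets : succ) {
            max = Math.max(max, targets.length); // O(1) per node
        }
        return max;
    }

    static int minOutDegree(int[][] succ) {
        if (succ.length == 0) {
            return 0; // defined as 0 for an empty graph
        }
        int min = Integer.MAX_VALUE;
        for (int[] targets : succ) {
            min = Math.min(min, targets.length);
        }
        return min;
    }

    static double avgOutDegree(int[][] succ) {
        if (succ.length == 0) {
            return 0.0; // defined as 0.0 for an empty graph
        }
        long total = 0;
        for (int[] targets : succ) {
            total += targets.length;
        }
        return (double) total / succ.length;
    }
}
```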

outDegreeDistribution

double[] outDegreeDistribution()
Calculate the probability mass function (PMF) of the outdegrees of nodes in this graph. This method computes the frequency of occurrence of each possible outdegree over this graph. For a graph of n nodes, the result is an array of n+1 values. Entry j of the result gives the frequency (not count) of nodes that have an outdegree of j. Note that the minimum possible outdegree for any node is 0, represented by result[0], and the maximum is n, represented by result[n].

Note that this method is responsible for allocating space for the result array. It MUST NOT allocate more (or less) space than necessary.

If the graph is empty, this returns an empty distribution. That is, a distribution containing 0 elements.

This method MUST run in O(V) time for a graph with V nodes.

Returns:
Frequency distribution (i.e., PMF) over outdegrees for all nodes in the target graph.