This uses a sub-string search, so that "graph" matches both "graph" and "graphical".
This search matches by keyword only. If you'd like, you can also search by researcher.
Found 25 results.
Listing from newest to oldest
TR-CS-2011-02
Exploiting Task Relatedness for Multitask Learning of Bayesian Network Structures
Diane Oyen and Terran Lane
We address the problem of learning Bayesian networks for a collection of unsupervised tasks when limited data is available and a metric of the relatedness of tasks is given. We exploit this valuable information about task-relatedness to learn more robust structures for each task than those learned with a standard multitask learning algorithm. Our approach is the first network structure learning algorithm that addresses zero-data learning: the setting where no training data is available for a task, but data for related tasks and information about how the tasks are related are given. This paper describes our Task Relationship Aware Multitask (TRAM) algorithm for learning Bayesian networks. The inputs to the algorithm are task-specific sets of data and an undirected graph representing a metric of task-relatedness. TRAM regularizes over the task-relatedness graph to output a network for each task, each of which is biased toward the structures of related tasks. We empirically compare, using synthetic and neuroimaging data, TRAM with three baseline methods and show that task-relatedness knowledge impacts all learning methods. Only TRAM effectively uses this knowledge to learn more robust structures.
TR-CS-2010-08
Exploiting Task Relatedness to Learn Multiple Bayesian Network Structures
Diane Oyen and Terran Lane
We address the problem of learning multiple Bayesian network structures for experimental data where the experimental conditions define relationships among datasets. A metric of the relatedness of these datasets, or tasks, can be defined; it contains valuable information that we exploit to learn more robust structures for each task. We represent the task-relatedness with an undirected graph. Our method uses a regularization framework over this task-relatedness graph to learn Bayesian network structures for each task that are smoothed toward the structures of related tasks. Experiments on synthetic data and real fMRI experiment data show that this method learns structures which are close to ground truth, when available, and which generalize to holdout data better than an existing multitask learning method, learning networks independently, and learning one global network for all tasks.
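As a purely illustrative sketch (this abstract does not give the actual objective, so every symbol below is a placeholder), a regularization framework of this kind can be summarized as a penalized score in which each task's network is fit to its own data while being pulled toward the structures of its neighbors in the task-relatedness graph:

\[
\{\hat{G}_t\} = \arg\max_{G_1,\dots,G_T} \sum_{t=1}^{T} \mathrm{score}(G_t; D_t) \;-\; \lambda \sum_{(s,t)\in E} w_{st}\,\Delta(G_s, G_t)
\]

Here D_t is the data for task t, E and w_{st} encode the task-relatedness graph, \Delta is some structural distance between networks (for example, the symmetric difference of edge sets), and \lambda controls the strength of smoothing toward related tasks.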
TR-CS-2008-15
Agreeing to disagree: Leveraging consensus and divergence in Bayesian belief aggregation
Kshanti A. Greene and George F. Luger
We present a new approach for combining the beliefs of many individuals using graphical models. Existing Bayesian belief aggregation methods violate several theoretical assumptions of Bayesian reasoning. More practically, existing opinion pool functions that compute a single value to represent the belief of all contributors do not represent reality well, especially in cases where there are many diverse opinions. Divergence is a natural result of combining opinions from individuals with different beliefs, backgrounds and experiences. Instead of forming a single consensus value that will average out this diversity, we find clusters of agreement for each probability distribution and propagate the cluster means throughout the network during inference. We use a social network that tracks the agreement between individuals, together with the normalized graph cut algorithm, to find emerging groups of consensus in the agreement network. We leverage the agreement that occurs across multiple belief estimates to help reduce the complexity that may arise as the means are propagated throughout a belief network. By monitoring agreement over time we may also expose the variety of backgrounds that will help explain divergence in belief. This paper discusses the approach, its background, and our motivation for ongoing research.
TR-CS-2008-06
Using Laplacian Methods, RKHS Smoothing Splines and Bayesian Estimation as a framework for Regression on Graph and Graph Related Domains
Eduardo Corona, Terran Lane, Curtis Storlie, Joshua Neil
We first present an overview of the Laplacian operator, its properties, and the methods that arise from analyzing its eigenvalues and eigenfunctions. We mainly draw our material from (BLS00) and (Chu97), trying to combine the linear algebra approach with the more analytic one, which suggests the analogy with the continuous Laplacian operator on manifolds.
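For readers unfamiliar with the object being surveyed, the short sketch below (NumPy; not code from the report, with an arbitrary example graph) builds the combinatorial Laplacian L = D - A of a small graph and computes its eigenvalues and eigenvectors, the quantities whose analysis the report discusses.

# Sketch: combinatorial graph Laplacian L = D - A and its spectrum.
import numpy as np

# Adjacency matrix of a 4-cycle (an arbitrary example graph).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))      # degree matrix
L = D - A                       # combinatorial Laplacian

# L is symmetric positive semidefinite, so eigh is the appropriate solver.
eigenvalues, eigenvectors = np.linalg.eigh(L)

print("eigenvalues:", np.round(eigenvalues, 6))
# The smallest eigenvalue is 0 with a constant eigenvector; the
# multiplicity of 0 equals the number of connected components.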
TR-CS-2007-17
Structured Streams: Data Services for Petascale Science Environments
Patrick M. Widener, Matthew Wolf, Hasan Abbasi, Matthew Barrick, Jay Lofstead,
Jack Pullikottil, Greg Eisenhauer, Ada Gavrilovska, Scott Klasky, Ron Oldfield,
Patrick G. Bridges, Arthur B. Maccabe, Karsten Schwan.
The challenge of meeting the I/O needs of petascale applications is exacerbated by an emerging class of data-intensive HPC applications that requires annotation, reorganization, or even conversion of their data as it moves between HPC computations and data end users or producers. For instance, data visualization can present data at different levels of detail. Further, actions on data are often dynamic, as new end-user requirements introduce data manipulations unforeseen by original application developers. These factors are driving a rich set of requirements for future petascale I/O systems: (1) high levels of performance and, therefore, flexibility in how data is extracted from petascale codes; (2) the need to support "on demand" data annotation (metadata creation and management) outside application codes; (3) support for concurrent use of data by multiple applications, like visualization and storage, including associated consistency management and scheduling; and (4) the ability to flexibly access and reorganize physical data storage.
We introduce an end-to-end approach to meeting these requirements: Structured Streams, streams of structured data with which methods for data management can be associated whenever and wherever needed. These methods can execute synchronously or asynchronously with data extraction and streaming, they can run on the petascale machine or on associated machines (such as storage or visualization engines), and they can implement arbitrary data annotations, reorganization, or conversions. The Structured Streaming Data System (SSDS) enables high-performance data movement or manipulation between the compute and service nodes of the petascale machine and between/on service nodes and ancillary machines; it enables the metadata creation and management associated with these movements through specification instead of application coding; and it ensures data consistency in the presence of anticipated or unanticipated data consumers. Two key abstractions implemented in SSDS, I/O graphs and Metabots, provide developers with high-level tools for structuring data movement as dynamically-composed topologies. A lightweight storage system avoids traditional sources of I/O overhead while enforcing protected access to data.
This paper describes the SSDS architecture, motivating its design decisions and intended application uses. The utility of the I/O graph and Metabot abstractions is illustrated with examples from existing HPC codes executing on Linux Infiniband clusters and Cray XT3 supercomputers. Performance claims are supported with experiments benchmarking the underlying software layers of SSDS, as well as application-specific usage scenarios.
TR-CS-2007-13
Dominance: Modeling Heap Structures with Sharing
Mark Marron, Rupak Majumdar, Darko Stefanovic, Deepak Kapur
A number of papers have used predicate languages over sets of abstract locations to model the heap (decorating a heap graph with the predicates, or in conjunction with an access path abstraction). In this work we introduce a new predicate, dominance, which is a generalization of aliasing and is used to model how objects are shared in the heap (e.g. do two lists contain the same set of objects?) and how sharing influences the results of destructive updates (e.g. if all the objects in one list are modified does that imply that all the objects in another list are modified?). The dominance relation is introduced in the context of a graph-based heap model but the concept is general and can be incorporated in other frameworks as a high-level primitive.
The motivation for introducing a higher-order predicate to model sharing is based on the success of existing approaches which use connectivity, reachability, and sharing predicates to perform shape analysis using high-level graph-based models. Our approach provides an analysis technique that is efficient and scalable while retaining a high level of accuracy. This is in contrast to proposals in the literature based on using logical languages (usually built on top of first-order predicate logic) which either are highly expressive and very general (resulting in a computationally expensive analysis) or utilize only fragments of the full logic (which reduces the computational cost but results in a much less powerful analysis).
The approach presented in this paper has been implemented and run on a variation of the Jolden benchmarks. The information extracted using the analysis has been used to parallelize these programs with a near-optimal speedup for 7 of the 9 benchmarks. The runtimes are small enough to make the analysis practical (a few seconds for each program).
TR-CS-2007-08
Efficient Context-Sensitive Shape Analysis with Graph Based Heap Models
Mark Marron, Manuel Hermenegildo, Deepak Kapur, and Darko Stefanovic
The performance of heap analysis techniques has a significant impact on their utility in optimization and verification applications. A significant factor affecting analysis performance is how procedure calls are handled. To maximize the accuracy of the analysis it is desirable to perform inter-procedural data flow analysis in a context-sensitive manner. However, re-analyzing a procedure for each calling context can result in the analysis being computationally intractable. To minimize the computational cost of performing context-sensitive dataflow analysis it is critical to minimize the number of times a function body needs to be re-analyzed. The computational cost can be further reduced by minimizing the amount of information that is carried in from the caller scope and thus needs to be processed during the analysis of the callee procedure. This can be accomplished by memoizing the calls as they are analyzed and using project/extend operations to remove portions of the heap model that cannot be affected by the called procedure.
The focus of this paper is the design of project and extend operations for a graph-based abstract heap model. Our interprocedural analysis introduces a novel method for naming cutpoints which is able to accurately model calls with shared data structures, multiple cutpoints, and recursive calls. In conjunction with our heap model (which tracks all heap properties using local information) the project and extend operations are able to substantially reduce the amount of calling context information that is carried into the analysis of the callee body. Our experimental results indicate that the proposed project/extend operations have a substantial positive effect on the memoization of procedure analysis results and on the reduction of the size of the model that the local analysis must deal with. Additionally, the cutpoint naming technique is (in many cases) able to precisely maintain the relations between the callee local variables and the caller local variables in the immediately preceding call frame, which is critical to modeling programs that perform recursive destructive modifications on lists or trees.
TR-CS-2005-31
A Tableau-Based Decision Procedure for a Fragment of Graph Theory Involving Reachability and Acyclicity
Domenico Cantone and Calogero G. Zarba
We study the decision problem for the language DGRA (directed graphs with reachability and acyclicity), a quantifier-free fragment of graph theory involving the notions of reachability and acyclicity. We prove that the language DGRA is decidable, and that its decision problem is NP-complete. We do so by showing that the language enjoys a small model property: If a formula is satisfiable, then it has a model whose cardinality is polynomial in the size of the formula. Moreover, we show how the small model property can be used to devise a tableau-based decision procedure for DGRA.
TR-CS-2005-20
Explicit Multiregister Measurements for Hidden Subgroup Problems; or, Fourier Sampling Strikes Back
Cristopher Moore and Alexander Russell
We present an explicit measurement in the Fourier basis that solves an important case of the Hidden Subgroup Problem, including the case to which Graph Isomorphism reduces. This entangled measurement uses k=log_2 |G| registers, and each of the 2^k subsets of the registers contributes some information.
TR-CS-2005-04
On the Bias of Traceroute Sampling, or, Power-law Degree Distributions in Regular Graphs
Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore
Understanding the structure of the Internet graph is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining its graph structure is a surprisingly difficult task, as edges cannot be explicitly queried. Instead, empirical studies rely on traceroutes to build what are essentially single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network's edges, and a recent paper by Lakhina et al. found empirically that the resulting sample is intrinsically biased. For instance, the observed degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson.
In this paper, we study the bias of traceroute sampling systematically, and, for a very general class of underlying degree distributions, calculate the likely observed distributions explicitly. To do this, we use a continuous-time realization of the process of exposing the BFS tree of a random graph with a given degree distribution, calculate the expected degree distribution of the tree, and show that it is sharply concentrated. As example applications of our machinery, we show how traceroute sampling finds power-law degree distributions in both delta-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.
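The bias analyzed here is easy to reproduce in simulation. The sketch below (using networkx; not the authors' code, and with arbitrary parameter choices) builds a sparse random graph with Poisson degrees, takes a single-source BFS tree as a stand-in for traceroute sampling, and compares the tree's degree counts with the underlying ones.

# Sketch: observed degrees under single-source shortest-path-tree sampling
# of a Poisson random graph. Parameters n and c are assumed values.
import networkx as nx
from collections import Counter

n, c = 100_000, 10
G = nx.fast_gnp_random_graph(n, c / n, seed=1)

# Restrict to the largest component so every vertex is reachable.
giant = G.subgraph(max(nx.connected_components(G), key=len))
source = next(iter(giant.nodes))

# A BFS tree from one source approximates single-source traceroute sampling.
tree = nx.bfs_tree(giant, source)

observed = Counter(d for _, d in tree.degree())
underlying = Counter(d for _, d in giant.degree())

for k in range(1, 11):
    print(k, observed.get(k, 0), underlying.get(k, 0))
# The tree's degree counts are heavily skewed toward low degrees relative
# to the underlying Poisson(c) distribution of the graph itself.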
TR-CS-2005-03
Finding local community structure in networks
Aaron Clauset
Although the inference of global community structure in networks has recently become a topic of great interest in the physics community, all such algorithms require that the graph be completely known. Here, we define both a measure of local community structure and an algorithm that infers the hierarchy of communities that enclose a given vertex by exploring the graph one vertex at a time. This algorithm runs in time O(d*k^2) for general graphs, where d is the mean degree and k is the number of vertices to be explored. For graphs where exploring a new vertex is time-consuming, the running time is linear, O(k). We show that on computer-generated graphs this technique compares favorably to algorithms that require global knowledge. We also use this algorithm to extract meaningful local clustering information in the large recommender network of an online retailer and show the existence of mesoscopic structure.
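A rough illustration of the "explore one vertex at a time" idea follows. The greedy quality measure used here (fraction of incident edges that are internal) is only a stand-in for the local measure defined in the report, and the example graph is arbitrary.

# Sketch: greedy one-vertex-at-a-time growth of a local community around a
# seed vertex. Illustrative only; not the report's algorithm or measure.
import networkx as nx

def community_score(G, nodes):
    internal = external = 0
    for u in nodes:
        for v in G[u]:
            if v in nodes:
                internal += 1
            else:
                external += 1
    internal //= 2  # each internal edge was counted from both endpoints
    total = internal + external
    return internal / total if total else 0.0

def grow_local_community(G, seed, k):
    """Explore at most k vertices, adding the best frontier vertex each step."""
    community = {seed}
    while len(community) < k:
        frontier = {v for u in community for v in G[u]} - community
        if not frontier:
            break
        best = max(frontier, key=lambda v: community_score(G, community | {v}))
        community.add(best)
    return community

# Example: two 6-cliques joined by a single rewired edge.
G = nx.connected_caveman_graph(2, 6)
print(sorted(grow_local_community(G, seed=0, k=6)))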
TR-CS-2004-23
Why Mapping the Internet is Hard
Aaron Clauset and Cristopher Moore
Despite great effort spent measuring topological features of large networks like the Internet, it was recently argued that sampling based on taking paths through the network (e.g., traceroutes) introduces a fundamental bias in the observed degree distribution. We examine this bias analytically and experimentally. For classic random graphs with mean degree c, we show analytically that traceroute sampling gives an observed degree distribution P(k) ~ k^{-1} for k < c, even though the underlying degree distribution is Poisson. For graphs whose degree distributions have power-law tails P(k) ~ k^{-alpha}, the accuracy of traceroute sampling is highly sensitive to the population of low-degree vertices. In particular, when the graph has a large excess (i.e., many more edges than vertices), traceroute sampling can significantly misestimate alpha.
TR-CS-2004-22
How much backtracking does it take to color random graphs?
Rigorous results on heavy tails
Haixia Jia and Cristopher Moore
For many backtracking search algorithms, the running time has been found experimentally to have a heavy-tailed distribution, in which it is often much greater than its median. We analyze two natural variants of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm for Graph 3-Coloring on sparse random graphs of average degree c. Let P_c(b) be the probability that DPLL backtracks b times. First, we calculate analytically the probability P_c(0) that these algorithms find a 3-coloring with no backtracking at all, and show that it goes to zero faster than any analytic function as c -> c^* = 3.847... Then we show that even in the ``easy'' regime 1 < c < c^* where P_c(0) > 0 --- including just above the degree c=1 where the giant component first appears --- the expected number of backtrackings is exponentially large with positive probability. To our knowledge this is the first rigorous proof that the running time of a natural backtracking algorithm has a heavy tail for graph coloring.
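To see how such backtracking counts can be measured experimentally, here is a minimal sketch of a plain backtracking 3-colorer run on a sparse random graph. It is not the DPLL variants analyzed in the report, and the instance size and average degree are arbitrary.

# Sketch: counting backtracks of a plain backtracking 3-colorer on a sparse
# random graph. Illustrative only; not the report's DPLL variants.
import networkx as nx

def color_count_backtracks(G):
    order = list(G.nodes)
    color = {}
    backtracks = 0

    def recurse(i):
        nonlocal backtracks
        if i == len(order):
            return True
        v = order[i]
        for col in (0, 1, 2):
            if all(color.get(u) != col for u in G[v]):
                color[v] = col
                if recurse(i + 1):
                    return True
                del color[v]
                backtracks += 1   # undoing an assignment counts as a backtrack
        return False

    ok = recurse(0)
    return ok, backtracks

n, c = 50, 3.0                       # small instance in the "easy" regime
G = nx.gnp_random_graph(n, c / n, seed=0)
ok, b = color_count_backtracks(G)
print("3-colorable:", ok, "backtracks:", b)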
TR-CS-2004-20
Counting Connected Graphs and Hypergraphs via the Probabilistic Method
Amin Coja-Oghlan, Cristopher Moore and Vishal Sanwalani
It is exponentially unlikely that a sparse random graph or hypergraph is connected, but such graphs occur commonly as the giant components of larger random graphs. This simple observation allows us to estimate the number of connected graphs, and more generally the number of connected d-uniform hypergraphs, on n vertices with m=O(n) edges. We also estimate the probability that a binomial random hypergraph H_d(n,p) is connected, and determine the expected number of edges of H_d(n,p) conditioned on its being connected. This generalizes prior work of Bender, Canfield, and McKay on the number of connected graphs; however, our approach relies on elementary probabilistic methods, extending an approach of O'Connell, rather than using powerful tools from enumerative combinatorics, such as generating functions and complex analysis. We also estimate the probability for each t that, given k=O(n) balls in n bins, every bin is occupied by at least t balls.
TR-CS-2004-19
The Chromatic Number of Random Regular Graphs
Dimitris Achlioptas and Cristopher Moore
Given any integer d >= 3, let k be the smallest integer such that d < 2k log k. We prove that with high probability the chromatic number of a random d-regular graph is k, k+1, or k+2, and that if (2k-1) log k < d < 2k log k then the chromatic number is either k+1 or k+2.
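To make the parameter k concrete, the following few lines (illustrative only, taking log as the natural logarithm) compute, for a given d, the smallest integer k with d < 2k log k and print the resulting window for the chromatic number.

# Sketch: the parameter k appearing in the theorem's chromatic-number window.
# "log" is assumed to be the natural logarithm here.
import math

def k_for_degree(d):
    k = 1
    while not (d < 2 * k * math.log(k)):
        k += 1
    return k

for d in (3, 4, 10, 100):
    k = k_for_degree(d)
    print(f"d={d}: k={k}, chromatic number in {{{k}, {k+1}, {k+2}}} w.h.p.")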
TR-CS-2003-57
Traceroute sampling makes random graphs appear to have power law degree distributions
Aaron Clauset and Cristopher Moore
The topology of the Internet has typically been measured by sampling traceroutes, which are roughly shortest paths from sources to destinations. The resulting measurements have been used to infer that the Internet's degree distribution is scale-free; however, many of these measurements have relied on sampling traceroutes from a small number of sources. It was recently argued that sampling in this way can introduce a fundamental bias in the degree distribution, for instance, causing random (Erdos-Renyi) graphs to appear to have power law degree distributions. We explain this phenomenon analytically using differential equations to model the growth of a breadth-first tree in a random graph G(n,p=c/n) of average degree c, and show that sampling from a single source gives an apparent power law degree distribution P(k) ~ 1/k for k < c.
TR-CS-2003-49
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs)
David A. Bader and Guojing Cong
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there are few studies that deal with the implementation and performance issues of running PRAM algorithms on SMPs. Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem both because it is the building block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected components) using parallel computers, but only achieved reasonable speedup on regular graph topologies that can be implicitly partitioned with good locality features or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm and implementation with superior performance that \emph{for the first time} achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number $p$ of processors for suitably large inputs (n > p^2). As it is notoriously hard for any parallel implementation of spanning tree to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers.
The main results of this paper are 1) A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2) An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.
The source code for these algorithms is freely available from our web site.
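For context, a sequential spanning-tree computation of the kind such parallel algorithms are measured against can be as simple as the BFS sketch below (illustrative only; this is not the report's parallel algorithm, and the input graph is a toy example).

# Sketch: a simple sequential spanning-tree baseline using BFS over an
# adjacency list. Not the report's parallel SMP algorithm.
from collections import deque

def bfs_spanning_tree(adj, root=0):
    """adj: dict mapping vertex -> iterable of neighbors. Returns tree edges."""
    tree, visited = [], {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}   # toy connected graph
print(bfs_spanning_tree(adj))   # e.g. [(0, 1), (0, 2), (1, 3)]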
TR-CS-2003-46
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
David A. Bader and Guojing Cong
Minimum Spanning Tree (MST) is one of the most studied combinatorial problems, with practical applications in VLSI layout, wireless communication, and distributed networks; in recent problems in biology and medicine such as cancer detection, medical imaging, and proteomics; and in national security and bioterrorism, such as detecting the spread of toxins through populations in the case of biological/chemical warfare. Most previous attempts at improving the speed of MST using parallel computing are too complicated to implement or perform well only on special graphs with regular structure. In this paper we design and implement four parallel MST algorithms (three variations of Boruvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the \emph{best} sequential algorithm. In fact, our algorithms also solve the minimum spanning forest problem. We provide an experimental study of our algorithms on symmetric multiprocessors such as IBM's p690/Regatta and Sun's Enterprise servers. Our new implementation achieves good speedups over a wide range of input graphs with regular and irregular structures, including the graphs used by previous parallel MST studies. For example, on an arbitrary random graph with 1M vertices and 20M edges, we have a speedup of 5 using 8 processors. The source code for these algorithms is freely available from our web site.
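As background for the Boruvka variants mentioned above, a minimal sequential Boruvka minimum-spanning-forest sketch follows (illustrative only; it is not the report's parallel implementation, and the edge list is a toy example).

# Sketch: round-based sequential Boruvka minimum spanning forest, the
# building block behind the parallel variants studied in the report.

def boruvka_msf(n, edges):
    """n vertices (0..n-1), edges as (weight, u, v). Returns forest edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    while True:
        cheapest = {}                        # component root -> best outgoing edge
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break                            # no edge leaves any component
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:                     # avoid cycles when ties collide
                parent[ru] = rv
                forest.append((u, v, w))
    return forest

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (7, 2, 3), (2, 3, 4)]   # toy input
print(boruvka_msf(5, edges))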
TR-CS-2003-37
Chromatic Number in Random Scaled Sector Graphs
Josep Diaz, Vishal Sanwalani, Maria Serna, and Paul Spirakis
In the random sector graph model vertices are placed uniformly at random in the unit square. Each vertex i is assigned a sector S_i uniformly at random, of central angle alpha_i, in a circle of radius r_i (with vertex i as the origin). An arc is present from vertex i to any vertex j if j falls in S_i. We study the chromatic number chi(G), directed clique number omega(G), and undirected clique number mc(G) for random scaled sector graphs with n vertices, where each vertex has the same alpha_i=alpha and the same radius r_i=r_n=sqrt(ln(n)/n). We prove that for values alpha < pi, as n goes to infinity with high probability (w.h.p.), chi(G) and mc(G) are Theta(ln n/ln ln n), while omega(G) is O(1), showing a clear difference with the random geometric graph model. For pi < alpha < 2pi w.h.p., chi(G) and mc(G) are Theta(ln n), being the same for random sector and geometric graphs, while omega(G) is Theta(ln n/ln ln n).
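A generator for this model is straightforward to sketch. The code below follows the description in the abstract (the same angle alpha and radius r_n = sqrt(ln n / n) for every vertex); parameter values are arbitrary and no claim is made to match the report's exact conventions.

# Sketch: generating a random scaled sector digraph as described above.
import math
import random

def random_sector_digraph(n, alpha, seed=0):
    rng = random.Random(seed)
    r = math.sqrt(math.log(n) / n)                      # r_n = sqrt(ln n / n)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    orient = [rng.uniform(0, 2 * math.pi) for _ in range(n)]  # central direction of S_i
    arcs = []
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            if dx * dx + dy * dy > r * r:
                continue                                # j outside radius r of i
            theta = math.atan2(dy, dx)
            # angular distance from the sector's central direction
            diff = abs((theta - orient[i] + math.pi) % (2 * math.pi) - math.pi)
            if diff <= alpha / 2:
                arcs.append((i, j))                     # j falls inside S_i
    return arcs

arcs = random_sector_digraph(n=500, alpha=math.pi / 2)
print(len(arcs), "arcs")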
TR-CS-2003-29
The Visibility Graph Among Polygonal Obstacles:
a Comparison of Algorithms
John Kitzinger and Bernard Moret
This work examines the differences among four approaches to finding the visibility graph of a polygonal region with obstacles defined by simple polygons. The algorithms are (1) the trivial algorithm, (2) Lee's, (3) Overmars and Welzl's, and (4) Ghosh and Mount's. Each has been implemented and tuned. The experimental comparisons are carried out via time measurements on various test cases. Expected asymptotic time bounds have been verified, and crossover points between the implementations have been identified. The two contributions of this thesis are (a) adapting Overmars and Welzl's algorithm from working only with line segments to working with general simple polygons and (b) implementing method (4). According to one of the authors of method (4), this may be the first working implementation. This may be because the algorithm has always been considered a theoretical algorithm with complicated structures and assumed high time constants. During the course of implementing the algorithm, a major bug in the paper was identified and corrected. Also, somewhat surprisingly, the run-time constants for this implementation have turned out to be lower than previously thought. In fact, the implementation outperforms the others even on medium-sized input.
Note: in the future, it may be possible to gain access to the source code of the implementations by writing to bejmk@yahoo.com.
TR-CS-2003-11
Discrete Sensor Placement Problems in Distribution Networks
Tanya Y. Berger-Wolf, William E. Hart, and Jared Saia
We consider the problem of placing sensors in a building or a utility network to monitor the air or water supply. We propose several discrete graph models of the problem and outline possible solutions. We concentrate on minimizing the number of sensors and time to contamination detection. We prove that the problem is NP-hard. We use generalizations and extensions of the various versions of the set cover problem to design approximation algorithms.
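The classic greedy set-cover approximation that such formulations typically build on can be sketched in a few lines. The example data below is hypothetical, and the report's actual problem variants and algorithms are more elaborate.

# Sketch: the standard greedy set-cover approximation often used as a
# building block for sensor-placement formulations. Illustrative only.

def greedy_set_cover(universe, candidate_sets):
    """candidate_sets: dict mapping a candidate sensor location to the set of
    network nodes it would cover. Returns a list of chosen locations."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the location covering the most still-uncovered nodes.
        best = max(candidate_sets, key=lambda s: len(candidate_sets[s] & uncovered))
        gained = candidate_sets[best] & uncovered
        if not gained:
            raise ValueError("remaining nodes cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical example: 6 pipe junctions, 4 possible sensor locations.
universe = {1, 2, 3, 4, 5, 6}
candidates = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {2, 6}}
print(greedy_set_cover(universe, candidates))   # ['A', 'C']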
TR-CS-2001-32
Almost All Graphs of Degree 4 are 3-Colorable
Dimitris Achlioptas and Cristopher Moore
To appear in Proc. Symposium on the Theory of Computing (STOC) 2002.
The technique of approximating the mean path of Markov chains by differential equations has proved to be a useful tool in analyzing the performance of heuristics on random graph instances. However, only a small family of algorithms can currently be analyzed by this method, due to the need to maintain uniform randomness within the original state space. Here, we significantly expand the range of the differential equation technique by showing how it can be generalized to handle heuristics that give priority to high- or low-degree vertices. In particular, we focus on 3-coloring and analyze a ``smoothed'' version of the practically successful Brelaz heuristic. This allows us to prove that almost all graphs with average degree d are 3-colorable for d <= 4.03, and that almost all 4-regular graphs are 3-colorable. This improves over the previous lower bound of 3.847 on the 3-colorability threshold and gives the first non-trivial result on the colorability of random regular graphs. In fact, our methods can be used to deal with arbitrary sparse degree distributions and in conjunction with general graph algorithms that have a preference for high- or low-degree vertices.
TR-CS-2001-12
Reconstructing Optimal Phylogenetic Trees:
A Challenge in Experimental Algorithmics
Bernard M.E. Moret and Tandy Warnow
Experimental algorithmics has made much progress in the study of basic building blocks for discrete computing: many excellent papers have appeared that compared various priority queues, hash tables, search trees, sorting algorithms, and approximation algorithms for Satisfiability, Binpacking, Graph Coloring, and the Travelling Salesperson problems, just to name a few (for references, see~\cite{moret-shapiro,moret-dimacs}). Its subdiscipline of algorithm engineering likewise has seen great progress in the development of libraries (as exemplified by LEDA), the beginning of a set of software engineering practices for discrete computing, and the refinement of models of computation that take into account today's deep memory hierarchies. We need to extend the scope and benefits of experimental algorithmics and algorithm engineering to applications in the computational sciences.
In this paper, we present one such application: the reconstruction of evolutionary histories (phylogenies) from molecular data such as DNA sequences. Our presentation is not a survey of past and current work in the area, but rather a discussion of what we see as some of the important challenges in experimental algorithmics that arise from computational phylogenetics. As motivational examples or examples of possible approaches, we briefly discuss two specific uses of algorithm engineering and of experimental algorithmics from our recent research.
TR-CS-2001-03
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture
David A. Bader,
Ajith K. Illendula, and
Bernard M.E. Moret
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform-memory-access shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance sizes tested. This linear speedup with the number of processors is, to our knowledge, the first ever attained in practice for intricate combinatorial problems and stands in sharp contrast to the results obtained on distributed-memory machines (such as mesh- and cluster-based architectures). The example we present in detail here is a graph decomposition algorithm that also requires the computation of a spanning tree; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and optimal PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
TR-CS-2000-42
A Linear-Time Algorithm for Computing Inversion Distance Between Signed Permutations with an Experimental Study
David A. Bader, Bernard M.E. Moret, Mi Yan
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an $O(n\alpha(n))$ algorithm, based on a Union-Find structure, to find its connected components, where $\alpha$ is the inverse Ackermann function. Since for all practical purposes $\alpha(n)$ is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speed-up by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.
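For readers unfamiliar with it, the union-find (disjoint-set) structure underlying the earlier O(n alpha(n)) algorithm can be sketched as follows. This is illustrative only: it does not reproduce the report's new linear-time, stack-based algorithm, and the merge data is a toy example rather than a real permutation's overlap graph.

# Sketch: union-find with path halving and union by rank, the structure
# behind the O(n*alpha(n)) connected-components approach mentioned above.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# Toy example: merge a few overlapping elements into components.
uf = UnionFind(4)
for a, b in [(0, 2), (1, 3)]:
    uf.union(a, b)
print([uf.find(i) for i in range(4)])   # two components remain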