This uses a sub-string search, so that "graph" matches both "graph" and "graphical".
This search only searches by keyword. If you'd like, you can also search by researcher.
Found 11 results.
Listing from newest to oldest
TR-CS-2005-02
Quartet methods for phylogeny reconstruction from gene orders
Tao Liu, Jijun Tang, and Bernard M.E. Moret
Phylogenetic reconstruction from gene-rearrangement data has attracted increasing attention from biologists and computer scientists. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach; however, its exhaustive approach means that it can be applied only to small datasets of fewer than 15 taxa. While we have successfully scaled it up to 1,000 genomes by integrating it with a disk-covering method (DCM-GRAPPA), the recursive decomposition may need many levels of recursion to handle datasets with 1,000 or more genomes. We thus investigated quartet-based approaches, which directly decompose the datasets into subsets of four taxa each; such approaches have been well studied for sequence data, but not for gene-rearrangement data. We give an optimization algorithm for the NP-hard problem of computing optimal trees for each quartet, present a variation of the dyadic method (using heuristics to choose suitable short quartets), and use both in simulation studies. We find that our quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling us to reduce the number of levels of recursion in DCM-GRAPPA, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
TR-CS-2004-25
Advances in Phylogeny Reconstruction from Gene Order and Content Data
Bernard M.E. Moret and Tandy Warnow
Genomes can be viewed in terms of their gene content and the order in which the genes appear along each chromosome. Evolutionary events that affect the gene order or content are ``rare genomic events" (rarer than events that affect the composition of the nucleotide sequences) and have been advocated by systematists for inferring deep evolutionary histories. This chapter surveys recent developments in the reconstruction of phylogenies from gene order and content, focusing on their performance under various stochastic models of evolution. Because such methods are currently quite restricted in the type of data they can analyze, we also present current research aimed at handling the full range of whole-genome data.
TR-CS-2004-24
Properties of Compatibility and Consensus Sets of Phylogenetic Trees
Tanya Y. Berger-Wolf
One of the fundamental problems in phylogeny reconstruction is combining a set of trees into one ``representative'' tree. There are exist numerous methods for that purpose that combine trees over the same taxa (consensus) or different taxa (supertree) sets. However, many limitations and shortcomings of these methods have been pointed out. In this paper, we state various desirable properties of ``representative'' trees and the impact of considering sets of trees to represent the input. We analyze the sets of compatibility trees and several consensus sets and show which of the properties are satisfied. Specifically, one of the desired properties is the output-polynomial enumeration of the set itself.
TR-CS-2003-53
Fast character optimization in parsimony phylogeny reconstruction
Mi Yan and David A. Bader
The problem of finding a phylogeny with maximum parsimony is one of the main problems in computational biology. While it is impossible to search the possible tree space exhaustively for large data sets, most heuristic approaches try to search in the neighborhood of sub-optimal trees. The speed of computing a score for each tree (e.g. tree length or total number of character changes) is as important as the different tree search strategies. Some techniques include short-cuts that have not been proven to be exact, and an incremental character optimization approach by which the speedup gains depends on data sets and is hardly analyzed in theory. This paper describes an exact and fast algorithm to compute tree length and our algorithm can obtain great speedup for any data sets. We discuss the algorithm for Fitch-parsimony, but it can also be applied to Wagner-parsimony.
TR-CS-2003-34
Online Consensus of Phylogenetic Trees
Tanya Y. Berger-Wolf and Tandy J. Warnow
Phylogeny reconstruction heuristics on biological data need to use some criterion for termination, since we cannot recognize when an optimal tree is produced. A good criterion is the lack of change in the final output of the heuristic. Since these heuristics typically produce many top-scoring trees as their result, these trees are combined using a phylogenetic consensus method. Thus, the lack of change in the consensus of the top-scoring trees is a good heuristic termination criterion. To use this criterion, we need to have algorithms that maintain consensus of a sequence of trees on-line, as the new trees are generated by the heuristic. In this paper, we propose and analyze two algorithms for the on-line computation of strict and majority consensus trees. We show that the on-line strict consensus algorithm is time and space-optimal. We propose two implementations of the on-line majority consensus algorithm. One implementation is space-optimal and O(lg n) competitive, while the other is time-optimal. To the best of our knowledge, these are the first on-line algorithms for computing consensus of phylogenetic trees.
TR-CS-2002-04
Inversion Medians Generally Outperform Breakpoint Medians in Phylogeny
Reconstruction from Gene-Order Data
Bernard M.E. Moret, Adam C. Siepel, Jijun Tang, and Tao Liu
Phylogeny reconstruction from gene-order data has attracted much attention over the last few years. The two software packages used for that purpose, BPAnalysis and GRAPPA, both use so-called breakpoint medians in their computations. Breakpoint distance, however, is a heuristic measure of evolution, not directly based on any model of genome rearrangement. We conjectured that phylogeny reconstructions could be improved by using inversion medians, which minimize evolutionary distance under an inversions-only model of genome rearrangement. Recent algorithmic developments have made it possible to compute inversion medians for problems of realistic size.
Our experimental studies unequivocally show that inversion medians are strongly preferable to breakpoint medians in the context of phylogenetic reconstruction from gene-order data. Improvements are most pronounced in the reconstruction of ancestral genomes, but are also evident in the topological accuracy of the reconstruction as well as, surprisingly, in the overall running time. Improvements are strongest for small average distances along tree edges and for evolutionary scenarios with a preponderance of inversion events, but occur in all cases, including evolutionary scenarios with high proportions of transpositions.
TR-CS-2001-30
Improved bounding and searching for phylogeny reconstruction from
gene-order data
Bernard M.E. Moret and Jijun Tang
A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the understanding of evolution as well as an important tool in biological, pharmaceutical, and medical research. Phylogeny reconstruction from molecular data is very difficult: almost all optimization models give rise to NP-hard (and thus computationally intractable) problems. Yet approximations must be of very high quality in order to avoid outright biological nonsense. Thus many biologists have been willing to run farms of processors for many months in order to analyze just one dataset.
Phylogenies derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Yet very few techniques are available for such phylogenetic reconstructions---and all involved NP-hard optimization problems. One method is breakpoint analysis, developed by Blanchette and Sankoff for solving the "breakpoint phylogeny." We had reimplemented their approach to produce the GRAPPA software suite, which ran three orders of magnitude faster thanks to the application of algorithm engineering techniques and, in its parallel version, demonstrated a one-million-fold speedup on a dataset of flowering plants.
We report here on improvements to the bounding and searching techniques used in GRAPPA that have brought on an additional speedup by another two orders of magnitude. Such speedups, while modest from a theoretical standpoint (the exponential behavior is simply pushed back to slightly larger instance sizes), nevertheless have important practical consequences, as they allow research biologists to analyze in a few hours or days data that could not have been analyzed at all just one year ago. Our searching method is of independent interest, as it is applicable to any optimization problem of modest size where lower bounds are expected to be tight.
TR-CS-2001-18
Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data:
An Empirical Study
Li-San Wang, Robert K. Jansen, Bernard M.E. Moret, Linda A. Raubeson,
and Tandy Warnow
Evolution operates on whole genomes through mutations that change the order and strandedness of genes within the genomes. Thus analyses of gene order data present new opportunities for discoveries about deep evolutionary events, provided that sufficiently accurate methods can be developed to reconstruct evolutionary trees. In this paper we present a new method of character coding for parsimony-based analysis of genomic rearrangements called MPBE-2, and a new parsimony-based method which we call MPME (based on an encoding of Bryant), both variants of the MPBE method. We then conduct computer simulations to compare this class of methods to distance-based methods (NJ under various distance measures) and other parsimony approaches (MPBE and MPME) for phylogeny reconstruction based on genome rearrangement data. Our empirical results show that parsimony approaches generally have better accuracy than distance-based methods.
TR-CS-2001-10
Industrial Applications of High-Performance Computing for
Phylogeny Reconstruction
David A. Bader, Bernard M.E. Moret, and Lisa Vawter
Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest is strong in determining these relationships. For example, pharmaceutical companies may use phylogeny reconstruction in drug discovery for finding plants with similar gene production. Health organizations study the evolution and spread of viruses such as HIV to gain understanding of future outbreaks. And governments are interested in aiding the production of foodstuffs like rice, wheat, and corn, by understanding the genetic code. Yet very few techniques are available for such phylogenetic reconstructions. Appropriate tools for analyzing such data may help resolve some difficult phylogenetic reconstruction problems; indeed, this new source of data has been embraced by many biologists in their phylogenetic work.
With the rapid accumulation of whole genome sequences for a wide diversity of taxa, phylogenetic reconstruction based on changes in gene order and gene content is showing promise, particularly for resolving deep (i.e., old) branches. However, reconstruction from gene-order data is even more computationally intensive than reconstruction from sequence data, particularly in groups with large numbers of genes and highly rearranged genomes. We have developed a software suite, GRAPPA, that extends the breakpoint analysis (BPAnalysis) method of Sankoff and Blanchette while running much faster: in a recent analysis of a collection of chloroplast data for species of Campanulaceae on a 512-processor Linux supercluster with Myrinet, we achieved a one-million-fold speedup over BPAnalysis. GRAPPA currently can use either breakpoint or inversion distance (computed exactly) for its computation and runs on single-processor machines as well as parallel and high-performance computers.
TR-CS-2001-05
High-Performance Algorithm Engineering for Computational Phylogenetics
Bernard M.E. Moret, David A. Bader, and Tandy Warnow
Phylogeny reconstruction from molecular data poses complex optimization problems: almost all optimization models are NP-hard and thus computationally intractable. Yet approximations must be of very high quality in order to avoid outright biological nonsense. Thus many biologists have been willing to run farms of processors for many months in order to analyze just one dataset. High-performance algorithm engineering offers a battery of tools that can reduce, sometimes spectacularly, the running time of existing phylogenetic algorithms. We present an overview of algorithm engineering techniques, illustrating them with an application to the ``breakpoint analysis'' method of Sankoff et al. which resulted in the GRAPPA software suite. GRAPPA demonstrated a million-fold speedup in running time (on a variety of real and simulated datasets) over the original implementation. We show how algorithmic engineering techniques are directly applicable to a large variety of challenging combinatorial problems in computational biology.
TR-CS-2000-30
Performance Study of Phylogenetic Methods:
(Unweighted) Quartet Methods and Neighbor-Joining
Katherine St. John, Tandy Warnow, Bernard M.E. Moret, and Lisa Vawter
We present the results of a large-scale experimental study of quartet-based methods (quartet cleaning and puzzling) for phylogeny reconstruction. Our experiments include a large range of problem sizes and of evolutionary rates, and were carefully designed to yield statistically robust results despite the size of the sample space. We measure outcomes in terms of numbers of edges of the true tree correctly inferred by each method (true positives). Our results indicate that quartet-based methods are much less accurate than the simple and efficient method of neighbor-joining, particularly for data composed of short- to medium-length sequences. We support our experimental findings by theoretical results that strongly suggest that quartet-cleaning methods are unlikely to yield accurate trees with less than exponentially long sequences. We conclude that a proposed reconstruction method should first be compared to the neighbor-joining method and further studied only if it offers a demonstrable practical advantage.