Clustering and spatial data mining in computational biology (Video)

December 7, 2007

  • Date: Friday, December 7th, 2007 
  • Time: 1 pm — 2:30 pm 
  • Place: ME 218

Susan Bridges
Department of Computer Science and Engineering, Mississippi State University

Abstract: Data mining algorithms can be applied to a wide variety of problems in computational biology, ranging from text mining to genome mining. This talk will cover two such applications: co-clustering of heterogeneous data sets and spatial data mining of the genome.

Traditional clustering is typically based on a single feature set. In some domains, several feature sets may be available to represent the same objects, but it may not be easy to compute a useful and effective integrated feature set. We have developed two classes of algorithms that combine the results of clustering multiple related datasets, where the datasets represent identical or overlapping sets of objects but use different feature sets. Our methods are shown to yield higher-quality clusters than two baseline schemes: clustering on each individual feature set and clustering on the concatenated feature sets.
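As a rough illustration of what combining clusterings from several feature sets can look like, the sketch below uses a generic co-association (consensus) approach. It is not the speaker's algorithm, which the abstract does not specify. The function name, the majority threshold, and the toy gene labels are all assumptions made for this example.

```python
from itertools import combinations


def consensus_clusters(labelings, threshold=0.5):
    """Combine several clusterings of the same objects via co-association.

    labelings: list of dicts mapping object id -> cluster label, one dict
    per feature set. An object may be missing from some dicts, which
    models overlapping (not identical) object sets.
    """
    objects = sorted(set().union(*labelings))
    parent = {o: o for o in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(objects, 2):
        # Only clusterings that contain both objects get a vote.
        shared = [lab for lab in labelings if a in lab and b in lab]
        if not shared:
            continue
        agree = sum(1 for lab in shared if lab[a] == lab[b])
        # Link the pair when more than `threshold` of the votes agree.
        if agree / len(shared) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for o in objects:
        groups.setdefault(find(o), []).append(o)
    return sorted(groups.values())


# Hypothetical labels from three feature sets describing the same genes.
expression = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
sequence = {"g1": "a", "g2": "a", "g3": "a", "g4": "b"}
literature = {"g1": 5, "g2": 5, "g3": 7, "g4": 7}

result = consensus_clusters([expression, sequence, literature])
print(result)
```

Here the sequence-based clustering disagrees with the other two about g3, and the majority vote overrides it. Voting on pairwise agreement avoids having to build a single integrated feature vector, which is the difficulty the abstract points out.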

The vast majority of DNA research has focused on genes. However, eukaryotic genomes are characterized, and often dominated, by repetitive, non-genic DNA sequences. Experimental evidence has shown that repetitive regions influence the expression of nearby genes, alter gene structure, may be instrumental in the generation of new genes, and may be a means of rapidly increasing genetic diversity during stress. We present a new method for mining the output of ab initio repeat finders to identify spatial relationships that exist among repetitive elements. We demonstrate that this method can be used to "re-discover" known elements in the genome and to discover novel elements that have not previously been described.
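To make the idea of a spatial relationship among repeats concrete, the sketch below counts how often one repeat family is immediately followed by another within a small gap. It is only an illustrative stand-in, not the method presented in the talk. The (family, start, end) input format, the function name, and the parameters are assumptions; real repeat finders each have their own output formats.

```python
from collections import Counter


def adjacent_family_pairs(hits, max_gap=100, min_support=2):
    """Count recurring neighbor pairs of repeat families on one sequence.

    hits: list of (family, start, end) tuples, a hypothetical simplified
    form of repeat-finder output. Returns the ordered family pairs whose
    members sit within max_gap bases of each other at least min_support
    times.
    """
    ordered = sorted(hits, key=lambda h: h[1])  # sort by start coordinate
    counts = Counter()
    for left, right in zip(ordered, ordered[1:]):
        gap = right[1] - left[2]  # next start minus previous end
        if 0 <= gap <= max_gap:
            counts[(left[0], right[0])] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy annotation: family R1 is twice followed closely by family R2.
hits = [
    ("R1", 100, 300),
    ("R2", 320, 500),
    ("R1", 5000, 5200),
    ("R2", 5210, 5400),
    ("R3", 9000, 9100),
]

pairs = adjacent_family_pairs(hits)
print(pairs)
```

A pair of families that recurs side by side, such as R1 followed by R2 here, may be two fragments of one larger element. That is one way a known element could be "re-discovered" from fragmentary repeat-finder output.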

Bio: Susan Bridges received a bachelor’s degree in botany from the University of Arkansas in 1969, master’s degrees in biology from the University of Mississippi in 1975 and in computer science from Mississippi State University in 1983, and a Ph.D. in computer science from the University of Alabama in Huntsville in 1989. She is currently a Professor in the Department of Computer Science and Engineering at Mississippi State University and Co-Director of the Institute of Digital Biology at Mississippi State University. She is a co-PI on an NSF CyberTrust grant and PI or co-PI on several USDA grants. Her research focuses on applying data mining to genomes and proteomes and on integrating multiple sources of data.