UNM Computer Science

Colloquia



Clustering and spatial data mining in computational biology

Date: Friday, December 7th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Susan Bridges
Department of Computer Science and Engineering
Mississippi State University

Abstract:
Data mining algorithms can be applied to a wide variety of problems in computational biology ranging from text mining to genome mining. This talk will cover two such applications: co-clustering of heterogeneous data sets and spatial data mining of the genome.

Traditional clustering is typically based on a single feature set. In some domains, several feature sets may be available to represent the same objects, but it may not be easy to compute a useful and effective integrated feature set. We have developed two classes of algorithms for combining clustering results obtained from multiple related datasets that represent identical or overlapping sets of objects but use different feature sets. Our methods are shown to yield higher-quality clusters than the baseline schemes, namely clustering based on the individual feature sets and clustering based on the concatenated feature set.
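
For readers unfamiliar with the baselines mentioned above, the sketch below (synthetic data, standard k-means, not the speaker's co-clustering algorithms) clusters the same objects using two feature sets individually and after concatenation, and compares the resulting partitions with a standard agreement score:

    # A minimal baseline sketch (not the speaker's co-clustering method):
    # cluster the same objects with two feature sets separately and after
    # concatenating them, then compare the partitions. All data are synthetic.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    n_objects, k = 100, 3
    features_a = rng.normal(size=(n_objects, 20))  # e.g., one experimental feature set
    features_b = rng.normal(size=(n_objects, 8))   # e.g., a second, smaller feature set

    labels_a = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features_a)
    labels_b = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features_b)
    labels_cat = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.hstack([features_a, features_b]))

    # How well the individual-feature-set clusterings agree with each other
    # and with the concatenated-feature-set clustering.
    print(adjusted_rand_score(labels_a, labels_b))
    print(adjusted_rand_score(labels_a, labels_cat))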

The vast majority of DNA research has focused on genes. However, eukaryotic genomes are characterized, and often dominated, by repetitive, non-genic DNA sequences, and experimental evidence has shown that repetitive regions influence the expression of nearby genes, alter gene structure, may be instrumental in the generation of new genes, and may be a means of rapidly increasing genetic diversity during stress. We present a new method for mining the output of ab initio repeat finders to identify spatial relationships that exist among repetitive elements. We demonstrate that this method can be used to "re-discover" known elements in the genome and to discover novel elements that have not previously been described.
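
As a toy illustration of what "spatial relationships among repetitive elements" can mean in practice (this is not the speaker's method; the input format and window size are assumptions), one might count how often pairs of repeat families annotated by a repeat finder fall within a fixed genomic distance of each other:

    # Toy illustration (not the speaker's method): count how often pairs of
    # repeat families annotated by a repeat finder occur within a fixed
    # genomic distance of each other.
    from collections import Counter
    from itertools import combinations

    # Assumed input format: (family, chromosome, start) tuples.
    repeats = [
        ("LTR_A", "chr1", 1000), ("HELITRON_B", "chr1", 1400),
        ("LTR_A", "chr1", 9000), ("HELITRON_B", "chr1", 9300),
        ("MITE_C", "chr2", 500),
    ]
    WINDOW = 1000  # assumed window size in base pairs

    by_chrom = {}
    for family, chrom, start in repeats:
        by_chrom.setdefault(chrom, []).append((start, family))

    pair_counts = Counter()
    for chrom, hits in by_chrom.items():
        hits.sort()
        for (s1, f1), (s2, f2) in combinations(hits, 2):
            if s2 - s1 <= WINDOW and f1 != f2:
                pair_counts[tuple(sorted((f1, f2)))] += 1

    print(pair_counts.most_common())  # frequently co-occurring family pairs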

Bio
Susan Bridges received a bachelor's degree in botany from the University of Arkansas in 1969, a master's degree in biology from the University of Mississippi in 1975, a master's degree in computer science from Mississippi State University in 1983, and a Ph.D. in computer science from the University of Alabama in Huntsville in 1989. She is currently a Professor in the Department of Computer Science and Engineering at Mississippi State University and Co-Director of the Institute of Digital Biology at Mississippi State University. She is a co-PI on an NSF CyberTrust grant and PI or co-PI on several USDA grants. The focus of her research is the application of data mining to genomes and proteomes and to the integration of multiple sources of data.

Zombies, ETs and Other Encounters with Dynamic Graph Algorithms

Date: Friday, November 16th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Valerie King
University of Victoria

Abstract:
Given any graph problem, we may ask if it's possible to maintain a solution as an input graph changes incrementally, in time faster than recomputing from scratch for each update. The goal of a dynamic graph algorithm is to efficiently implement update and query operations in reasonable space.
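
As a concrete but restricted example of the update/query framing (edge insertions only, no deletions; a sketch, not one of the algorithms surveyed in the talk), a union-find structure maintains connectivity in near-constant amortized time per operation, far cheaper than re-running a graph search after every change. Handling deletions as well, the fully dynamic setting, is where the interesting difficulties begin.

    # Incremental (insert-only) connectivity with union-find: each update and
    # query takes near-constant amortized time, versus a full graph traversal
    # if connectivity were recomputed from scratch after every insertion.
    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x

        def insert_edge(self, u, v):   # update operation
            self.parent[self.find(u)] = self.find(v)

        def connected(self, u, v):     # query operation
            return self.find(u) == self.find(v)

    uf = UnionFind(5)
    uf.insert_edge(0, 1)
    uf.insert_edge(1, 2)
    print(uf.connected(0, 2))  # True
    print(uf.connected(0, 4))  # False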

This talk will be a whirlwind tour of fundamental ideas in dynamic graph algorithms and lower bounds, focusing mainly on the problems of connectivity and minimum spanning tree, transitive closure and shortest paths.

Bio:
Valerie King is a Professor in the Computer Science Department at the University of Victoria. She received her A.B. in mathematics from Princeton, her J.D. from Boalt Hall, and her Ph.D. from the Computer Science Department at UC Berkeley. She held post-doctoral positions at Princeton University and the University of Toronto. She has been a research scientist at NECI in Princeton and at Compaq SRC and HP Labs. She was a visiting researcher at Microsoft Research Labs in Silicon Valley, a visiting professor at Hebrew University and the University of Copenhagen, and a visiting scholar at UC Berkeley. She has been a member of the California Bar since 1981. She has done extensive work in dynamic graph algorithms and is currently working on algorithms with applications to networks, computational biology, and distributed computing, as well as the sociology of the web.

Computational insights into the social life of zebras

Date: Friday, November 9th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Tanya Berger-Wolf
University of Illinois at Chicago

Abstract:
Computation has fundamentally changed the way we study nature. Recent breakthroughs in data collection technology, such as GPS and other mobile sensors, are giving biologists access to data about wild populations that are orders of magnitude richer than any previously collected. Such data offer the promise of answering some of the big ecological questions about animal populations.

Unfortunately, in this domain, our ability to analyze data lags substantially behind our ability to collect it. In particular, interactions among individuals are often modeled as social networks, where nodes represent individuals and an edge exists if the corresponding individuals interacted during the observation period. The model is essentially static in that the interactions are aggregated over time and all information about the timing and ordering of social interactions is discarded. We show that, when applied to dynamic data, such traditional social network analysis methods may lead to incorrect conclusions about the structure of interactions and the processes that spread over those interactions. We have extended computational methods for social network analysis to explicitly address the dynamic nature of interactions among individuals. We have developed techniques for identifying persistent communities and influential individuals and for extracting patterns of interaction in dynamic social networks. We will present our approach and demonstrate its applicability by analyzing interactions among zebra populations and identifying how the structure of interactions relates to demographic status.
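
A tiny example of why aggregation can mislead, with assumed timestamps: in the time-aggregated graph below A, B, and C all appear connected, yet nothing starting at A can reach C through these contacts, because the B-C contact happened before the A-B contact.

    # Toy illustration of why discarding interaction times can mislead:
    # a process starting at A cannot reach C through these contacts, even
    # though the time-aggregated graph connects A, B, and C.
    contacts = [("B", "C", 1), ("A", "B", 2)]  # (individual, individual, time)

    def reachable(source, contacts):
        """Individuals a spreading process starting at `source` could reach,
        respecting the order of the time-stamped contacts."""
        infected = {source}
        for u, v, _t in sorted(contacts, key=lambda c: c[2]):
            if u in infected or v in infected:
                infected.update((u, v))
        return infected

    print(reachable("A", contacts))              # C is never reached: only A and B
    print({frozenset(c[:2]) for c in contacts})  # the static view: A-B, B-C look connected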

Bio:
Dr. Tanya Berger-Wolf is an assistant professor in the Department of Computer Science at the University of Illinois at Chicago. Her research interests are in applications of combinatorial optimization and algorithm design techniques to problems in the population biology of plants, animals, and humans, from genetics to social interactions.

Dr. Berger-Wolf received her B.Sc. in Computer Science and Mathematics from Hebrew University (Jerusalem, Israel) in 1995 and her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2002. She spent two years as a postdoctoral fellow at the University of New Mexico working in computational phylogenetics and a year at the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) doing research in computational epidemiology.

The End Of Computers As We Know Them?

Date: Friday, October 26th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Christof Teuscher
Los Alamos National Laboratory

Abstract:
Since the beginning of modern computer science some sixty years ago, we have been building computers in more or less the same way. Silicon electronics serves as the physical substrate, the von Neumann architecture provides the computer design model, and the abstract Turing machine concept supplies the theoretical foundations. Alternative computing paradigms and machines have always played a rather marginal role, essentially because there was little or no need to go beyond the horizon of silicon electronics. That is changing: in recent years, previously unseen computing substrates have emerged, driven by advances in synthetic biology, nanotechnology, materials science, and neurobiology. A grand challenge in computer science is to develop architectures, design methodologies, formal frameworks, and tools that allow us to compute reliably and solve problems efficiently with such alternative devices. In this talk, I will first review some exemplary future and emerging computing devices and highlight the particular challenges that arise in performing computations with them. I will then delineate potential solutions for how these challenges might be addressed, using self-assembled nano-scale electronics as a simple showcase. Without disruptive new technologies, the ever-increasing computing performance and storage capacity achieved with existing technologies are expected to eventually reach a plateau. Last, I will outline the future challenges that need to be addressed in order to keep computer science moving at the same pace.

Bio:
Christof Teuscher currently holds a Technical Staff Member position at Los Alamos National Laboratory (LANL). He obtained his M.Sc. and Ph.D. degrees in computer science from the Swiss Federal Institute of Technology in Lausanne (EPFL) in 2000 and 2004, respectively. In 2004 he became a postdoctoral researcher at the University of California, San Diego (UCSD), and in 2005 a distinguished Director's Postdoctoral Fellow at Los Alamos National Laboratory. His main research interests include biologically inspired computing, emerging computing architectures and paradigms, complex and adaptive systems, and cognitive science. Teuscher has received several prestigious awards and fellowships. For more information visit: http://www.teuscher.ch/christof

Predictions and Machine Learning: A personal viewpoint

Date: Friday, October 19th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Tudor I. Oprea
Professor of Biochemistry and Molecular Biology and Chief of the Division of Biocomputing
UNM School of Medicine

Abstract:
From the perspective of molecular bioactivity, machine learning is a widely used tool that streamlines the R&D process and enables unexpected discoveries. This talk will first showcase the use of machine learning in small-molecule discovery at UNM Biocomputing. We then discuss the impact of the "unknown unknown" on our ability to make accurate predictions, which is inherently limited by the past. The Black Swan, a metaphor for highly improbable, high-impact events, warns us to understand the limits of prediction and to expect "unknown unknowns" everywhere, from war to science and industry. We further argue that the structures of the pharmaceutical and flavor industries are driven by, and impacted by, both positive and negative Black Swans.

Bio:
Tudor I. Oprea holds an M.D. and a Ph.D. from the University of Medicine and Pharmacy in Timisoara, Romania. His past appointments were with the University of Utrecht, the Netherlands; Washington University in St. Louis, MO; and Los Alamos National Laboratory, before he spent six years at AstraZeneca in Sweden. Since 2002, Tudor has been Professor of Biochemistry and Molecular Biology and Chief of the Division of Biocomputing at the UNM School of Medicine. Tudor works in the areas of QSAR, lead identification and optimization, virtual screening, and drug discovery. He has co-authored two books, seventeen book chapters, more than sixty peer-reviewed papers, and several patent disclosures. He received the 2002 Hansch Award from the QSAR and Molecular Modeling Society and serves as Chair of the Cheminformatics and QSAR Society. He is the founder and CEO of Sunset Molecular Discovery LLC.

Fusion of multi-task and multi-modal brain imaging and genetic data: An integrated approach and several examples

Date: Friday, October 5th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Vince Calhoun
Department of Electrical and Computer Engineering
University of New Mexico

Abstract:
Brain imaging techniques available today provide multiple sources of complementary information. Most research studies collect several types of brain images on the same individuals. However, the vast majority of studies analyze each data set separately and do not attempt to directly examine the inter-relationships between the different data types. Such approaches are not straightforward due to the high dimensionality of the data (tens of thousands of voxels, timepoints, or genetic factors). In this talk I will present an approach for jointly analyzing brain imaging data using independent component analysis and present examples from structural and functional MRI, event-related potential (ERP), and genetic data. The examples will illustrate situations in which combining multiple data types reveals findings that separate analyses would have missed.
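
As a rough sketch of the flavor of a joint decomposition (not the speaker's implementation; the array sizes and component count are arbitrary), one can place each subject's features from two modalities side by side in a single row and run a standard ICA, so that each recovered component ties together a pattern across both data types:

    # Rough sketch of a joint decomposition (not the speaker's implementation):
    # put each subject's features from two modalities in one row and run ICA,
    # so each independent component links a pattern across both data types.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    n_subjects = 40
    fmri_features = rng.normal(size=(n_subjects, 500))  # e.g., flattened fMRI maps
    erp_features = rng.normal(size=(n_subjects, 120))   # e.g., ERP time points

    joint = np.hstack([fmri_features, erp_features])    # subjects x (fMRI + ERP)
    ica = FastICA(n_components=5, random_state=0)
    subject_loadings = ica.fit_transform(joint)  # how strongly each subject expresses each component
    joint_patterns = ica.components_             # each row spans both modalities

    print(subject_loadings.shape, joint_patterns.shape)  # (40, 5) (5, 620)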

Bio:
Vince Calhoun received a bachelor's degree in Electrical Engineering from the University of Kansas, Lawrence, Kansas, in 1991, master's degrees in Biomedical Engineering and Information Systems from Johns Hopkins University, Baltimore, in 1993 and 1996, respectively, and a Ph.D. in electrical engineering from the University of Maryland Baltimore County, Baltimore, in 2002. He is currently the Director of Image Analysis and MR Research at the MIND Institute and an Associate Professor in the Electrical and Computer Engineering Department at the University of New Mexico. He is the author of 69 full journal articles, over 30 technical reports, and over 110 abstracts and conference proceedings. Much of his career has been spent on the development of data-driven approaches for the analysis of functional magnetic resonance imaging (fMRI) data. He has three R01 grants from the National Institutes of Health for the development of various data-driven methods and multimodal data fusion approaches applied to mental illness. In addition, he has an NSF grant for developing methods to analyze complex-valued imaging data.

Computer Science Town Hall Meeting

Date: Friday, September 21st, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Stephanie Forrest
Department Chair
Department of Computer Science

Dept. Chair, Stephanie Forrest, will hold a "town hall" meeting with all graduate students (interested undergraduates are invited as well). This is your chance to learn more about what's going on in the department, ask questions, make suggestions, brainstorm about new ideas, and even complain about things that aren't working well for you. The meeting is intended to be highly interactive, but to kick things off Prof. Forrest will give a short overview of department activities and opportunities for students this year.

Network Scaling in Complex Systems: What happens when organisms, cities and computer chips get bigger?

Date: Friday, September 21st, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Melanie Moses
Department of Computer Science, UNM

Abstract:
Networks distribute energy, materials, and information to the components of a variety of natural and human-engineered systems, including organisms, cities, and computer chips. Distribution networks enable integrated and coordinated functioning of these complex systems, and they also constrain their design. The behavior of these systems emerges from the actions of the components and the ways in which those components are networked together. The systems can vary enormously in size: a whale has 10^17 times more cells than a bacterium; the population of Los Angeles is four orders of magnitude larger than that of Hatch, NM; a dual-core Itanium has four orders of magnitude more transistors than the 286. Biologists have observed a set of nonlinear relationships between the size of organisms and their behaviors (e.g., how much they eat, how long they live, how dense their populations are). Here I ask whether such regularities exist between the size and behavior of other complex systems. In other words, can bacteria and whales tell us something about traffic in Hatch vs. Los Angeles, or about the power consumed by a 286 vs. an Itanium? I examine whether these apparently different systems face similar network scaling constraints. I also discuss how different scaling behaviors result from differences in the degree of centralization in networks, constraints on the physical area or volume of networks, and the physical properties of components.
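
For readers unfamiliar with the biological scaling relationships alluded to here, the snippet below fits a scaling exponent on a log-log scale to synthetic data generated around Kleiber's law (metabolic rate roughly proportional to mass^(3/4)); the data are made up, and only the fitting procedure is the point.

    # Fitting a scaling exponent y ~ x^b on a log-log scale, illustrated with
    # synthetic data generated around Kleiber's law (metabolic rate ~ mass^0.75).
    import numpy as np

    rng = np.random.default_rng(0)
    mass = np.logspace(-12, 8, 50)  # grams: roughly bacterium to whale
    metabolic_rate = mass ** 0.75 * rng.lognormal(sigma=0.2, size=mass.size)

    slope, intercept = np.polyfit(np.log10(mass), np.log10(metabolic_rate), 1)
    print(f"estimated scaling exponent: {slope:.2f}")  # close to 0.75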

Bio:
Melanie received her Ph.D. in Biology from the University of New Mexico in 2005. She received her B.S. from Stanford University in Symbolic Systems, an interdisciplinary program in cognition and computation. Melanie is currently an Assistant Professor in the Department of Computer Science at the University of New Mexico where she continues her interdisciplinary work in Computer Science and Biology. Her research interests are in how properties of complex systems change as they increase in size. Her work includes mathematical models of organism growth and reproduction, models of cardiovascular networks and the immune system, and the application of biological network scaling models to engineered networks.

Randomized Rounding for Sensor Placement Problems

Date: Friday, September 14th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Cynthia Phillips
Sandia National Laboratories

Abstract:
The recent emphasis on homeland security in the U.S. has led to a number of new applications involving sensor placement in physical networks. We will describe some of these sensor placement problems, including sensor placement in municipal water networks to minimize health effects from contamination and sensor placement for intruder detection in transportation networks or buildings. We have addressed these problems using (parallel) integer programming. In this talk we present a parallelizable heuristic method for finding approximate solutions to general sensor placement problems.

An integer program (IP) is the optimization (maximization or minimization) of a linear function subject to linear constraints and integrality constraints on some or all of the variables. IPs naturally model NP-hard combinatorial optimization problems. Thus integer programming is itself NP-complete, but one can frequently solve instances in practice using branch and bound via commercial or research solvers. Removing the integrality constraints gives the linear-programming (LP) relaxation of an integer program, which is tractable both in theory and in practice.
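
As a small worked example of the relaxation (a made-up three-node toy, not one of the water-network models), the sketch below asks for sensors on the nodes of a triangle so that every link has a sensor at one endpoint; the LP relaxation returns a fractional placement that no 0/1 solution can match.

    # A tiny IP vs. LP-relaxation example (a made-up toy, not a water-network
    # model): place sensors on the nodes of a triangle so that every link has
    # a sensor at one endpoint, minimizing the number of sensors.
    import numpy as np
    from scipy.optimize import linprog

    c = np.ones(3)                    # minimize x1 + x2 + x3
    # each link (i, j) needs x_i + x_j >= 1, written as -x_i - x_j <= -1
    A_ub = -np.array([[1, 1, 0],
                      [0, 1, 1],
                      [1, 0, 1]])
    b_ub = -np.ones(3)

    lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 3)
    print(lp.x, lp.fun)  # fractional optimum [0.5, 0.5, 0.5] with value 1.5;
                         # any 0/1 placement needs two sensors (value 2)

Turning such fractional solutions into good integer ones is exactly where the rounding heuristics discussed next come in.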

In this talk, we consider finding heuristic solutions to sensor-placement IPs using randomized rounding, a technique introduced by Raghavan and Thompson. In its simplest form, one sets each binary variable to 1 independently at random, using its LP-relaxation value as the probability. For an arbitrary integer program, this will rarely produce a solution that satisfies the linear constraints. However, our sensor-placement problems have only one hard constraint, a bound on the number of sensors; the remaining constraints merely compute the objective value.
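
In code, the basic scheme is tiny; a sketch with made-up LP-relaxation values:

    # Basic randomized rounding (Raghavan-Thompson style): set each binary
    # variable to 1 independently, with its LP-relaxation value as the probability.
    import numpy as np

    rng = np.random.default_rng(0)
    lp_values = np.array([0.9, 0.4, 0.4, 0.2, 0.1])     # assumed LP-relaxation solution
    placement = rng.random(lp_values.size) < lp_values  # True = place a sensor there
    print(placement.astype(int), int(placement.sum()))  # the count only hits the budget by luck

The number of selected variables concentrates around the sum of the LP values but only occasionally equals the sensor budget exactly, which is the issue the strategy described next addresses.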

We will present a new randomized rounding strategy that satisfies a single cardinality constraint. It samples uniformly from the set of "lucky" randomized roundings that happen to select exactly the correct number of variables. The algorithm requires O(k(n-k)) deterministic serial preprocessing time. We can then select a sensor placement with a small number of random numbers and O(n + k) additional time.
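
One plausible way to realize such a sampler (a sketch under the assumption that the "lucky" roundings are drawn with their natural conditional probabilities; not necessarily the authors' algorithm) is a dynamic-programming table over how many of the remaining variables get selected, after which a placement with exactly k sensors can be drawn by sequential sampling:

    # Sketch (not necessarily the authors' algorithm): draw a 0/1 vector with
    # exactly k ones from independent rounding probabilities p, conditioned on
    # the event that exactly k variables are selected.
    import numpy as np

    def sample_exactly_k(p, k, rng):
        n = len(p)
        # tail[i][j] = probability that exactly j of variables i..n-1 are selected
        tail = np.zeros((n + 1, k + 1))
        tail[n][0] = 1.0
        for i in range(n - 1, -1, -1):
            for j in range(min(k, n - i) + 1):
                tail[i][j] = (1 - p[i]) * tail[i + 1][j]
                if j > 0:
                    tail[i][j] += p[i] * tail[i + 1][j - 1]
        chosen, remaining = [], k
        for i in range(n):  # sample sequentially from the conditional distribution
            if remaining == 0:
                break
            take_prob = p[i] * tail[i + 1][remaining - 1] / tail[i][remaining]
            if rng.random() < take_prob:
                chosen.append(i)
                remaining -= 1
        return chosen

    rng = np.random.default_rng(0)
    print(sample_exactly_k([0.9, 0.4, 0.4, 0.2, 0.1], k=2, rng=rng))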

We compare our algorithm to a randomized rounding algorithm due to Doerr. His algorithm is also fast, meets the cardinality constraint, and has sufficient independence to allow Chernoff bound analysis. However, its implementation presents some challenging numerical issues. Our algorithm has stronger independence and a fundamentally different probability distribution for individual variables.

(Joint work with Jonathan Berry, Sandia National Laboratories)

Bio:
Cindy Phillips is a Distinguished Member of the Technical Staff in the Algorithms and Discrete Math Department at Sandia National Laboratories. She received a B.A. in applied mathematics from Harvard University in 1983 and a PhD in computer science from MIT in 1990. She joined Sandia National Laboratories in 1990 where she has conducted research in combinatorial optimization, algorithm design and analysis, and parallel computation with applications to scheduling, network and infrastructure surety, integer programming, graph algorithms, vehicle routing, computational biology, and experimental algorithmics. She is one of three main developers of the PICO massively-parallel integer programming code. She was a member of the Compute Process Allocator (CPA) team that won an R&D 100 award in 2006.

ConceptDoppler: A Weather Tracker for Internet Censorship

Date: Friday, September 7th, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Jed Crandall
Department of Computer Science, UNM

Abstract:
Imagine you want to remove the history of the Wounded Knee massacre from the Library of Congress. Two ways to do this are: 1) remove "Bury My Heart at Wounded Knee" and a few other selected books; or 2) remove every book in the entire library whose text contains the word "massacre". If you view the Internet as one large library, Chinese Internet censorship based on keyword filtering is the equivalent of the latter. In this talk I'll present results from a paper we recently published about China's keyword-based Internet censorship mechanism.

We present two sets of results: 1) Internet measurements of keyword filtering by the Great Firewall of China (GFC); and 2) initial results of using latent semantic analysis (LSA) as an efficient way to reproduce a blacklist of censored words via probing.

Our Internet measurements suggest that the GFC's keyword filtering is more a panopticon than a firewall, i.e., it need not block every illicit word, but only enough to promote self-censorship. China's largest ISP, ChinaNET, performed 83.3% of all filtering of our probes, and 99.1% of all filtering that occurred at the first hop past the Chinese border. Filtering occurred beyond the third hop for 11.8% of our probes, and there were sometimes as many as 13 hops past the border to a filtering router. Approximately 28.3% of the Chinese hosts we sent probes to were reachable along paths that were not filtered at all. While more tests are needed to provide a definitive picture of the GFC's implementation, our results disprove the notion that GFC keyword filtering is a firewall strictly at the border of China's Internet.

While evading a firewall a single time defeats its purpose, it would be necessary to evade a panopticon almost every time. Thus, in lieu of evasion, we propose ConceptDoppler, an architecture for maintaining a censorship "weather report" about which keywords are filtered over time. Probing with potentially filtered keywords is arduous due to the GFC's complexity and can be invasive if not done efficiently. Just as an understanding of the mixing of gases preceded effective weather reporting, an understanding of the relationship between keywords and concepts is essential for tracking Internet censorship. We show that LSA can effectively pare down a corpus of text and cluster filtered keywords for efficient probing, present 122 keywords we discovered by probing, and underscore the need for tracking and studying censorship blacklists by reporting some surprising blacklisted keywords, such as (in Chinese) conversion rate, Mein Kampf, and International geological scientific federation (Beijing).
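
To give a sense of the LSA step in generic form (this is not the ConceptDoppler code, and the corpus below is a placeholder), one can reduce a term-document matrix with a truncated SVD and cluster terms in the resulting latent space, so that probing can proceed one concept cluster at a time:

    # Generic LSA sketch (not the ConceptDoppler implementation): build a
    # term-document matrix, reduce it with truncated SVD (latent semantic
    # analysis), and cluster terms to prioritize probing candidates.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import KMeans

    documents = [  # placeholder corpus; the paper pared down a much larger one
        "protest in the square and political reform",
        "currency conversion rate and export markets",
        "historical massacre remembrance and censorship",
        "exchange rate policy and trade balance",
    ]

    vectorizer = TfidfVectorizer()
    doc_term = vectorizer.fit_transform(documents)  # documents x terms
    term_vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(doc_term.T)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(term_vectors)

    for term, cluster in zip(vectorizer.get_feature_names_out(), clusters):
        print(cluster, term)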

(Joint work with Daniel Zinn, Michael Byrd, Earl Barr, UC Davis and Rich East)

Bio:
Jed received his Ph.D. from the University of California at Davis and his B.S. from Embry-Riddle Aeronautical University in Prescott, Arizona. He is currently an Assistant Professor in the Department of Computer Science, University of New Mexico.

Jed's research area is computer security and privacy, with a background in computer architecture, ranging from architectural support for systems security to capturing and analyzing Internet worms. More recent work includes behavior-based analysis of malicious code, including the use of a new technique called temporal search to detect time bombs within computer viruses based on their use of hardware timers. He also studies the technical issues of government censorship.

Automated Deduction and Its Application to Mathematics

Date: Friday, August 31, 2007
Time: 1 pm — 2:30 pm
Place: ME 218

Robert Veroff
Department of Computer Science, UNM

Abstract:
One of the objectives of automated deduction is to develop tools that use mathematical logic and deduction to solve, or help people solve, problems coming from a wide variety of application domains. Such tools are being used for research in mathematics and have led to the solution of numerous open questions. In this talk, I will give a brief introduction to the field and will describe some of the activities of our research group.

Bio:
Bob Veroff is professor emeritus in the UNM Computer Science Department.