UNM Computer Science

Colloquia Archive - Spring 2006



Multi-variate Volume Visualization

Date: Tuesday, May 2, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Joe Kniss
Computer Science Department
University of Utah

Scientific visualization is a discipline that joins data analysis and human visual perception. Visualization addresses a need for high performance, interactive data exploration of ever increasing size and complexity. My work focuses on volume visualization techniques. Volume visualization deals with the discovery and display of important features embedded in three-dimensional data. Volume visualization users are increasingly in need of techniques for assessing quantitative uncertainty and error in the images produced. Statistical segmentation algorithms can compute these quantitative results, yet volume rendering tools typically produce only qualitative imagery via transfer function-based (look-up table) classification. My recent work introduces a visualization technique that allows users to interactively explore uncertainty, risk, and probabilistic decision boundaries. This approach makes it possible to directly visualize the combined "fuzzy" classification results from multiple segmentations by encoding this data in a unified probabilistic data space. This unified space, derived from the combination of scalar volumes from numerous segmentations, is represented using a new graph-based dimensionality reduction scheme. The scheme both dramatically reduces the dataset size and is suitable for efficient, high quality, quantitative visualization. More importantly, this scheme emphasizes the relationship between multi-modal or multivariate image data and the intrinsic nature of "features" encoded in the image. This knowledge can be leveraged to produce high quality visualizations, robust feature detection algorithms, and improved representations of multi-dimensional image data.
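
To make the contrast concrete, the sketch below shows conventional transfer-function (look-up table) classification, the qualitative approach the abstract contrasts with probabilistic, multi-segmentation visualization. It is an illustrative Python example only; the function names and toy color ramps are invented for this sketch and are not the speaker's method.

    import numpy as np

    def make_transfer_function(n_bins=256):
        """Toy 1-D look-up table: low values transparent blue, mid values green,
        high values opaque red."""
        lut = np.zeros((n_bins, 4))                     # rows are (R, G, B, A)
        x = np.linspace(0.0, 1.0, n_bins)
        lut[:, 0] = np.clip((x - 0.5) * 2.0, 0.0, 1.0)  # red ramps up for high values
        lut[:, 1] = np.exp(-((x - 0.5) ** 2) / 0.02)    # green peaks at mid-range
        lut[:, 2] = np.clip((0.5 - x) * 2.0, 0.0, 1.0)  # blue for low values
        lut[:, 3] = x ** 2                              # opacity grows with value
        return lut

    def classify(volume, lut):
        """Map a scalar volume with values in [0, 1] through the look-up table."""
        idx = np.clip((volume * (len(lut) - 1)).astype(int), 0, len(lut) - 1)
        return lut[idx]                                 # RGBA volume, shape (..., 4)

    vol = np.random.rand(32, 32, 32)                    # stand-in scalar volume
    rgba = classify(vol, make_transfer_function())
    print(rgba.shape)                                   # (32, 32, 32, 4)

Each voxel's appearance here depends only on its own scalar value, which is why such classification conveys no quantitative uncertainty about the underlying features.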

This talk will cover a basic introduction to the volume visualization process, graphical methods for rendering and lighting, algorithmic considerations with respect to efficiency and computational hardware, as well as new theoretical results based on this research.

Learning Hierarchical Models from Graph Data

Date: Thursday, April 27, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Aaron Clauset
Department of Computer Science
University of New Mexico

One property that has received comparatively little attention in the recent renaissance of network analysis is hierarchy, i.e., the observation that networks often have a fractal-like structure in which vertices cluster together in groups that then join to form groups of groups, and so forth, at all levels of organization in the network. Here, we give a precise and mathematically rigorous definition of hierarchy and describe a statistically principled way to learn the set of hierarchical features that most plausibly explains a particular real-world network. We briefly describe the advantages this approach has for automating the interpretation of network data, which include annotating edges, vertices, and local structures with some notion of how surprising they are, and the generation of generic null models for further hypothesis testing.
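
For orientation, one statistically principled formulation along these lines (stated here in the standard hierarchical random graph form, as an assumption about the flavor of the approach rather than a quotation from the talk) assigns each internal node r of a dendrogram D over the network's vertices a connection probability p_r and scores the dendrogram by the likelihood

    \mathcal{L}(D, \{p_r\}) = \prod_{r \in \mathrm{internal}(D)} p_r^{E_r} (1 - p_r)^{L_r R_r - E_r},

where E_r is the number of edges whose endpoints have r as their lowest common ancestor and L_r, R_r count the leaves in r's left and right subtrees. The maximum-likelihood choice is \hat{p}_r = E_r / (L_r R_r), and dendrograms can then be compared or sampled, e.g., by Markov chain Monte Carlo, to find the hierarchical features that most plausibly explain the observed edges.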

This work is joint with Cristopher Moore (UNM) and Mark Newman (UMich).

Collaboration: Getting Help without Giving Away the Store

Date: Thursday, April 20, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Charles Valauskas
Baniak, Pine & Gannon
Chicago, IL

Who would have imagined that the price of getting some help on a project could be giving away some or all of the rights that come out of it? This could happen. This discussion will address why it is important to think about the circumstances under which you obtain help on a project before you solicit that help. Real-life examples will illustrate the key points, and suggestions will be provided to help you avoid unfortunate copyright consequences from your collaborative efforts. I will provide a brief presentation on these important copyright issues.


An Introduction to Component-Based Programming through the Think Project

Date: Tuesday, April 18, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Dr. Jean-Charles Tournier
Department of Computer Science
University of New Mexico

In software engineering, component-based programming has become increasingly popular. During the last decade, many different component models have appeared for various application domains, such as modeling (e.g., UML 2.0, SDL), distributed applications (e.g., J2EE, CCM, .NET), real-time systems (e.g., PECOS, VEST), and embedded systems (e.g., PECOS, Think). This profusion of component models makes the very concept of a component difficult to grasp.
The objective of this talk is to give an introduction to component-based programming. The first part presents the motivations behind components and the concepts underlying them, the second identifies the main characteristics of the most popular models, and the last presents the Think component-based operating system to illustrate the concepts introduced earlier.
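
As a rough illustration of the component concept, the sketch below models components as units with explicitly declared provided and required interfaces that are composed by binding them together. The class and method names are invented for this example and do not correspond to the actual Fractal or Think APIs.

    class Component:
        def __init__(self, name):
            self.name = name
            self.required = {}              # required interface name -> bound provider

        def bind(self, interface, provider):
            """Explicit composition step: wire a required interface to a provider."""
            self.required[interface] = provider

    class LoggerService(Component):
        def log(self, msg):                 # provided interface: "log"
            print(f"[{self.name}] {msg}")

    class Scheduler(Component):
        def run_task(self, task):           # provided interface: "run_task"
            self.required["log"].log(f"running {task}")   # delegates through a binding

    # An "assembly" is just a set of components plus their bindings.
    logger = LoggerService("logger")
    scheduler = Scheduler("scheduler")
    scheduler.bind("log", logger)
    scheduler.run_task("sensor-poll")

The point of the separation is that the assembly, rather than the component code, decides which implementation satisfies each requirement; this is what makes application-specific composition, as in Think, possible.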

Bio:
Dr. Jean-Charles Tournier has been a postdoc in the SSL group of the Computer Science Department at the University of New Mexico since October 2005. He received his PhD from the National Institute of Applied Sciences (France) in 2005. During his PhD, Jean-Charles was involved in the definition of an abstract component model called Fractal. At the same time, he was part of the team that implemented this component model. The implementation, called Think, targets operating systems development: it provides a set of tools and a set of components for composing an application-specific operating system.

Computational Challenges in the Search for Gravitational Waves

Date: Tuesday, April 4, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Dr. Kipp Cannon
Department of Physics, University of Wisconsin, Milwaukee

Although nearly 100 years have passed since they were first described theoretically, gravitational waves, one of the many predictions of Albert Einstein's general theory of relativity, have never been observed directly. LIGO (the Laser Interferometer Gravitational Wave Observatory) is an ambitious project to search for and detect these ripples in the curvature of spacetime.

The three LIGO instruments produce 15,000 channels of time-series data. The current science run, our fifth, will produce nearly a petabyte of data spanning tens of millions of files across a half-dozen computing facilities.

In this talk, I will give an overview of the search for gravitational waves, and then discuss some of the interesting computing challenges the effort to analyze LIGO data creates.

Phylogenetic Estimation for Complex Evolutionary Processes

Date: Thursday, March 30, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Li-San Wang (UNM Faculty Candidate)
Department of Biology, University of Pennsylvania

Stochastic models of sequence evolution, since their introduction in the 1960s, have inspired the development of numerous computational and statistical methods for phylogeny reconstruction which have been widely successful in reconstructing the evolutionary history of genes and species. However, standard evolutionary models have two essential features, both of which are known to fail for a wide range of real biological data: (1) the domain of mutation is a concatenation of multiple independently distributed sites, each following a simple, identical stochastic process, and (2) the evolutionary history is a branching process (tree).

This talk is an overview of my research on complex evolutionary processes -- processes that lack either of the two features of standard models. In both cases, new stochastic models need to be developed. Moreover, the inference of evolutionary histories under these models is much harder: in some cases it is simply more computationally intensive, while in others it poses significant new algorithmic challenges.

I will cover two such processes: the process of gene order evolution and the process of horizontal gene transfer. For each process, I will formulate the estimation problems, identify computational and statistical issues, and present our current results and future research directions.

Towards Novel Computing Paradigms: Computation with Irregular and Imperfect Cellular Assemblies

Date: Tuesday, March 28, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Christof Teuscher (UNM Faculty Candidate)
Los Alamos National Laboratory

Novel materials and fabrication technologies offer unique opportunities for self-assembling tomorrow's multi-billion-component computing machines in a largely random manner, which would lower fabrication costs significantly compared to today's precisely defined assemblies. However, we do not yet possess the formalisms, methodologies, and tools to deal with such unconventional machines. This challenge, at the interface of biology, computer science, materials science, and physics, needs to be addressed in a highly interdisciplinary way and, if successful, would likely allow a quantum leap in the performance of computing machines.

In this talk, I will first present an overview of some biologically inspired computing architectures, with both regular and irregular basic structures. Using cellular automata and random Boolean networks as a showcase, I will illustrate that irregular assemblies have major advantages over regular ones for solving certain "global" tasks. I will further describe the properties of a physically plausible small-world interconnect fabric that is inspired by modern network-on-chip paradigms. The framework's key parameters, such as the connectivity, the number of switch blocks, the number of virtual channels, the routing strategy, and the distribution of long- and short-range connections, are varied while the network's transport characteristics and robustness against failures are measured. The results show that irregular interconnect fabrics have major advantages in terms of performance and robustness over purely regular fabrics.
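
As a small illustration of the irregular assemblies used as a showcase, the following sketch implements a random Boolean network: each node reads a few randomly chosen nodes and updates via its own random truth table, with all nodes updating synchronously. The parameters are invented for this example; the benchmarks and interconnect models studied in the talk are more elaborate.

    import random

    def make_rbn(n=16, k=2, seed=0):
        """Random Boolean network: irregular wiring plus random update tables."""
        rng = random.Random(seed)
        inputs = [rng.sample(range(n), k) for _ in range(n)]                 # irregular wiring
        tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
        return inputs, tables

    def step(state, inputs, tables):
        """Synchronous update of every node from its input nodes."""
        nxt = []
        for ins, table in zip(inputs, tables):
            idx = 0
            for i in ins:                      # pack the input bits into a table index
                idx = (idx << 1) | state[i]
            nxt.append(table[idx])
        return nxt

    inputs, tables = make_rbn()
    state = [random.randint(0, 1) for _ in range(16)]
    for _ in range(5):
        state = step(state, inputs, tables)
        print("".join(map(str, state)))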

The last part of the talk shall be dedicated to future work and challenges in realizing higher-level adaptive and robust systems on top of novel computing substrates.

BIOGRAPHY:
Christof Teuscher obtained his M.Sc. and Ph.D. degrees in computer science from the Swiss Federal Institute of Technology in Lausanne (EPFL) in 2000 and 2004, respectively. He was a postdoctoral researcher at the University of California, San Diego (UCSD) from 2004 to 2005 and is currently a Director's Postdoctoral Fellow at Los Alamos National Laboratory. His main research interests include biologically inspired computing, novel and unconventional computing architectures and paradigms, and complex and adaptive systems. Teuscher has received several awards and fellowships. For more information visit http://www.teuscher.ch/christof.

Motion Planning for a Class of Planar Closed-Chain Manipulators

Date: Thursday, March 23, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Guanfeng Liu (UNM Faculty Candidate)
Department of Computer Science,
Stanford University

We study the motion planning problem for planar star-shaped manipulators. These manipulators are formed by joining k "legs" to a common point (like the thorax of an insect) and then fixing the "feet" to the ground. The result is a planar parallel manipulator with k-1 independent closed loops. A topological analysis is used to understand the global structure of the configuration space so that the planning problem can be solved exactly. The worst-case complexity of our algorithm is O(k³N³), where N is the maximum number of links in a leg. Several examples illustrating our method are given. Finally, we show the application of our topological method to the study of the global structure of protein conformation space.

Network Scaling in Organisms, Societies and Information Systems

Date: Tuesday, March 21, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Melanie Moses (UNM Faculty Candidate)
Department of Biology, University of New Mexico

The geometry of metabolic networks constrains the rate at which organisms acquire and use energy. Natural selection has evolved efficient, hierarchical, branching networks that simultaneously maximize metabolic rate and minimize internal transport distances, for example from the aorta to the capillaries in the mammalian circulatory system. The geometric properties of such networks cause a number of biological rates and times to be proportional to organism mass raised to a ¼ power.
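
For reference, the quarter-power relationships referred to here take the standard allometric form (stated as general background, not as results from the talk):

    B \propto M^{3/4}, \qquad \tau \propto M^{1/4}, \qquad r \propto M^{-1/4},

where B is whole-organism metabolic rate, M is body mass, \tau is a characteristic biological time (such as blood circulation time or lifespan), and r is a mass-specific rate. Doubling body mass therefore multiplies metabolic rate by 2^{3/4} \approx 1.68 rather than by 2, which is the signature of the efficient branching networks described above.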

I show that the principles underlying metabolic theory also apply to non-biological networks that distribute energy, materials and information. In particular, I show that the efficiency of energy acquisition in ant colonies and human societies scales with the number of individuals in the society. Some aspects of energy acquisition in societies are characterized by ¼ power scaling. The number of individuals in a society also affects the rate at which information is acquired and distributed, and I examine how information may be used to increase the efficiency of energy acquisition.

I also investigate how properties of information networks scale as a function of their size. I compare the scaling properties of neural networks, integrated circuits and computer networks to metabolic networks. This work suggests that the efficient designs that have evolved via natural selection may be used to design efficient information networks.

Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling

Date: Thursday, March 9th, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Shaojun Wang (UNM Faculty Candidate)
Alberta Ingenuity Center for Machine Learning
Department of Computer Science
University of Alberta

Language modeling — accurately calculating the probability of naturally occurring word sequences in human natural language — lies at the heart of some of the most exciting developments in computer science, such as speech recognition, machine translation, information retrieval and bioinformatics. In this talk, I present two approaches for statistical language modeling which simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information.

The first approach is based on a new machine learning technique we have proposed — the latent maximum entropy principle — which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which only allow observed features to be modeled. The ability to conveniently incorporate hidden variables allows us to extend the expressiveness of language models while alleviating the necessity of pre-processing the data to obtain explicitly observed features. We then use these techniques to combine two standard forms of language models: local lexical models (trigram models) and global document-level semantic models (probabilistic latent semantic analysis, PLSA).
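
The following sketch is a deliberately simplified stand-in for the combination just described: it mixes a local trigram model with a document-level word distribution by plain linear interpolation. The latent maximum entropy approach of the talk combines these information sources in a more principled way; the code only illustrates what "local lexical" and "global document-level" evidence mean, and the mixing weight is arbitrary.

    from collections import defaultdict

    class InterpolatedLM:
        def __init__(self, lam=0.7):                       # lam is an arbitrary mixing weight
            self.lam = lam
            self.tri = defaultdict(lambda: defaultdict(int))  # (w1, w2) -> {w3: count}
            self.doc = defaultdict(int)                       # document-level word counts

        def train(self, sentences):
            for sent in sentences:
                toks = ["<s>", "<s>"] + sent + ["</s>"]
                for w1, w2, w3 in zip(toks, toks[1:], toks[2:]):
                    self.tri[(w1, w2)][w3] += 1
                for w in sent:
                    self.doc[w] += 1

        def prob(self, w3, w1, w2):
            hist = self.tri[(w1, w2)]
            p_tri = hist[w3] / sum(hist.values()) if hist else 0.0
            total = sum(self.doc.values())
            p_doc = self.doc[w3] / total if total else 0.0
            return self.lam * p_tri + (1 - self.lam) * p_doc  # mix local and global evidence

    lm = InterpolatedLM()
    lm.train([["the", "market", "fell"], ["the", "market", "rallied"]])
    print(lm.prob("fell", "the", "market"))                   # 0.7*0.5 + 0.3*(1/6) = 0.4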

The second approach aims to encode syntactic structure into a semantic trigram language model with a tractable parameter-estimation algorithm. We propose a directed Markov random field (MRF) model that combines trigram models, PCFGs, and PLSA. The added context-sensitivity due to the trigram and PLSA components, and the violation of tree structure in the topology of the underlying random field model, make the inference and estimation problems appear intractable; however, an analysis of the behavior of the composite directed MRF model leads to a generalized inside-outside algorithm and thus to rigorous, exact EM-type re-estimation of the model parameters.

Our experimental results on the Wall Street Journal corpus show that both approaches yield significant reductions in perplexity over the current state-of-the-art technique.

A Formalization of the Use of Bounds with Applications in Biology and Engineering

Date: Tuesday, March 7th, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Sharlee Climer (UNM Faculty Candidate)
Doctoral Candidate
Dept. of Computer Science and Engineering
Dept. of Biology
Washington University in St. Louis

In this presentation, we reflect on the history of the use of bounds and observe that previous research has focused on bounds derived from relaxations of constraints. However, bounds can be derived by tightening constraints, adding or deleting variables, or modifying the objective function. A formalization of the use of bounds as a two-step procedure, called limit crossing, is presented.

We used limit crossing to produce an original search strategy called cut-and-solve. We compared a simple, generic implementation of cut-and-solve with state-of-the-art solvers for seven real-world Traveling Salesman Problem classes. Cut-and-solve was faster when solving large instances for five of the problem classes and was able to solve larger (sometimes substantially larger) instances for four classes.

We are currently using limit crossing for the biological problem of inferring haplotypes. A haplotype is a set of nucleotide sites gathered from a stretch of DNA. Genetic association studies use haplotypes to identify correlations between diseases and genes. We are implementing a limit-crossing tool for haplotype inference based on characteristics that can be expected in human populations.

Biography:

Sharlee Climer is a doctoral candidate at Washington University in St. Louis. Her advisors are Weixiong Zhang, in the Department of Computer Science, and Alan Templeton, in the Department of Biology. She holds degrees in Physics (BA), Civil/Structural Engineering (BS), and Computer Science (BS and MS). She is a computational scientist with a focus on combinatorial optimization problems. Sharlee has presented tutorials on her dissertation work at both the 19th International Joint Conference on Artificial Intelligence (IJCAI'05) and the 20th National Conference on Artificial Intelligence (AAAI'05). She has also served on program committees for AAAI'05 and the 2002, 2005, and 2006 Olin Conferences. Sharlee is the recipient of a National Defense Science and Engineering Graduate (NDSEG) Fellowship, an Olin Fellowship, and more than a dozen scholarships.

Dynamic Differential Data Protection

Date: Tuesday, February 20, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

Dr. Patrick Widener
Department of Computer Science, UNM

Modern distributed applications are long-lived, are expected to provide flexible and adaptive data services, and must meet the functionality and scalability challenges posed by dynamically changing user communities in heterogeneous execution environments. The practical implications of these requirements are that reconfiguration and upgrades are increasingly necessary, but opportunities to perform such tasks offline are greatly reduced. Developers are responding to this situation by dynamically extending or adjusting application functionality and by tuning application performance, a typical method being the incorporation of client- or context-specific code into applications' execution loops.

Prior work has highlighted the performance advantages provided by dynamic code extension or specialization. Our work addresses a basic roadblock in deploying such solutions: the protection of key application components and sensitive data in distributed applications. Dynamic Differential Data Protection (D3P) provides fine-grain methods for component-based protection in distributed applications. Context-sensitive, application-specific security methods are deployed at runtime to enforce restrictions on data access and manipulation. D3P is suitable for low- or zero-downtime environments, since such deployments are performed while applications run. Because it creates native code via dynamic binary code generation, D3P is appropriate for high-performance environments and for highly scalable applications such as publish/subscribe. Finally, due to its integration into middleware, D3P can run across a wide variety of operating systems and machine platforms.

This talk introduces the need for D3P, using sample applications from the high performance and pervasive computing domains to illustrate the problems addressed by our D3P solution. It also describes how D3P can be integrated into modern middleware. Experimental evaluations demonstrate the fine-grain nature of D3P, that is, its ability to capture individual end users' or components' needs for data protection, and they also describe the performance implications of using D3P in data-intensive applications.

Discrete Bayesian Network Structure Search with an Application to Functional Magnetic Resonance Imaging Data

Date: Tuesday, February 20, 2006
Time: 11:00 am — 12:15 pm
Place: Woodward 149

John Burge
Department of Computer Science, UNM

Neuroimaging technologies, such as functional magnetic resonance imaging (fMRI), have become widely used in the analysis of mental illness. They are non-invasive techniques used to create images that measure (possibly indirectly) human neural activity. Measuring something as complex and highly detailed as the human brain poses significant data mining challenges. For example, an fMRI scan for a single patient can possess extremely high dimensionality, easily resulting in more than 250 megabytes of data. Neuroscientists and statisticians have a wide range of methods for analyzing such data; however, many of these methods make linear assumptions that are unlikely to hold given the dynamics of the human brain. We introduce a nonlinear method for modeling the data based on discrete Bayesian networks, a modeling framework commonly employed within the machine learning community. We provide a synopsis of this framework and discuss the extensions we have proposed for learning generative and class-discriminative models, as well as for improving model selection based on the hierarchical arrangement of neuroanatomical regions of interest. We briefly discuss our experimental results and the methods used to validate them.
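
For readers unfamiliar with the framework, the sketch below shows a generic score-based structure search for discrete Bayesian networks: greedily add the parent edge that most improves a BIC score, with parents restricted to lower-indexed variables so the graph stays acyclic. This is textbook machinery offered for orientation, not the specific generative/discriminative or hierarchically constrained algorithms developed in this work.

    import numpy as np
    from itertools import product
    from math import log

    def family_bic(data, child, parents, arity):
        """BIC score of one node given a candidate parent set
        (log-likelihood minus a complexity penalty)."""
        n = data.shape[0]
        score = 0.0
        for ps in product(*[range(arity[p]) for p in parents]):
            mask = np.ones(n, dtype=bool)
            for p, v in zip(parents, ps):
                mask &= data[:, p] == v
            counts = np.bincount(data[mask, child], minlength=arity[child])
            total = counts.sum()
            for c in counts:
                if c > 0:
                    score += c * log(c / total)
        n_params = (arity[child] - 1) * int(np.prod([arity[p] for p in parents]))
        return score - 0.5 * log(n) * n_params

    def greedy_search(data, arity, max_parents=2):
        """Greedy hill climbing over parent sets; restricting parents to
        lower-indexed variables keeps the result acyclic (a simplification)."""
        parents = {v: [] for v in range(data.shape[1])}
        improved = True
        while improved:
            improved, best = False, None
            for child in range(data.shape[1]):
                if len(parents[child]) >= max_parents:
                    continue
                base = family_bic(data, child, parents[child], arity)
                for cand in range(child):                 # only lower-indexed candidates
                    if cand in parents[child]:
                        continue
                    gain = family_bic(data, child, parents[child] + [cand], arity) - base
                    if gain > 0 and (best is None or gain > best[0]):
                        best = (gain, child, cand)
            if best:
                parents[best[1]].append(best[2])
                improved = True
        return parents

    # Toy data: X0 drives X1 (with 10% noise); X2 is independent noise.
    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 500)
    x1 = (x0 ^ (rng.random(500) < 0.1)).astype(np.int64)
    data = np.column_stack([x0, x1, rng.integers(0, 2, 500)])
    print(greedy_search(data, arity=[2, 2, 2]))           # expect X0 chosen as parent of X1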

A Short-Short Overview of Machine Learning at UNM

Date: Thursday, February 9, 2006
Time: 11:00 am — 12:15 pm.
Place: Woodward 149

Prof. Terran Lane
Department of Computer Science, UNM

Many of you are, by now, aware of the machine learning course being offered in our department, and you may even be vaguely aware of some of the research that's happening. But not all of you know much about either of the above. In this talk, I'll give a brief overview of what machine learning is all about (and why you should care). I'll also give some overviews of the work that my research group is doing. But most of the time will be spent with two of your colleagues from my research group discussing their research projects: Bayesian modeling for neuroscience and spectral methods for learning about social structures.

Investigating the well-mixed assumption in viral infection dynamics

Date: Tuesday, January 30, 2006
Time: 11:00 am — 12:15 pm.
Place: Woodward 149

Catherine Beauchemin
Department of Computer Science, UNM

In the past, viral kinetics has typically been studied through the use of spatially well-mixed ordinary differential equations, which describe the number of cells of each type as a function of time only, assuming that those cells are uniformly distributed in space. In the real system, however, spatial heterogeneities such as localized pockets of dead cells can emerge over the course of the infection, and these can affect the spread of infection, not unlike a counter-fire that can stop a forest fire from spreading.
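
For concreteness, a standard well-mixed formulation of the kind described above is the target-cell-limited model often used for influenza A (the speaker's exact equations may differ; this is shown only to make the assumption explicit):

    \frac{dT}{dt} = -\beta T V, \qquad \frac{dI}{dt} = \beta T V - \delta I, \qquad \frac{dV}{dt} = p I - c V,

where T, I, and V are the total numbers of uninfected target (epithelial) cells, infected cells, and free virions, \beta is the infection rate, \delta the infected-cell death rate, p the per-cell viral production rate, and c the viral clearance rate. The well-mixed assumption enters because only totals are tracked: the equations carry no information about where in the tissue the infected or dead cells sit.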

In order to investigate the possible role of spatial heterogeneities in the development and outcome of a viral infection, I propose a simple agent-based model (ABM). I will show that the model is complex enough to reproduce the general shape of a response to an uncomplicated viral infection, and that it gives quantitatively reasonable results when calibrated for the particular case of influenza A. I will then use the ABM to investigate the effects of relaxing the well-mixed assumption. In particular, the effects of the initial distribution of infected cells, the regeneration rule for dead epithelial cells, and the proliferation rule for immune cells are explored and shown to have an important impact on the development and outcome of the viral infection in our model.

Finally, if time permits, I will also present results from an analysis of published T cell track data captured by two-photon microscopy experiments in vivo.

Suggested reading: http://www.cs.unm.edu/~cbeau/docs/spatial.pdf

A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric

Date: Tuesday, January 24, 2006
Time: 11:00 am — 12:15 pm.
Place: Woodward 149

Nick Pattengale
Department of Computer Science, UNM and Sandia National Labs

The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1+epsilon) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.
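
To make the two ingredients named above concrete, the sketch below encodes each tree as a 0/1 indicator vector over the nontrivial bipartitions occurring in the collection, so the (unnormalized) RF distance between two trees is the squared Euclidean distance between their vectors, and then applies a Johnson-Lindenstrauss random projection so that all pairwise distances can be estimated from shorter vectors. It is an illustration under those assumptions only; the paper's sublinear-space embedding and parameter choices are more refined, and for a toy input the projected dimension is not actually smaller than the original.

    import numpy as np

    def indicator_vectors(trees):
        """Each 'tree' here is simply a set of hashable bipartition labels."""
        universe = sorted({b for t in trees for b in t}, key=str)
        index = {b: i for i, b in enumerate(universe)}
        vecs = np.zeros((len(trees), len(universe)))
        for row, t in enumerate(trees):
            for b in t:
                vecs[row, index[b]] = 1.0
        return vecs

    def jl_project(vecs, eps=0.2, seed=0):
        """Gaussian random projection to roughly 4 ln(n) / eps^2 dimensions."""
        n, d = vecs.shape
        k = max(1, int(np.ceil(4 * np.log(max(n, 2)) / eps ** 2)))
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((d, k)) / np.sqrt(k)
        return vecs @ R

    # Toy example: three "trees" described by their bipartition sets.
    t1 = {frozenset({"A", "B"}), frozenset({"C", "D"})}
    t2 = {frozenset({"A", "C"}), frozenset({"C", "D"})}
    t3 = {frozenset({"A", "B"}), frozenset({"B", "D"})}
    V = indicator_vectors([t1, t2, t3])
    P = jl_project(V)
    exact = np.sum(np.abs(V[0] - V[1]))              # symmetric-difference size = RF
    approx = np.sum((P[0] - P[1]) ** 2)              # JL estimate of the squared distance
    print(exact, round(float(approx), 2))

The advantage of the projection is that once the short vectors are computed, every pairwise distance in a large collection can be approximated without returning to the full bipartition sets.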