Date: November 10, 2009
Time: 9:30AM
Place: FEC 141
Sean M. Brennan
Committee Chair:
Barney Maccabe, Director of the Computer Science and Mathematics Division at Oak Ridge National Laboratory
Committee Members:
Wenbo He, Assistant Professor of Computer Science at UNM
Sudharman Jayaweera, Assistant Professor of Electrical and Computer Engineering at UNM
Michael Cai, Los Alamos National Laboratory
Abstract:
The SENSIX software framework uniquely integrates constraint-dominated
wireless sensor networks with the flexibility of object-oriented
programming models, without violating the principles of either. Though
these two computing paradigms are opposite in many ways, SENSIX bridges
them to yield a dynamic middleware abstraction unifying low-level
resource-aware task reconfiguration and high-level object recomposition.
Through the layered approach of SENSIX, the software developer creates a
domain-specific sensing architecture by merely defining a customized
task specification and utilizing object inheritance. In addition, SENSIX
performs better at large scales (on the order of 1000 nodes or more)
than other sensor network middleware that lacks such unified
facilities for vertical integration.
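As an illustration of the layered approach described above (in Python rather than SENSIX's actual implementation language), a developer-facing task might be defined purely by inheritance; every class, method, and constant below is hypothetical, not the SENSIX API.

class SensingTask:
    """Base task: the middleware decides where and when it runs."""
    period_ms = 1000  # default sampling period

    def sample(self):
        raise NotImplementedError

    def report(self, value):
        print(f"{type(self).__name__}: {value}")

def read_adc(channel):
    """Stub standing in for a low-level hardware driver."""
    return 21.5

class TemperatureTask(SensingTask):
    """A domain-specific task obtained by inheritance alone."""
    period_ms = 5000  # sample less often to conserve energy

    def sample(self):
        return read_adc(channel=0)

task = TemperatureTask()
task.report(task.sample())  # TemperatureTask: 21.5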
Date: November 5, 2009
Time: 3:30PM
Place: ME 427
Guanyu Wang
Committee Chair:
Joe Kniss, Assistant Professor of Computer Science at UNM
Committee Members:
Pradeep Sen, Assistant Professor of Electrical and Computer Engineering at UNM
Lance Williams, Associate Professor of Computer Science at UNM
Abstract:
I will propose a simple and robust method for image and volume data segmentation based on manifold distance metrics. In this approach, pixels in an image are not treated as points with color values arranged in a grid. Instead, a new data set is built by transforming a traditional 2D image or 3D volume into a manifold in a higher-dimensional feature space. Multiple candidate feature spaces, including position, gradient, and probabilistic measures, are studied and evaluated experimentally. Graph algorithms and probabilistic classification are employed. Both the time and space complexity of this algorithm are O(N). With an appropriate choice of feature vector, this method produces qualitative and quantitative results similar to those of other algorithms such as Level Sets and Random Walks. An analysis of sensitivity to parameters is presented. A comparison between segmentation results and ground-truth images is also provided to validate the robustness of this method.
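To make the lifting step concrete, a minimal sketch of the pixel-to-feature-space transform might look as follows; the particular feature set and weights are illustrative assumptions, not values from the thesis.

import numpy as np

def lift_to_feature_space(image, w_pos=1.0, w_color=1.0, w_grad=0.5):
    """Map an HxW grayscale image to an (H*W) x 5 matrix of feature points."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(image.astype(float))   # per-pixel gradient features
    features = np.stack([w_pos * xs, w_pos * ys,
                         w_color * image,
                         w_grad * gx, w_grad * gy], axis=-1)
    return features.reshape(-1, 5)

img = np.random.default_rng(0).random((64, 64))
print(lift_to_feature_space(img).shape)  # (4096, 5)
# Distances between rows approximate distances along the image manifold
# and can drive a linear-time region-growing or classification pass.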
Date: October 20, 2009
Time: 11:00AM
Place: FEC 141
Manjunath Gorentla Venkata
Committee Chair:
Patrick Bridges, Associate Professor of Computer Science at UNM
Committee Members:
Dorian Arnold, Assistant Professor of Computer Science at UNM
Rolf Riesen, Principal Member of the Technical Staff, Scalable Systems Software, Sandia National Laboratories
Nasir Ghani, Associate Professor of Electrical and Computer Engineering at UNM
Abstract:
Modern HPC applications, for example adaptive mesh refinement and multi-physics codes, have dynamic communication characteristics which result in poor performance on current MPI implementations. The degraded application performance can be attributed to a mismatch between changing application requirements and static communication library functionality. To improve the performance of these applications, MPI libraries should change their protocol functionality in response to changing application requirements, and tailor their functionality to take advantage of hardware capabilities.
This dissertation describes PRO-MPI, a framework for constructing profile-driven reconfigurable MPI libraries; these libraries use past application characteristics (profiles) to dynamically change their functionality to match the changing application requirements. The framework addresses the challenges of designing and implementing the reconfigurable MPI libraries, which include collecting and reasoning about application characteristics to drive the protocol reconfiguration and defining abstractions required for implementing these reconfigurations. Two prototype reconfigurable MPI implementations based on the framework -- Open PRO-MPI and Cactus PRO-MPI -- are also presented to demonstrate the utility of the framework.
To demonstrate the effectiveness of reconfigurable MPI libraries, this dissertation presents experimental results to show the impact of using these libraries on the application performance. The results show that PRO-MPI improves the performance of important HPC applications and benchmarks. They also show that the application performance improves significantly when exact profiles are available, and the performance improves considerably when only approximate profiles are available.
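A toy illustration of the profile-driven idea (not the PRO-MPI interface): record message sizes per communication pair on one run, then use that profile to fix each pair's protocol on the next run. The eager/rendezvous threshold is an assumed tuning knob.

EAGER_LIMIT = 64 * 1024  # bytes; the threshold is an assumption

def choose_protocols(profile):
    """profile: dict mapping (src, dst) -> list of observed message sizes."""
    choice = {}
    for pair, sizes in profile.items():
        avg = sum(sizes) / len(sizes)
        choice[pair] = "eager" if avg <= EAGER_LIMIT else "rendezvous"
    return choice

profile = {(0, 1): [512, 1024], (0, 2): [4 * 1024 * 1024]}
print(choose_protocols(profile))  # {(0, 1): 'eager', (0, 2): 'rendezvous'}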
Date: Tuesday, September 1st, 2009
Time: 10:00AM
Place: ME 400
Tamanna Arora
Committee Chair:
Melanie Moses, Assistant Professor of Computer Science at UNM
Committee Members:
George Luger, Associate Chair of Computer Science at UNM
Payman Zarkesh-Ha, Assistant Professor of Electrical and Computer Engineering at UNM
Abstract:
Convergence of computing capabilities combined with increased consumer demand has resulted in the development of devices with enhanced functionalities and capabilities. System designers face the daunting challenge of providing a range of functionalities at low cost while maintaining the reliability of existing systems. As devices become more complex, it is of prime importance to find ways to make optimal use of the power available from batteries. This work focuses on curbing power consumption right at the design stage.
This work emphasizes minimizing active power consumption by minimizing the load capacitance of the chip. The two capacitive components are wires and vias. Wires route signals between components on the chip. Vias provide an electrical connection between two adjacent metal layers. Multiple metal layers are very useful as they provide tiers of horizontal and vertical routing area, stacked over each other and connected by vias. Hence, adding vias to the design can save wire-length, whereas routing in the same layer, at the cost of increased wire-length, can save vias. Thus there exists a trade-off between the number of wires and vias used in routing.
This work uses the Ant Colony Optimization (ACO) algorithm to handle the multiple constraints of minimizing wire-length and vias, and thereby to minimize capacitance and hence power consumption. Ant Colony Optimization provides a multi-agent framework for combinatorial optimization problems. Every agent carries out an independent sequential decision process and utilizes memory, stochastic decision making, and strategies of collective and distributed learning to perform an optimal search. ACO can easily be adapted to follow a particular set of heuristic rules formulated according to the specific goals of the optimization problem, in this case finding routes that minimize the capacitance of the chip.
This work employs ACO on both Manhattan and diagonal routing architectures and uses a set of heuristics to find optimal routing paths on the chip.
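A compact sketch of pheromone-guided route search in the spirit described above; the grid model, parameters, and the per-cell cost standing in for capacitance are illustrative assumptions, not the dissertation's router.

import random

def ant_route(start, goal, cell_cost, alpha=1.0, rho=0.1, n_ants=100):
    """Search for a low-cost monotone grid route with ant colony optimization."""
    pher, best, best_cost = {}, None, float("inf")
    for _ in range(n_ants):
        (x, y), path = start, []
        while (x, y) != goal:
            moves = []
            if x < goal[0]:
                moves.append((x + 1, y))
            if y < goal[1]:
                moves.append((x, y + 1))
            weights = [pher.get(m, 1.0) ** alpha for m in moves]
            x, y = random.choices(moves, weights=weights)[0]
            path.append((x, y))
        cost = sum(cell_cost.get(c, 1.0) for c in path)  # stand-in for capacitance
        if cost < best_cost:
            best, best_cost = path, cost
        for c in pher:
            pher[c] *= 1 - rho                           # pheromone evaporation
        for c in path:
            pher[c] = pher.get(c, 1.0) + 1.0 / cost      # deposit along the tour
    return best, best_cost

congested = {(1, 1): 5.0, (2, 1): 5.0}  # cells modeling a via-heavy region
print(ant_route((0, 0), (3, 3), congested))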
Date: Thursday, August 27th, 2009
Time: 8:00AM
Place: ME 427
Sushmita Roy
Committee Chair:
Terran Lane, Associate Professor of Computer Science at UNM
Committee Co-Chair:
Margaret Werner-Washburne, Professor of Biology at UNM
Committee Members:
Melanie Moses, Assistant Professor of Computer Science at UNM
Susan Atlas, Research Associate Professor of Physics at UNM
Abstract:
Condition-specific cellular networks are networks of genes and proteins that describe functional interactions among genes occurring under different environmental conditions. These networks provide a systems-level view of how the parts list (genes and proteins) interacts within the cell as it functions under changing environmental conditions, and they can provide insight into mechanisms of stress response, cellular differentiation and disease susceptibility. The principal challenge, however, is that cellular networks remain unknown for most conditions and must be inferred from activity levels of genes (mRNA levels) under different conditions. This dissertation aims to develop computational approaches for inferring, analyzing and validating cellular networks of genes from expression data.
This dissertation first describes an unsupervised machine learning framework for inferring cellular networks using expression data from a single condition. Here cellular networks are represented as undirected probabilistic graphical models and are learned using a novel, data-driven algorithm. Then several approaches are described that can learn networks using data from multiple conditions. These approaches apply to cases where the condition may or may not be known and, therefore, must be inferred as part of the learning problem. For the latter, the condition variable is allowed to influence the expression of genes at different levels of granularity: from one condition variable per gene to a single condition variable for all genes.
Results on simulated data suggest that algorithm performance depends greatly on the size and number of connected components of the union network of all conditions. These algorithms are also applied to microarray data from two yeast populations, quiescent and non-quiescent, isolated from glucose-starved cultures. Our results suggest that by sharing information across multiple conditions, better networks can be learned for both conditions, with many more biologically meaningful dependencies, than if networks were learned for these conditions independently. In particular, processes that were shared among both cell populations were involved in response to glucose starvation, whereas the processes specific to individual populations captured characteristics unique to each population. These algorithms were also applied to learning networks across multiple species. Preliminary analysis suggests that our approaches capture shared characteristics across different species better than approaches that learn networks for each species independently.
Finally, this dissertation focuses on validation of cellular networks. This validation framework describes scores for measuring how well network learning algorithms capture higher-order dependencies. This framework also introduces a measure for evaluating the entire inferred network structure based on the extent to which similarly functioning genes are close together on the network.
Date: Thursday, August 27th, 2009
Time: 9:00AM
Place: FEC 141
Stephan Falke
Committee Chair:
Deepak Kapur, Distinguished Professor of Computer Science at UNM
Committee Members:
Juergen Giesl, Professor of Computer Science at RWTH Aachen University, Germany
William McCune, Research Professor of Computer Science at UNM
Robert Veroff, Professor Emeritus of Computer Science at UNM
Abstract:
Term rewrite systems have been extensively used to model computer programs for the purpose of formal verification. This is particularly true when the termination behavior of computer programs is investigated, and automatic termination proving for term rewrite systems has received increased interest in recent years. Ordinary term rewrite systems, however, exhibit serious drawbacks. First, they do not provide a tight integration of natural numbers or integers. Since the pre-defined semantics of these primitive data types cannot be utilized, reasoning about termination of ordinary term rewrite systems operating on numbers is often cumbersome or even impossible. Second, ordinary term rewrite systems cannot accurately model collection data structures such as sets or multisets, which are supported by many high-level programming languages such as Maude or OCaml.
This dissertation introduces a new class of term rewrite systems that addresses both of these drawbacks and thus makes it possible to accurately model computer programs using a high level of abstraction in a natural formalism. Then, the problem of automatically proving termination for this new class of term rewrite systems is investigated. The resulting dependency pair framework provides a flexible and modular method for proving termination. In addition to unrestricted rewriting, termination of rewriting with the innermost strategy or a context-sensitive rewriting strategy is investigated as well.
The techniques for proving termination that are developed in this dissertation have been implemented in the well-known termination prover AProVE. An empirical evaluation shows that the implementation succeeds in automatically proving termination of a large collection of computer programs that are modeled using the new class of term rewrite systems developed in this work.
Next, the use of this new class of term rewrite systems in the context of inductive theorem proving is investigated. This makes it possible to reason about the semantics of computer programs. The inductive theorem proving method developed in this dissertation provides a tight integration of inductive reasoning with a decision procedure, thus resulting in a high degree of automation.
Finally, conditions under which the inductive theorem proving method is guaranteed to succeed in proving or disproving a conjecture without any user intervention are identified. Thus, the inductive theorem proving method can be applied as a "black box" if these conditions are satisfied.
The inductive theorem proving method and checks for the conditions under which it provides a decision procedure have been implemented in the prototype prover Sail2. An empirical evaluation shows that Sail2 is very efficient, and the high degree of automation makes it possible to use Sail2 in a push-button mode for formal program verification.
Date: Tuesday, March 31st, 2009
Time: 10:00AM
Place: FEC 141
Josh Karlin
Committee Chair:
Stephanie Forrest, Professor and Chair of Computer Science at UNM
Committee Members:
Jennifer Rexford, Professor of Computer Science at Princeton University
Barney Maccabe, Director of the Computer Science and Mathematics Division at Oak Ridge National Laboratory
Jedidiah Crandall, Assistant Professor of Computer Science at UNM
Abstract:
The Internet has developed into an important economic, military, academic, and social resource. It is a complex network, composed of tens of thousands of independently operated networks, called Autonomous Systems (ASes). A significant strength of the Internet's design, one which enables its rapid growth in terms of users and bandwidth, is that its underlying protocols (such as IP, TCP, and BGP) are distributed. Users and networks alike can attach and detach from the Internet at will, without causing major disruptions to global Internet connectivity.
This dissertation shows that the Internet's distributed, and often redundant, structure can be exploited to increase the security of its protocols, particularly BGP (the Internet's interdomain routing protocol). It also presents statistical measurements of the Internet's structure and uses them to create a model of Internet growth. This work could be used, for instance, to test upcoming routing protocols on large, Internet-like graphs. Finally, this dissertation shows that while the Internet is designed to be agnostic to political influence, it is actually quite centralized at the country level. With the recent rise in country-level Internet policies, such as nation-wide censorship and warrantless wiretaps, this centralized control could have significant impact on international reachability.
Date: Monday, March 30th, 2009
Time: 9:00AM
Place: FEC 141
Edgar A. Leon
Committee Chair:
Arthur B. Maccabe
Committee Members:
Rolf Riesen, Principal Member of the Technical Staff, Sandia National Laboratories
Dilma Da Silva, Research Staff Member, IBM T. J. Watson Research Center
Patricia J. Teller, Professor, University of Texas at El Paso
Patrick G. Bridges, Assistant Professor, University of New Mexico
Abstract:
Cache injection is a viable technique to improve the performance of data-intensive parallel applications. This dissertation examines the effect of cache injection of incoming network data on the performance of parallel scientific applications. My results show that this effect is a function of the ratio of processor to memory speed, the cache injection policy, and the application's communication characteristics. Cache injection addresses the memory wall for I/O by writing data into a processor's cache directly from the I/O bus. This technique, unlike data prefetching, reduces the number of reads served by the memory unit. This reduction is significant for applications with a high degree of data intensiveness (the amount of unique data the application accesses). The performance of these applications is dominated by compulsory cache misses and cannot be alleviated by traditional caching systems.
Unlike previous work on cache injection (focused on reducing host network stack overhead incurred by memory copies), I show that applications can use injected data directly from the cache, resulting in an overall improvement in their performance. I also show that the performance benefit of cache injection is directly proportional to the ratio of processor to memory speed. In other words, systems with a memory wall can provide significantly better performance with cache injection and an appropriate injection policy. This result also suggests that multi-core architectures may greatly benefit from this technique, given their high ratio of aggregated computational power to memory speed. Finally, my results show that the application's communication characteristics are key to the performance of cache injection. For example, certain collective communication operations can improve their performance by up to 20% as a function of message size.
Date: Wednesday, March 25th, 2009
Time: 11:00AM
Place: FEC 141
Anthony M. Campisi
Committee Chair:
Thomas P. Caudell, Director of the Center for High Performance Computing at UNM
Committee Members:
George F. Luger, Professor and Associate Chair of Computer Science at UNM
Pradeep Sen, Assistant Professor of Electrical and Computer Engineering at UNM
Abstract:
Modeling human behavior in computer simulations and games is a subject which draws considerable attention. Despite the increased realism of graphics in games, poor modeling of non-player characters' AI often leads to a shallow and unfulfilling game experience. Recently there has been increased focus on more sophisticated AI routines which have been used in both academic and commercial games.
Emotion, however, is often ignored despite being an essential element of human behavior, especially under stressful conditions. Research into the use of emotion in agent-based systems seems more concerned with how to convey the emotions of agents to the human player, or how to elicit an emotional response from the human player. Only recently has there been research on modeling emotions in combat simulators. This thesis will describe an emotional model suitable for most computer games, which was adapted from the DETT model and significantly expanded. The DETT model was developed for a combat simulator, and, while multiple emotions were modeled, an agent could only express one emotion. Our extended DETT model is implemented in a First Person Shooter game and allows individual agents to express different emotions.
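The contrast between single-emotion expression and the extended model might be sketched as follows; the appraisal variables and threshold are hypothetical, not actual DETT parameters.

class Agent:
    def __init__(self):
        # Hypothetical appraisal values in [0, 1]
        self.emotions = {"fear": 0.2, "anger": 0.6, "surprise": 0.1}

    def dominant_emotion(self):
        """Single-emotion expression: one emotion at a time."""
        return max(self.emotions, key=self.emotions.get)

    def expressed_mix(self, threshold=0.15):
        """Extended expression: every emotion above a threshold shapes behavior."""
        return {e: v for e, v in self.emotions.items() if v >= threshold}

a = Agent()
print(a.dominant_emotion())  # 'anger'
print(a.expressed_mix())     # {'fear': 0.2, 'anger': 0.6}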
Date: Friday, March 13th, 2009
Time: 9:00AM
Place: FEC 141
Nathan Swanson
Abstract:
Gamma Knife radiosurgery has been the gold standard of stereotactic radiosurgery. Since its inception in the 1960s, Gamma Knife radiosurgery has always been planned using a ball-packing approach and delivered in a step-and-shoot manner. We have developed a new dynamic scheme for Gamma Knife radiosurgery. Both theoretical and practical results are presented in this work.
The theoretical results provide rigorous mathematical analysis using techniques from computational geometry to improve 2D and 3D dynamic covering problems, known respectively as the Lawn Mowing Problem and the Dose Painting Problem. These problems are known to be NP-complete and have immediate applications in dynamic radiosurgery, but they have other applications as well, including surveillance, robotic routing, and search-and-rescue. This dissertation presents improved approximation algorithms for these problems. The key approach is the application of "grid-shifting" and "grid-rotation".
The practical results include an investigation of the advantages of using dynamic radiosurgery over current static step-and-shoot treatment techniques. These advantages include substantial improvements in economics, efficiency, and clinical treatment effectiveness. Furthermore, a fully automatic treatment planning algorithm to produce high-quality dynamic radiosurgery plans for Gamma Knife treatment is developed and discussed in detail. Finally, the experimental results of dynamic Gamma Knife radiosurgery on three phantom trials are presented. The results show that dynamic Gamma Knife radiosurgery has great potential for improving stereotactic radiosurgery. The clinical application of the technique should considerably reduce the labor intensiveness of treatment planning, shorten delivery times, and significantly improve treatment quality with more conformal dose distributions while maintaining patient safety.
Committee Chair:
Shuang Luan, Assistant Professor of Computer Science at UNM
Committee Members:
Phil Heintz, Professor of Radiology and Professor of Physics and Astronomy at UNM
Lijun Ma, Associate Professor in Residence of Radiation Oncology at the University of California, San Francisco
Cris Moore, Professor of Computer Science at UNM
Vince D. Calhoun, Associate Professor of Electrical and Computer Engineering at UNM and Director of Image Analysis and MRI Research at the MIND Institute at UNM
Date: Monday, November 10th, 2008
Time: 8:30AM
Place: FEC 141
Martha O. Perez-Arriaga
Abstract:
This research studies brain evolution, focusing on how the developmental period in an infant may be useful to predict behavior. Brain connections formed during the nurturing period of an individual's development are fundamental for survival. The evolution process can result in an optimal brain architecture and a behavior with the greatest survival advantage. Brain evolution is simulated for various individuals in two similar species. The simulation yields information about the performance of the whole population over time. Category Theory is used to analyze the relationship between specific brain architectures and evolutionary advantage.
The tools of Artificial Intelligence can be applied to the above simulations. In this computer-simulated brain development and evolution in the infant, FlatWorld is used to simulate a complete environment in which development occurs and survival skills are tested. A combination of Genetic Algorithms and Neural Networks is applied to FlatWorld. The two main objectives of this research are: 1) to show the relationship between the mother's nurturing of the infant during the developmental period and the consequent behavior of the infant in an unknown simulated real world; 2) to show the evolution of the associated brain structure, an analogy to synaptogenesis.
The results suggest that there is a strong relationship between learning and an individual's performance. The structure of the brain architectures is also related to the individual's performance, leading to a classification of individuals with different abilities to learn and survive. The infant's exposure to the nurturing period illustrates the synaptogenesis stage. The disadvantages of a weaker species can be overcome through a learning and evolution process. Finally, individuals' motor skills confer some survival advantages.
Key words: Neural Networks Architecture, Genetic Algorithms, Category Theory, Evolution.
Date: Monday, September 15th, 2008
Time: 10:00AM
Place: FEC 141
James Horey
Committee: Arthur Maccabe (chair), Stephanie Forrest, Patrick Bridges, Wennie Wei Shu
Abstract:
A hierarchical group model that decouples computation from hardware can characterize and aid in the construction of sensor network software with minimal overhead. Future sensor network applications will move beyond static, homogeneous deployments to include dynamic, heterogeneous elements. These sensor networks will also gain new users, including casual users who will expect intuitive interfaces for interacting with sensor networks. In order to address these challenges, a new computational model and systems implementing the model are necessary. This model must ensure that computation can be readily (re)assigned as sensor nodes are introduced or removed. Simultaneously, the model must include methods for communication that take these dynamic elements into account.
This dissertation presents a detailed description and design of a computational model that resolves these challenges using a hierarchical group mechanism. In this model, computation is tasked to logical groups and split into collective and local components that communicate hierarchically. Local computation is primarily used for data production and publishes data to the collective computation. Similarly, collective computation is primarily used for data aggregation and pushes results back to the local computation. Finally, the model also includes data-processing functions that sit between local and collective functions and are responsible for data conversion.
This dissertation also presents implementations and applications of the model. Implementations include Kensho, a C-based implementation of the hierarchical group model, that can be used for implementing a variety of user applications. Another implementation, Tables, presents a spreadsheet-inspired view of the sensor network that takes advantage of hierarchical groups for both computation and communication. Users are able to specify both local and collective functions that execute on the sensor network via the spreadsheet interface. Besides these implementations, applications of the model are also explored. One such application, FUSN, provides a set of methods for constructing filesystem-based interfaces for sensor networks. This demonstrates the general applicability of our model as applied to sensor network programming and management interfaces. Finally, we apply our model to novel privacy algorithms to demonstrate that our model isn't strictly limited to programming interfaces.
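A minimal sketch, under assumed names, of the local/collective split at the heart of the model: local functions publish samples upward, and a collective function aggregates them and pushes a result back down. This is not the actual Kensho or Tables API.

class Group:
    def __init__(self, nodes):
        self.nodes = nodes
        self.inbox = []

    def local_sample(self, node):
        reading = node["sensor"]()     # local computation: data production
        self.inbox.append(reading)     # publish upward to the collective

    def collective_average(self):
        avg = sum(self.inbox) / len(self.inbox)  # collective aggregation
        for node in self.nodes:                  # push the result back down
            node["baseline"] = avg
        return avg

nodes = [{"sensor": lambda v=v: v, "baseline": None} for v in (20.0, 22.0, 24.0)]
g = Group(nodes)
for n in nodes:
    g.local_sample(n)
print(g.collective_average())  # 22.0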
Date: Thursday, August 29th, 2008
Time: 8:00AM
Place: FEC 141
Jorge A. Navas
Committee: George Luger, Manuel Hermenegildo, Deepak Kapur, Stephanie Forrest
Abstract:
Static analysis is a powerful technique used traditionally for optimizing programs, and, more recently, in tools to aid the software development process, and in particular, in finding bugs and security vulnerabilities. More concretely, the importance of static analyses that can infer information about the costs of computations is well recognized since such information is very useful in a large number of applications.
Furthermore, the increasing relevance of analysis applications such as static debugging, resource bounds certification of mobile code, and granularity control in parallel computing makes it interesting to develop analyses for resource notions that are actually application-dependent. This may include, for example, bytes sent or received by an application, number of files left open, number of SMSs sent or received, energy consumption, number of accesses to a database, etc.
In this thesis, we present a resource usage analysis that aims at inferring upper and lower bounds on the cost of programs as a function of their data sizes, for a given set of user-definable resources of interest. We use logic programming as our basic paradigm since it subsumes many others, and this allows us to treat the problem at a considerable level of generality.
Resource usage analysis requires various pre-analysis steps. An important one is Set-Sharing analysis, which attempts to detect mode information and to determine which variables do not point to the same memory location, providing essential information to the resource usage analysis. Hence, this thesis also investigates the problem of efficient Set-Sharing analysis, presenting two different alternatives: (1) via widening operators, and (2) by defining compact and effective encodings.
Moving to the area of applications, a very interesting class involves certification of the resources used by mobile code. In this context, Java bytecode is widely used, mainly due to its security features and the fact that it is platform independent. Therefore, this thesis finally presents a resource usage analysis tool for Java bytecode that also includes a transformation providing a block-level view of the bytecode, which can be used as a basis for developing analyses. For this purpose, we have also developed a generic, abstract interpretation-based fixpoint algorithm which is parametric in the abstract domain. By plugging appropriate abstract domains into it, the framework provides useful information that can improve the accuracy of the resource usage information.
Date: Wednesday, August 28th, 2008
Time: 3:30 pm
Place: FEC 141
Mark Marron
Committee: Deepak Kapur, Co-Chair; Darko Stefanovic, Co-Chair; Manuel Hermenegildo; Rupak Majumdar
Abstract:
The ability to accurately model the evolution of the program heap during the execution of a program is critical for many types of program optimization and verification techniques. Past work on the topic of heap analysis has identified a number of important heap properties (connectivity, shape, heap-based dependence information, and region identification) that are used in many client applications (parallelization, scheduling, memory management) and has explored a range of approaches for computing this information. However, past work has been unable to compute this information with the required level of accuracy and/or has not been computationally efficient enough to make the analysis practical. The inability to compute the required heap information has limited the scope or effectiveness of many branches of research, making some optimization techniques impractical or severely limiting their effectiveness.
A general-purpose heap analysis technique is proposed. The major objective in its construction is to provide the range of heap information needed by the optimization applications stated above, on real-world programs, in a computationally tractable manner. With tractability in mind, a set of properties required by client optimization applications, e.g., automatic parallelization, classical scheduling, redundancy elimination, stack allocation, pool allocation, object co-location, or static garbage collection, is proposed. A set of design heuristics is selected to ensure that the analysis is fast and precise in the common case, and that good computational performance is maintained by reducing precision in less common situations.
This general approach allows the construction of an analysis that satisfies a number of key requirements for producing a practical heap analysis. It handles most of the Java 1.4 language, including major language features (arrays, virtual calls, interfaces, exceptions, etc.) as well as the standard Java libraries. No restrictions are placed on the heap structures that are constructed: no restrictions on the recursive data structures built or on how these structures are connected or shared. The analysis is able to provide the precise information needed for a wide range of optimization applications: aliasing, connectivity, shape, identification of logical data structures, and heap-carried data dependence information. Finally, in practice this information can be computed efficiently.
The technique has been implemented and used on a range of small to medium-size benchmarks from the JOlden and SPECjvm suites, as well as a number of benchmarks developed during this work. The analysis technique is evaluated using a number of detailed case studies which focus on the heap structures that the analysis is able to identify in the program and how this information can be used in a variety of optimization techniques (focusing on vectorization/loop scheduling, thread-level parallelization, and memory management). Most of these benchmarks are naturally amenable to thread-level parallelization; thus, as a more empirical evaluation, the analysis results are used to drive the parallelization of the benchmark programs, resulting in substantial speedups across the benchmark suites.
The runtime costs of the analysis are evaluated both in terms of the total analysis time and the general scalability of the analysis. The results of this evaluation indicate that the technique presented in this thesis is indeed a general purpose solution to the heap analysis problem (at least for small/medium size programs) that can be used to efficiently compute precise and useful information for program optimization over a wide range of real world Java programs.
Date: Wednesday, August 28th, 2008
Time: 9:00 am
Place: FEC 141
Mario Mendez
Committee: Manuel Hermenegildo, Deepak Kapur, Darko Stefanovic, German Puebla, George Luger
Abstract:
Analysis of the Java language (either in its source version or its compiled bytecode) using the framework of abstract interpretation has been the subject of significant research in the last decade. Most of this research concentrates on finding new abstract domains that better approximate a particular concrete property of the analyzed program in order to optimize compilation or statically verify certain properties about the run-time behavior of the code. In contrast to this concentration and progress on the development of new, refined domains, there has been comparatively little work on the development of extensible, generic frameworks that sustain the analysis. The first component in such a generic framework is a standard representation of the program that facilitates later analyses or optimizations. Although the description of that Control Flow Graph is often omitted, we show that a uniform, compact representation is fundamental for manipulating similar constructions of the language in a uniform way. The Horn clause-based representation chosen is general enough to represent not only object-oriented programs, but also logic programming applications.
In the context of the abstract interpretation framework, the fixpoint algorithm that lies at its very core has been demonstrated to have a dramatic impact on the efficiency and precision of any analysis. A particularly effective combination of the two attributes can be achieved by a flow-sensitive, context-sensitive fixpoint algorithm. We present a detailed description of such an algorithm, which handles mutually recursive nodes in the Control Flow Graph and uses memoization for further efficiency.
Generic abstract interpretation frameworks work in conjunction with abstract domains. Each domain captures a particular property of the program. A very interesting characteristic to analyze is whether a set of variables share, i.e., whether they might reach the same memory location. The information gathered by a sharing domain is used for parallelization and/or for optimizing the compilation. What we present is a combination of domains (sharing, nullity and types) which can work together to refine their results, i.e., be more precise. The approach is shown to achieve better results than previous sharing analyses.
The combinatorial nature of the set-sharing domain has been the subject of intense debate within the analysis community. The exponential complexity of some of the operations (e.g., abstract unification in Prolog or field load/store in Java) seemed to be a big obstacle that prevented the domain from being widely used. In this thesis, we present the first scalable implementation of set sharing, which is based on a special type of Binary Decision Diagram called the Zero-suppressed Binary Decision Diagram (ZBDD). By representing sets of sets with this data structure, we not only improve the memory requirements of the analysis but also achieve better efficiency in terms of overall running time.
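For readers unfamiliar with the domain, the following sketch shows set-sharing information as explicit sets of sets, with a deliberately naive closure operation after aliasing two variables; the thesis's contribution is representing and manipulating such sets compactly with ZBDDs instead.

def amgu_star(sharing, xs):
    """Abstract effect of aliasing the variables in xs together: close the
    sharing groups that touch xs under union (naive explicit version)."""
    rel = {s for s in sharing if s & xs}   # groups involving xs
    irrel = sharing - rel
    closure = set(rel)
    changed = True
    while changed:                         # union-closure of the relevant part
        changed = False
        for a in list(closure):
            for b in list(closure):
                u = a | b
                if u not in closure:
                    closure.add(u)
                    changed = True
    return irrel | closure

sharing = {frozenset("x"), frozenset("y"), frozenset("z")}
print(sorted(map(sorted, amgu_star(sharing, frozenset("xy")))))
# [['x'], ['x', 'y'], ['y'], ['z']]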
Date: Monday, August 25th, 2008
Time: 9:00 am
Place: FEC 141
Eric Trias
Committee: Stephanie Forrest (chair), Greg Heileman, Terran Lane, Jedidiah Crandall
Abstract:
In this dissertation, I advance the concept known as negative databases, or negative representation of information, by developing a negative relational algebra, a negative database management system architecture, and by exploring how negative information can improve finding solutions to combinatorial problems, such as set-sharing analysis. This research strives to leverage both positive data and its complement, or negative data, to provide a secure and/or compact representation as required by an application.
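As a toy illustration of the core idea, and not of the compressed representations that make real NDBs practical and hard to invert, the complement of a tiny binary database can be stored explicitly:

from itertools import product

def negative_db(db, length=3):
    """Return the complement of db over all binary strings of a given length."""
    universe = {"".join(bits) for bits in product("01", repeat=length)}
    return universe - db

db = {"000", "101"}
ndb = negative_db(db)
print("110" in ndb)  # True: not a record of DB
print("101" in ndb)  # False: a positive record
print(len(ndb))      # 6 of the 8 possible strings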
Date: Wednesday, May 14th, 2008
Time: 10:00 am
Place: FEC 141
Kurt Ferreira
Abstract:
Operating system interference or noise has been shown to be a key limiter of application scalability in high-end systems. While several studies have attempted to quantify the sources and effects of system interference using user-level mechanisms, there are few published studies on the effect of different kinds of kernel-generated OS noise on application performance at scale. In this thesis, we describe a kernel-level noise injection architecture and examine the sensitivity of real-world, large-scale applications to a range of OS noise patterns. This noise injection framework is built into the Catamount lightweight operating system that runs on the 10,000-node Cray Red Storm machine located at Sandia National Labs. Using this framework, we are able to demonstrate the importance of how noise is generated, in terms of frequency and duration, and how this impact changes with application scale. For example, on Red Storm we are able to show that 2.5% net processor noise at 10,000 nodes can have no effect or can result in over a factor of 20 slowdown for the same application, depending solely on how the noise is generated. In addition, this thesis discusses how the characteristics of the applications we study, for example computation/communication ratios, collective communication sizes, and other characteristics, relate to their tendency to amplify or absorb OS noise. The results of this work are important as they are key to understanding the trade-offs involved in the design and implementation of operating systems, middleware, and other system services of high-end parallel systems.
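A back-of-the-envelope sketch of the key experimental variable: the same net noise injected at different frequencies and durations. The numbers echo the abstract, but the model is an illustrative assumption, not Red Storm data.

def noise_pattern(net_fraction, freq_hz):
    """Return (events per second, duration of each event in microseconds)."""
    duration_us = net_fraction / freq_hz * 1e6
    return freq_hz, duration_us

for f in (10, 1000):
    hz, us = noise_pattern(0.025, f)
    print(f"{hz:5d} Hz events of {us:8.1f} us each -> 2.5% net noise")
# Low-frequency, long-duration events can stall collective operations across
# thousands of nodes far more than high-frequency, short events with the
# same net noise, which is the effect the abstract describes.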
Thesis Committee: Patrick Bridges (advisor), Arthur (Barney) Maccabe, Rolf Riesen
Date: Wednesday, May 7th, 2008
Time: 12:30 pm
Place: FEC 145
Nathan D Fabian
Abstract:
I present a technique for creating video game agents using machine learning to clone the behavior of human trainers playing the game. I compare the results of C4.5, a traditional attribute-value learning algorithm, against first-order relational learning algorithms: FOIL, PROGOL, and PROXIMITY. I establish a synthetic framework in which to analyze various attributes of behavior learning, and I will argue that although FOIL is more fragile as a result of being a greedy search, it is the most pragmatic option of the three. Finally, I show a comparison of the results of C4.5 and FOIL on human trainers.
Thesis Committee: Terran Lane, George Luger, and Cris Moore.
Date: Tuesday, April 15, 2008
Time: 11 am
Place: FEC 141
Nikita A. Sakhanenko
Abstract:
Modeling real-time dynamic environments, whose information is recorded using distributed sensors, can require reestimation of current models and assumptions. Model reorganization is needed when the nature of the situation changes. Such context changes in the data can arise from sensor failure, when objects in the sensor field come and go, or when modalities of causal interaction change. Additionally, the dynamic nature of environments requires fast inference across interpretive models. Model pruning is needed to reduce the model size by eliminating redundant and irrelevant information. A general, context-sensitive probabilistic modeling problem is defined in this thesis.
This thesis describes a flexible, multi-layered architecture for context-sensitive stochastic modeling (COSMOS). At the core of this architecture is a form of probabilistic logic defined by sets of recursive relationships. The system consists of an ensemble of related probabilistic models paired with a mechanism for switching between context-activated models. Additionally, a causal reasoning component directs context activation. This architecture employs the failure-driven learning and repair procedure (similar to developmental learning in humans), which is based on a combination of abductive inference and expectation maximization (EM) parameter learning. Over time, these mechanisms incrementally populate the ensemble by generating new models when current models no longer perform acceptably. This thesis describes an interface representation of COSMOS characterizing contextual parameters of models and details the connection of interfaces with the rest of the system.
Thesis Committee: George Luger, Carl Stern, Tom Caudell and Lance Williams
Date: Monday, April 14, 2008
Time: 3:30 pm
Place: ECE 118
Tiffany A. S. Pierce
Abstract:
The analysis of large-scale structural patterns in networks can be a difficult task, especially when the networks contain a large number of vertices. Often it is useful to classify the network vertices into a set of groups and analyze how the groups interact with each other. This work proposes a statistical way to infer the group membership of vertices in a given graph based on the general graph structure. We use an iterative approach that combines maximum likelihood and Markov Chain Monte Carlo (MCMC) methods to identify likely node groupings. One strength of this method is that it can identify many kinds of structure, including node clusters and disassortative groups, without knowing a priori what large-scale patterns to look for. We investigate the performance of the algorithm on synthetic and real-world data sets to demonstrate its flexibility and robustness in the face of noisy data.
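A minimal sketch of the maximum-likelihood-plus-MCMC idea, using a two-group Bernoulli blockmodel and single-vertex Metropolis moves; the model and move set are simplifying assumptions relative to the thesis. Because each block's edge density is fit from the data, the same score rewards assortative and disassortative structure alike.

import math, random

def log_likelihood(adj, groups, k=2):
    """Bernoulli blockmodel log-likelihood with per-block MLE edge densities."""
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((groups[i], groups[j]))
            pairs[a][b] += 1
            if adj[i][j]:
                edges[a][b] += 1
    ll = 0.0
    for a in range(k):
        for b in range(a, k):
            m, t = edges[a][b], pairs[a][b]
            if 0 < m < t:          # all-or-nothing blocks contribute 0
                p = m / t
                ll += m * math.log(p) + (t - m) * math.log(1 - p)
    return ll

def infer_groups(adj, k=2, steps=5000):
    """Metropolis sampling over single-vertex group moves; recomputes the
    likelihood from scratch for clarity rather than updating incrementally."""
    n = len(adj)
    groups = [random.randrange(k) for _ in range(n)]
    ll = log_likelihood(adj, groups, k)
    for _ in range(steps):
        v = random.randrange(n)
        old, groups[v] = groups[v], random.randrange(k)
        new_ll = log_likelihood(adj, groups, k)
        if new_ll >= ll or random.random() < math.exp(new_ll - ll):
            ll = new_ll
        else:
            groups[v] = old
    return groups

# Two 4-cliques joined by a single edge: the sampler recovers the two groups.
adj = [[0] * 8 for _ in range(8)]
for grp in (range(0, 4), range(4, 8)):
    for i in grp:
        for j in grp:
            if i != j:
                adj[i][j] = 1
adj[3][4] = adj[4][3] = 1
print(infer_groups(adj))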
Thesis Committee:
Cristopher Moore, Terran Lane, and Melanie Moses
Date: Friday, April 11th, 2008
Time: 1:30 p.m.
Place: FEC 141
Mark Waligora
Abstract:
This work presents the development and implementation of the Virtual Window Framework, with which one or more users may view, through one or more displays each, the contents of a virtual environment overlaid on a physical environment. This augmented reality system tracks the positions of the users' eyes as well as the positions of the displays in order to render images as if a user were looking through a display like a window into a virtual world. The physically based input to the system is intended to provide the user with a very intuitive way to view and interact with rendered images. This sort of position-based rendering is not necessarily a new concept, but the manner in which it is implemented here is unique in its support of an extremely wide variety of displays as well as multiple users. In designing and implementing a real-time rendering system of this type, there are many system design considerations that must be taken into account. The implementation described here focuses on accuracy, performance, usability of hardware, and portability. This work demonstrates the Virtual Window Framework in action with several sample applications. These sample applications, however, only touch on the wide variety of applications with which this framework could be used.
Date: Tuesday, March 11th, 2008
Time: 3:30 pm
Place: FEC 141
Todd Kaplan
Abstract:
In many ways, a stock market resembles an ecosystem. An ecosystem is defined by the interaction of its organisms and the environment. In financial markets, the organisms are traders and the environment is defined by the state of the market through which traders interact. Feedback loops exist between organisms and their environment. Traders impact their environment through trades. Changes in the market environment affect future decisions of traders. Ecosystems, as well as financial markets, display many of the traits associated with a complex adaptive system: aggregation, adaptation, feedback loops, and nonlinearity.
A trophic network, commonly referred to as a food web, describes the feeding relationships between different groups of species in an ecosystem. Ecologists construct trophic networks to aid their understanding of ecosystems. In the realm of financial markets, trophic networks could serve an analogous role. They could potentially illuminate the underlying dynamics responsible for commonly observed macro-level phenomena. For example, one might hypothesize that periods of market volatility occur after a keystone trader species becomes inactive and the trophic network subsequently restructures itself. My research investigates the following question: is it possible to detect trophic structure within a real-world financial market?
Towards this goal, I introduce three new tools contributing to the detection of community structure in complex networks. First, I introduce a two-phase macro-strategy for community detection that provides high-yield, robust results. Second, a fine-granularity measure of community structure is developed. Finally, I describe a dual-assortative measure (DAMM) of community structure. DAMM extends the domain of networks that can be analyzed for community structure to include those with negatively weighted edges.
After introducing the community detection tools, I shift my focus to the evolution of software agents that compete in an artificial financial market. The evolutionary framework is based on a stack-based language (Staq) that I developed for genetic programming (GP). I will examine the genetic program of an agent that consistently evolves. This evolutionary agent reveals a shortcoming in the simulated market: a lack of fundamentalism. To remedy this limitation, two value-based strategies are introduced.
I close the talk by introducing an algorithm for detecting trophic species in financial market data. The approach utilizes the community detection tools introduced earlier. Further, I describe a methodology for assessing the significance of detected structure. The efficacy of the trophic detection algorithm is demonstrated using simulated data. Finally, real-world data from the London Stock Exchange is examined.
Bio:
Todd Kaplan is a Ph.D. candidate in Computer Science at the University of New Mexico. He received his undergraduate degree in Biology at Brown University and holds graduate degrees in Computer Science (Cambridge University, UK) and Applied Mathematics (Oxford University, UK). He is a member of the Adaptive Computation group at UNM, led by Professor Stephanie Forrest. In this talk, Todd hopes to dispel rumors that he is a tenured graduate student.
Date: Monday, December 10th, 2007
Time: 11 am
Place: FEC 141
Vamsi Potluru
Committee: Shuang Luan (CS); Vince D Calhoun (EE); Paul Heintz (Radiology)
Abstract:
Non-negative matrix factorization (NMF) has increasingly been used as a tool in signal processing in the last couple of years. NMF, like independent component analysis (ICA), is useful for decomposing high-dimensional data sets into a lower-dimensional space. Here, we use NMF to learn the features of both structural and functional magnetic resonance imaging (sMRI/fMRI) images. NMF can be applied to group analysis of imaging data, and we apply it to learn the spatio-temporal activations across subjects performing a common task. We add an additional contrast term to NMF (called co-NMF) to identify features distinctive between two groups. We apply our approach to a dataset consisting of schizophrenia patients and healthy controls. The results from co-NMF make sense in light of expectations and are improved compared to the NMF results. Our method is general and may prove to be a useful tool for identifying differences between multiple groups.
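For reference, the standard multiplicative-update NMF (Lee and Seung) that the talk builds on can be written in a few lines; the co-NMF contrast term described above would add a penalty to this objective and is not shown here. The dimensions below are illustrative.

import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Factor a nonnegative matrix V (m x n) as W (m x r) @ H (r x n)."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
    return W, H

V = np.random.default_rng(1).random((20, 50))
W, H = nmf(V, r=5)
print(np.linalg.norm(V - W @ H))  # reconstruction error shrinks with iters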
Date: Friday, November 9th, 2007
Time: 10 am
Place: FEC 141
Ahmad Douglas
Committee: Patrick Bridges (M.S. Thesis Advisor); Barney Maccabe, UNM Computer Science Faculty; Rolf Riesen, UNM Computer Science Faculty
Abstract:
In computer science research, limited hardware resources are often shared among several different projects or teams. Under such an arrangement, a robust system must exist to archive important experimental data and system configuration parameters. Without one, an organization risks the loss of valuable experimental data at minimum; a loss of credibility could even result if some of these lost data were later called into question.
Our thesis covers the development of a storage management system to promote systematic retention of critical experimental and configuration data, paying particular attention to the challenges imposed by systems research. Toward this end, we develop formal requirements for an acceptable system. We deploy and evaluate two separate candidate solutions against our requirements, testing against the backdrop of a real-world computer systems research laboratory.
The results of our work are mixed. We found the first solution, an existing open source tool adapted to our needs, to be insufficient in functionality and limited in scalability. The second solution, which was developed specifically to exploit the strengths and address the weaknesses found in the existing tool, met the requirements under certain assumptions. In general, we find that storage management has fundamental limitations which govern its scalability for our purpose.
Date: Friday, November 9th, 2007
Time: 2:30 pm
Place: FEC 141
Warren Hunt
Abstract:
Raster images have proved themselves to be a pervasive and necessary medium; however, their fixed resolution and band-limited "digital" nature impose significant limitations. Vector graphics formats achieve resolution independence and can represent discontinuous image features, but at the cost of real-time performance. A desire for interactive and real-time performance, coupled with a need for scale-invariant rendering, drives the search for improved 2D and 3D dataset imaging techniques.
This work leverages a topological encoding of a dataset to attain scale invariance from a rasterized representation. This technique, called IStar, is a novel application of topology that captures the semantic characteristics in a graph-like structure. This structure, along with a reparametrized and sampled representation of the segmented dataset, is packaged as a standard raster image which can then be substantially downsampled and compressed. To view or render an IStar image, the encoded image is upsampled and the topological structure is used to reconstruct the original dataset. Unlike traditional raster approaches, this representation can preserve sharp discontinuities at any level of magnification, much like scalable vector graphics. Because this technique is raster-based, it is well suited to the real-time rendering pipeline. This technique is demonstrated, along with several extensions of the rendering function made possible by the topological encoding.
Date: Monday, November 5th, 2007
Time: 10 am
Place: HPC Seminar Room
Eric Nelson
Abstract:
Wireless sensor networks are becoming an important tool for a variety of different applications, including environmental monitoring, urban sensing, personal networks, and wildlife tracking. However, the growth of sensor network technology is hampered by the fact that sensor networks remain difficult to program and maintain. This is due to a variety of reasons, including the difficulty of programming a massively distributed application, programming under severe resource constraints, and the cost of installing and maintaining multiple networks. The problem is made even more difficult when considering that most users interested in using sensor networks do not have an extensive programming background. Our goal is to simplify the programming of complete applications for sensor network domain specialists, creating a programming environment that allows such researchers to employ their expertise when building a sensor network application.
To accomplish this goal, we present the Table Based Language Environment for Sensors, Tables. Tables is a flexible programming environment coupled with an intuitive user interface based on the concept of a spreadsheet. Spreadsheets provide a method to spatially organize and manipulate the sensor network and the data coming from the network. Tables is intended to be implemented by leveraging lower-level software stacks with layers like Kensho, but it can also work in a more limited fashion with a less advanced API. The concepts of pivot tables and functions are the basis of Tables' power and ease of use. Basic and complex applications are presented that demonstrate the data-centric power of Tables. Implementation details are described, and the overhead of Tables is evaluated through comparisons.
Date: Tuesday, October 30th, 2007
Time: 8 am
Place: Seminar room at High Performance Computing Center at UNM
Wenbin Zhu
Abstract:
Understanding the performance of large-scale, long-running applications is difficult. Traditional trace-based performance monitoring can provide detailed performance data; however, correctly merging traced data from all processes and interpreting the resulting traces make this technique useful only post-mortem. Statistical methods have lower monitoring overhead; however, correlating raw performance data between processes is even more difficult than in trace-based systems, and such methods are again not appropriate for runtime use.
This dissertation describes a new online performance monitoring approach, called Embedded Gossip (EG). EG uses an application's natural communication to propagate and gather performance data at each process. By piggybacking performance information on each outgoing message and extracting the information from each incoming message, each application process eventually becomes aware of performance problems on nodes with which it directly or indirectly communicates. This approach is useful for large-scale, long-running, dynamic applications, where a static configuration may not be optimal and traditional detailed performance monitoring is expensive.
To demonstrate the viability of Embedded Gossip for global system monitoring and adaptation, this dissertation describes a general architecture for implementing EG-based monitoring systems, including a new distributed agreement protocol, 3-wait, for EG-based global system adaptations. It then presents experimental results for the two monitoring systems and one global adaptation system implemented using this architecture, including global critical path profiling, global progress counting, and global scheduling of load balancing actions. The results show that Embedded Gossip can be used to implement a wide range of runtime monitoring systems. The resulting monitoring systems have low runtime overhead, and the information gathered using these systems can effectively drive global system adaptation decisions online.
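A toy sketch of the piggybacking idea: each process appends its local performance summary to outgoing messages and merges the summaries it receives. The class names and the progress-counter metric are assumptions for illustration, not the EG implementation.

class Process:
    def __init__(self, rank):
        self.rank = rank
        self.view = {rank: 0}  # rank -> latest known progress counter

    def step(self):
        self.view[self.rank] += 1  # local progress (e.g., iteration count)

    def send(self, payload):
        return {"payload": payload, "gossip": dict(self.view)}

    def recv(self, msg):
        for rank, progress in msg["gossip"].items():  # merge piggybacked view
            self.view[rank] = max(self.view.get(rank, 0), progress)
        return msg["payload"]

p0, p1 = Process(0), Process(1)
p0.step(); p0.step()
p1.recv(p0.send("halo data"))
print(p1.view)  # {1: 0, 0: 2}: p1 learned p0's progress with no extra messages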
Date: Thursday, October 25th, 2007
Time: 12:15 pm
Place: FEC 141
Haixia Jia
Abstract:
To understand what makes NP-complete problems so hard, I conduct my research through two approaches: constructing hard problem instances and studying how badly solvers perform on them. A simple way to construct problem instances is to choose a random truth assignment A, and then choose clauses randomly from among those satisfied by A. However, this method tends to produce easy problems, since the majority of literals point toward the "hidden" assignment A. Previously, we proposed a problem generator that cancels this effect by hiding both A and its complement. While the resulting formulas appear to be just as hard for DPLL algorithms as random 3-SAT formulas with no hidden assignment, they can be solved by WalkSAT in only polynomial time. Here we propose a new method to cancel the attraction to A, by choosing a clause with t > 0 literals satisfied by A with probability proportional to q^t for some q < 1. By varying q, we can generate formulas whose variables have no bias, i.e., which are equally likely to be true or false; we can even cause the formula to deceptively point away from A. We present theoretical and experimental results suggesting that these formulas are exponentially hard both for DPLL algorithms and for incomplete algorithms such as WalkSAT.
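A sketch of the biased generator just described, using rejection sampling so that a candidate clause with t satisfied literals is kept with probability proportional to q^t; the value of q is an arbitrary illustrative choice.

import random

def hidden_assignment_3sat(n_vars, n_clauses, q=0.618):
    """Generate a 3-SAT formula hiding assignment A; a candidate clause with
    t > 0 literals satisfied by A is kept with probability q**(t-1), which
    makes the acceptance rate proportional to q**t."""
    A = [random.random() < 0.5 for _ in range(n_vars)]
    clauses = []
    while len(clauses) < n_clauses:
        vs = random.sample(range(n_vars), 3)
        clause = [(v, random.random() < 0.5) for v in vs]  # (variable, sign)
        t = sum(1 for v, sign in clause if A[v] == sign)   # literals true under A
        if t > 0 and random.random() < q ** (t - 1):
            clauses.append(clause)
    return A, clauses

A, formula = hidden_assignment_3sat(100, 430)
print(all(any(A[v] == sign for v, sign in c) for c in formula))  # True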
Next, we introduce a new application for constructing hard-to-solve SAT instances: Negative Databases (NDBs), a strange type of database that only stores the records not in the original database. NDBs offer the promise of preserving the confidentiality of the individual records in a database while permitting certain restricted queries, and they have many applications in the field of data security. Constructing an NDB corresponding to a DB is equivalent to constructing a SAT formula whose only satisfying truth assignments are a specified set of assignments. We present a new approach to constructing such formulas, and we are able to show that the resulting formula can only be satisfied by truth assignments that are very close to one of the hidden assignments. We then demonstrate experimentally that the formulas generated by our scheme are much harder to solve than the formulas generated by the previously suggested prefix algorithm and RNDB algorithm, and indeed they are hard enough to ensure security. We hope that this new approach will create interest in constructing hard SAT formulas as a way to protect the confidentiality of databases.
Finally, to understand how badly solvers can perform, we analyze the behavior of two natural variants of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm for Graph 3-Coloring on sparse random graphs G(n, p=c/n). First, we calculate analytically the probability P_c(0) that these algorithms find a 3-coloring with no backtracking at all, and show that it goes to zero faster than any analytic function as c approaches c* = 3.847... Then we show that even in the "easy" phase 1 < c < c*, where P_c(0) > 0, including just above the emergence of the giant component, the expected number of backtracks is exponentially large with positive probability. To our knowledge this is the first rigorous proof that the running time of a natural backtracking algorithm has a heavy tail for graph coloring. In addition, we give experimental evidence and heuristic arguments that this tail takes the form P_c(b) ~ b^-1 up to an exponential cutoff.
Date: Friday, September 21st, 2007
Time: 10 am
Place: FEC 141
Rotem Bentzur
Committee: Darko Stefanovic, Patrick Bridges, Shuang Luan
Abstract:
This work describes the design and implementation of the Older-First garbage collector, in combination with pretenuring enhancements, in the 64-bit version of the JikesRVM project. The addition of simple pretenuring enhancements to the collection algorithm helps mitigate the collector's relatively high copying cost, and thus achieves better behavior and performance in long-running server applications when compared to the basic Older-First algorithm or to generational copying algorithms such as the Appel-style generational collector.
Date: Friday, June 8th, 2007
Time: 9:45 am
Place: FEC 141
M. Leigh Fanning
Committee: Shuang Luan, David Bader, Lance Williams.
Abstract:
Simulated Annealing has been widely used as an effective heuristic for computationally intractable search problems. We employ the technique in the specific application of RNA Interference gene family knockdown, where both the problem and the search technique are extended. Potential applications include the development of therapeutic gene-based medicines and enhanced gene function inference. The multi-step RNA Interference (RNAi) process is functionally dependent on matching a double-stranded RNA or fragmented segment of RNA to an equivalent length along a target messenger RNA. Matching as part of a biochemical process renders messenger RNA incapable of completing its intended transcription, effectively preventing translation to amino acids and expression of the anticipated protein encoded by a particular gene. We extend the problem domain to include approximate, rather than exact, matching. For gene family knockdown, matching must extend to all members of a target gene set while simultaneously leaving all remaining genes within the genome unmatched. Following exclusion of background genome substrings, we show that the problem naturally reduces to an instance of Set Cover. We transform this instance into a constraint optimization and search for the smallest possible substring set that can represent all members of a target gene family. We extend Simulated Annealing to incorporate parallelism, and present results showing smaller covers determined more efficiently by a parallel implementation in comparison to a sequential implementation for several combinations of genomes and gene targets. We further speculate on additional ways to incorporate parallelism as a means of extending Simulated Annealing's effectiveness.
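For illustration, a minimal sequential sketch of simulated annealing applied to Set Cover. The cost function, penalty weight, and cooling schedule here are assumptions; the thesis's parallel, RNAi-specific machinery is not reproduced.

    import math, random

    def sa_set_cover(universe, subsets, steps=10000, t0=1.0, alpha=0.999):
        # universe: set of elements; subsets: list of sets whose union covers it.
        def cost(sel):
            covered = set().union(*[subsets[i] for i in sel])
            # cover size plus a heavy penalty for each uncovered element
            return len(sel) + 10 * len(universe - covered)
        current = set(range(len(subsets)))   # start from the full (trivial) cover
        best, best_cost, t = set(current), cost(current), t0
        for _ in range(steps):
            cand = set(current)
            # flip one randomly chosen subset in or out of the cover
            cand.symmetric_difference_update({random.randrange(len(subsets))})
            delta = cost(cand) - cost(current)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current = cand
                if cost(current) < best_cost:
                    best, best_cost = set(current), cost(current)
            t *= alpha                       # geometric cooling schedule
        return best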
Date: Friday, June 8th, 2007
Time: 9:00 am
Place: FEC 141
Monique Morin
Committee: Bernard Moret (chair), Stephanie Forrest, David Bader, and Randal (Randy) Linder (UT Austin Biology)
Abstract:
Phylogenetic trees are topological depictions of evolutionary histories which involve birth and death events. This tree model inherently assumes that active lineages evolve independently from all others. However, inter-species events such as hybridization and lateral gene transfer are known to occur resulting in an intermingling of DNA information. In essence, this creates a bridge between two previously independent lineages. By definition, these processes are fundamentally dependent on other contemporary species and lead to a more complex history that is often referred to as a phylogenetic network. While tree topology software is abundant and mature, algorithms and tools that permit and address issues specific to phylogenetic networks are less common and still in their infancy.
This work examines the generation of phylogenetic networks and methods for both their reconstruction and characterization. This required the development of three new tools: NetGen for simulating source topologies, NetReconstruct for inferring histories, and NetMeasure for quantitatively comparing and categorizing network features. Phylogenetic networks are created by extending the traditional birth-death model to include hybridizations. This algorithm is unique in that the DNA sequences are evolved in conjunction with the topology, thereby permitting the outcome of hybrid events to be influenced by genetics. By utilizing the sequence information from NetGen's final active lineages, a proposed history containing a single diploid hybrid is inferred. This reconstruction process segments the species into sets for which subtrees are created and then merged to form the final topology. Finally, hybrid events in a topology can be quantitatively characterized with three new measures and two networks can be compared using the existing tripartition method.
Our results show that tripartition scores for this reconstruction model improve as the number of current-day species increases and the branch lengths of the topology shorten. Additionally, topologies containing a single ancient hybridization (one which occurred early in time) result in reconstructions more analogous to their source histories than those whose hybridizations occurred more recently. This work supports the ongoing effort of phylogenetic reconstruction in fields such as pharmacology, genetics, and systematics.
Date: Tuesday, June 5th, 2007
Time: 10:00 am
Place: FEC 141
Frederick Crawford
Committee: Lance Williams (chair), Shuang Luan, and Melanie Moses
Abstract:
Previous efforts to measure the relative compression efficiencies of the JPEG and JPEG 2000 lossy image compression algorithms have either been based on easy-to-compute quantities, which are not well correlated with perceptual quality, or on purely subjective methods. This work attempts to assess the tradeoff between perceptual quality and compression ratio in the two algorithms by measuring the compression ratio at which each algorithm first introduces noticeable differences. This is accomplished using a two-alternative, forced-choice experimental design; subjects are repeatedly presented with pairs of images where one image is either a compressed or uncompressed copy of the other image. We objectively measure the subjects’ abilities to detect differences between the two images as a function of compression ratio. Somewhat surprisingly, our results show that classical JPEG is superior to JPEG 2000 at low compression ratios and that neither mean squared error (MSE), root mean squared error (RMSE), nor peak signal-to-noise ratio (PSNR) is well correlated with just noticeable difference (JND).
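For reference, the standard definitions of the objective metrics named above, sketched in Python with NumPy (an assumption of this sketch; the thesis's psychophysical procedure is of course not reducible to these quantities).

    import numpy as np

    def mse(a, b):
        # Mean squared error between two same-shaped images.
        return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

    def rmse(a, b):
        return mse(a, b) ** 0.5

    def psnr(a, b, peak=255.0):
        # Peak signal-to-noise ratio in decibels; infinite for identical images.
        m = mse(a, b)
        return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)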
Date: Friday, June 1st, 2007
Time: 10:00 am
Place: FEC 141
Eric Gottlieb
Committee: Bernard Moret (co-chair), George Luger (co-chair), Ed Angel
Abstract:
Phylogenetic trees are commonly compared using the Robinson-Foulds metric, which is normally computed using Day's algorithm on pairs of trees. Other algorithms, such as the Fast algorithm and the Treehash algorithm, have been proposed as alternatives to Day's algorithm. While at first glance these algorithms seem disparate, they may all be understood as variations of a single unifying algorithm, and comparisons between them can be made from that perspective.
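A minimal sketch of the Robinson-Foulds computation by direct comparison of bipartition sets; the nested-tuple tree encoding and normalization choice are assumptions for illustration, and Day's algorithm computes the same quantity in linear time.

    def bipartitions(tree, leaves):
        # tree: nested tuples over leaf names, e.g. ((('a','b'),'c'),('d','e')).
        # Each internal edge induces a bipartition of the leaves; we record the
        # lexicographically smaller side as the canonical representative.
        parts, all_leaves = set(), frozenset(leaves)
        def walk(node):
            if not isinstance(node, tuple):
                return frozenset([node])
            cluster = frozenset().union(*(walk(child) for child in node))
            if 1 < len(cluster) < len(all_leaves):
                parts.add(min(cluster, all_leaves - cluster, key=sorted))
            return cluster
        walk(tree)
        return parts

    def robinson_foulds(t1, t2, leaves):
        p1, p2 = bipartitions(t1, leaves), bipartitions(t2, leaves)
        return len(p1 ^ p2)  # some definitions divide this count by two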
Date: Friday, May 11th, 2007
Time: 11:00 am
Place: FEC 141
Robert Young
Thesis advisor: Professor Williams
Committee: Professors Williams, Caudell, Luger, and Moses
Abstract:
Illusory contours provide a glimpse of the mechanisms responsible for computation in human vision. We developed two psychophysical experiments to quantify the shape and sharpness of the illusory contour percept elicited by the Koffka cross figure. In the first experiment we developed a novel experimental design based on a two-dimensional adaptive partitioning, which allowed us to quantify the shape and sharpness of the illusory contour percept based on participants' responses to the locations of probe points relative to the illusory contour. Because we were unsatisfied with aspects of this experimental design, we developed a second design that addressed its weaknesses. In the second design, we presented the observer with a Koffka cross figure followed by a Koffka cross figure with a literal circle or square in the center gap and asked participants if they detected a shape change. This design allowed us to quickly quantify the illusory contour shape percept for a range of parameters. We discovered that our second experimental design could also be used to determine whether the computation underlying illusory contour perception is isotropic, by comparing shape results for two figure orientations. We found a statistically significant difference in the figures at the two orientations, which provides strong evidence that illusory contour computation is anisotropic.
Date: Monday, May 7th, 2007
Time: 1:00 pm
Place: FEC 141
Mike Ritthaler
Abstract:
The University of New Mexico (UNM) is currently designing and building the CCD Transit Instrument II (CTI-II), a 1.8m transit survey telescope. The stationary CTI-II uses the time delay and integrate readout mode for a mosaic of charge coupled devices (CCDs) to generate over 100 gigapixels per night, which must be analyzed within a day of observation. This work describes the construction and operation of the BACAN (Bayesian Astronomical Clue Analyzing Network) system to help with real-time and near-real-time analysis of the data. This document describes the theory of the Bayesian networks which underlie BACAN, the choices of features which BACAN uses as evidence, and the accuracy of numerous classification topologies on a synthetic data set. An application of self-organizing maps for discretization of continuous variables is also presented. Finally, there is an in-depth discussion of the best topology results, with an emphasis on how the reasoning of the network can be shown to the astronomer using it.
Date: Wednesday, May 2nd, 2007
Time: 1:00 pm
Place: FEC 141
Eric Martin
Abstract:
Level set methods have brought many desirable properties to the representation and tracking of arbitrary interfaces at a cost of increased computation. Previous efforts have reduced the cost of the full-matrix method significantly by only updating cell values along a dynamic narrow-band surrounding the interface. The full-matrix method also demonstrates near linear speedup when calculations are performed in parallel on a distributed clustered computing environment. The work expressed herein explores the potential performance increases obtained by combining the narrow-band method with data dependent parallelism. A parallel narrow-band algorithm is developed using MPI (Message Passing Interface) in order to test performance and scalability of general level set methods when applied to very large problems. Comparisons are made between the full-matrix and narrow-band methods in serial and parallel by calculating the motion of interfaces due to curvature-driven flow and an external velocity field. Data shows that significant gains may be made with a limited number of processors dependent upon the size and partitioning of the narrow-band. Algorithmic details are discussed and future work is examined as these advances point to realistic performance increases for level set applications.
Date: Friday, April 20th, 2007
Time: 1:00 pm
Place: FEC 141
John Burge
My primary contributions are twofold. First, I propose a novel application of discrete Bayesian networks (BNs)—a modeling framework often employed in machine learning—to the challenging task of eliciting correlations among neuroanatomical regions of interest from a corpus of functional magnetic resonance imaging data. I demonstrate an application of this method on a dataset consisting of healthy and demented elderly adults, resulting in descriptive models for both populations of subjects. I show how the results of the models can be validated via robustness and confidence testing. Second, I propose three extensions to BN structure search that can be used to improve the results from the dementia experiment. These extensions include a method for learning networks which identify correlations that change between classes of data, and methods for improving both structure and parameter learning given a hierarchical relationship among modeled random variables (RVs). To validate these extensions, I perform a wide variety of experiments on both simulated datasets and five neuroimaging datasets. Overall, I show that incorporating my proposed methods allows learned models to identify class-discriminative structures more effectively than traditional methods and, for domains with hierarchically related RVs, I demonstrate that higher-scoring models can be found with parameters that generalize more effectively.
Date: Tuesday, April 10th, 2007
Time: 11:00 am
Place: FEC 141
Torsten Staab
While a variety of software tools for the recovery of information from formatted, overwritten, or deleted computer hard drives and floppy disks already exist, none of these technologies are capable of recovering information from demagnetized magnetic storage media. De-magnetization, which is also known as degaussing, is a physical process in which magnetic storage media is passed through a very powerful oscillating or constant magnetic field. Degaussing is supposed to rearrange the magnetic domains on the media, thereby removing any semblance of the previously recorded signal. Besides physically distorting the original recording, degaussing also results in extremely low recording Signal-to-Noise Ratios (SNR). The SNR after degaussing is far below the sensitivity level of the storage unit's read head and outside its noise filtering and error correction capabilities. Therefore, degaussed (i.e., sanitized) magnetic media can no longer be read by its original recording system. Information recovery from degaussed magnetic storage media is widely considered to be the holy grail of computer forensics.
The objective of the research presented here was to advance the state-of-the-art in computer forensics by modeling and developing novel signal and image processing algorithms for recovering and "descrambling" of residual information patterns from degaussed magnetic storage media. This presentation will provide an overview of these computational data recovery models and algorithms, discuss experimental results, and outline future extensions.
Date: Friday, April 6th, 2007
Time: 11:00 am
Place: FEC 141
Rory L.P. McGuire
Committee: Shuang Luan, Lance Williams, Philip Heintz
Picture Archiving and Communication Systems (PACS) are becoming prevalent in many radiology departments. However, with such a system, the radiographic image retake rate is still between 2.3% and 4.6%. At this rate, tens of millions of dollars may be wasted and diagnoses for millions of patients delayed.
The goal of this research is to explore various ways to automatically qualify diagnostic radiographs based on criteria determined by radiologists. We have developed a prototype real-time radiograph quality assurance system that could significantly reduce the overall x-ray retake rate, lower public exposure to radiation, and implicitly train radiology technicians by providing instant feedback on imaging techniques. This system could also reduce time required by the patient, technician and radiologist, thereby reducing the total cost of treatment.
Our system has two main parts: (1) anatomical site categorization of an image, and (2) registration, learning, and classification of an image for each anatomical site. For the first part, we use mutual information to categorize knee and lung images. For the second part, we focus on knee images. We use the Hough transform, a well-known simple edge detection algorithm; the "eigenfaces" method of calculating the principal components of a set of images and reducing the dimensionality of the search space; and a simple, nearest-neighbor supervised learning and classification algorithm. We believe our system could serve as a springboard for similar systems to qualify medical imaging data of other anatomical sites.
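A minimal sketch of the "eigenfaces"-style dimensionality reduction plus the nearest-neighbor step (NumPy is assumed; the Hough-transform stage and the actual system's details are omitted).

    import numpy as np

    def pca_basis(X, k):
        # X: rows are flattened training radiographs; returns the mean image
        # and the top-k principal components of the training set.
        mu = X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k]

    def classify(x, mu, basis, train_proj, train_labels):
        # Project the query into the reduced space and return the label of
        # its nearest training neighbor (1-NN supervised classification).
        z = basis @ (x - mu)
        d = np.linalg.norm(train_proj - z, axis=1)
        return train_labels[int(np.argmin(d))]

Here train_proj would be (X - mu) @ basis.T, the training images projected into the same subspace as the query.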
Date: Thursday, April 5th, 2007
Time: 8:30 am
Place: FEC 141
Rob Abbott
Committee: Stephanie Forrest (chair); Terran Lane; Peter Stone, Department of Computer Science, University of Texas at Austin; Jean-Paul Watson, Computer Science Research Institute, Sandia National Labs
Interactive computer simulation can be an efficient and effective tool for training, but faces several technical obstacles: first, agents in the scenario (simulated allies and opponents) must exhibit realistic tactics; second, the dynamics of the simulation must be valid in order to reinforce tactics which are applicable in reality; and third, the training system should provide corrective feedback when students commit errors. To be practical, a technical approach must accomplish all of these objectives quickly and inexpensively. In this dissertation I develop solutions for these requirements with techniques that enable subject matter experts (SMEs) to create models of their own tactics. Each of the three obstacles is addressed with an approach based on behavior modeling: first, the models provide a way to transfer real-world tactics to software agents. Second, the dynamics of the simulator are validated by measuring the effectiveness of real-world vs. synthetic tactics in the simulator. Third, students using the simulator for training are evaluated by comparing their actions to those predicted by models of SME behavior. Because the models are constructed by an automated process, knowledge engineers and software developers need not mediate between the SME and the student audience. The model itself becomes a medium for indirect educational interactions between the SME and student.
Date: February 21st, 2006
Time: 9:00 am
Place: FEC 141
Sergey Plis
Advances in the development of noninvasive imaging modalities allow us to examine human brain function in various ways and at different spatial and temporal scales. However, each modality provides limited information, and no single modality covers the large range of spatial and temporal scales over which neural activity operates. There is considerable interest in combining different modalities to obtain a clearer and more complete picture of brain function. This work is focused on combining and improving information extraction from spatiotemporal data of electroencephalography and magnetoencephalography. Contributions of this thesis include 1) a spatiotemporal noise covariance model for better noise characterization and subsequent improvement of source analysis results, 2) a generalization of this covariance model that improves the running time of the analysis routines while maintaining most of the benefits of the full noise covariance model, and 3) a probabilistic forward model for electroencephalography. This probabilistic forward model explicitly accounts for uncertainties in the skull conductivity measurements and demonstrates a principled approach of incorporating uncertainties of the forward model into the Bayesian analysis and propagating them to the resulting posterior distribution.
Date: December 1st, 2006
Time: 3:00 pm
Place: FEC 141
Kenneth Ingham
Network servers are vulnerable to attack, and this state of affairs shows no sign of abating. Therefore, security measures to protect vulnerable software are an important part of keeping systems secure. Anomaly detection systems have the potential to improve the state of affairs, because they can independently learn a model of normal behavior from a set of training data, and then use the model to detect novel attacks. In most cases, this model represents more instances than were in the training data set---this is generalization, and it is necessary for accurate anomaly detection.
This dissertation describes a framework for testing anomaly detection algorithms under identical conditions. Because quality test data representative of today's web servers was not available, this dissertation also describes the Hypertext Transfer Protocol (HTTP) request data collected from four web sites to use as training and test data representing normal HTTP requests. A collection of attacks against web servers and their applications did not exist either, so prior to testing it was also necessary to build a database of HTTP attacks, the largest publicly available one.
These data were used to test nine algorithms. This testing was more rigorous than any performed previously, and it shows that the previously proposed algorithms are not accurate enough for production use on many of the web servers in use today, which might explain their lack of widespread adoption. Two newer algorithms (deterministic finite automaton induction and $n$-grams) show more promise.
This dissertation shows that accurate anomaly detection requires carefully controlled generalization; too much or too little will result in inaccurate results. Calculating the growth rate of the set that describes the anomaly detector's model of normal provides a means of comparing anomaly detection algorithms and predicting their accuracy. Identification of undergeneralization locations can be automated, leading to more rapid discovery of the heuristics needed to allow an anomaly detection system to achieve the required accuracy for production use.
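A simplified sketch of the n-gram idea for HTTP requests (illustrative only; the dissertation evaluates a more careful formulation against the full test framework).

    def ngrams(s, n=3):
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    def train(normal_requests, n=3):
        # Model of normal behavior: the set of n-grams seen in training data.
        model = set()
        for r in normal_requests:
            model |= ngrams(r, n)
        return model

    def anomaly_score(request, model, n=3):
        # Fraction of the request's n-grams never seen in training; a
        # threshold on this score flags anomalous (possibly attack) requests.
        grams = ngrams(request, n)
        return len(grams - model) / len(grams) if grams else 0.0

Generalization is controlled here by n: smaller n makes the model accept more unseen requests, larger n makes it stricter.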
Date: November 7th, 2006
Time: 10:00 am
Place: FEC 141
Soraya Abad-Mota
Organizations produce large numbers of documents. Often the contents of these documents do not reach the operational databases or data warehouses of the enterprise. With the worldwide accessibility of the web, these documents are made available to a wide audience, but browsing through them manually is cumbersome, at best. The definition of the semantic web has led to fascinating possibilities of research in trying to make explicit the semantics of the terabytes of unstructured data available today. Our research seeks to improve the use of these unstructured sources, in particular, textual sources.
We present an architecture for structuring and querying the contents of a set of documents which belong to an organization. The structure is a database which is semi-automatically populated using information extraction techniques. A language to interrogate the contents of the documents is provided and it is based on an ontology. The processing of queries in this language can provide approximate answers and triggers a mechanism for improving the answers by searching additional sources. Individual database items have associated quality metadata that can be used in the assessment of answers' quality. The interaction between information extraction and query processing is a pivotal aspect of this research.
Date: November 7th, 2006
Time: 10:00 am
Place: Dean's Office Conference Room
Ron Hospelhorn
Multi-scale image processing frequently serves as the first step in scene analysis, pattern recognition, texture analysis, image reconstruction, and many other early vision tasks. A pyramid architecture analyzes an image independently at several scales, and a steerable pyramid can extract oriented features at each of those scales. Hexagonal sampling of image data promises processing economies over rectangular sampling as well as higher angular resolution. This thesis discusses the implementation of filters for hexagonally sampled data and uses them in two steerable pyramid designs.
Date: November 3rd, 2006
Time: 2:30 pm
Place: FEC 141
Jin Xiong
Digital fulldome theaters provide an immersive environment for educational and scientific purposes. The architecture of the new generation of digital fulldome systems enables more flexibility in creating fulldome shows, and the adoption of real-time rendering provides even more versatility. As an exploration of future applications of fulldome systems, we successfully implemented dome versions of the Pong game and a fractal generator. This work demonstrates the potential interactive capabilities of fulldome systems.
In this thesis, we first introduce the architecture of current fulldome theater systems and the process of presenting a fulldome show. We then explore possible approaches to real-time rendering in multi-projector systems as a general case and the fulldome theater system as a special case. Finally, we describe the implementation of interactive real-time applications that are based on the cluster architecture of fulldome theater systems.
Date: Friday, August 25th, 2006
Time: 8 am
Place: FEC 141
Shibin Qiu
Many bioinformatic problems require effective pattern recognition and machine learning algorithms. Kernel methods are powerful approaches for data classification, prediction and description. In this work, we develop kernel learning methods for bioinformatics data by introducing novel string kernels and unifying multiple kernels, and apply them to problems in RNA interference (RNAi), a cutting edge technology with many biological and pharmaceutical applications. We develop RNA string kernels to model RNA sequence similarities by simulating mismatch, G-U wobble, and bulge patterns. For the first time, we give RNAi a mathematical formalism using RNA kernels. To systematically quantify the off-target effects of RNAi, we define an off-target error rate and evaluate it using RNA kernels. Since computing the error rates requires large, transcriptomic scale searches, we develop efficient implementations for the RNA kernels, which have achieved speedups of six orders of magnitude. The resultant off-target error rates in multiple organisms characterize the quantitative relationship between the error rates and RNAi parameters, are consistent with biological findings, provide guidelines for RNAi designs, and project error rates for future RNAi biological applications. To further improve upon the RNA kernels, we propose a group of randomized string kernels and convert a random Hamming distance into a string kernel. For the first time, we use support vector regression to predict the exact silencing efficacy scores for siRNA sequences. We employ the RNA and randomized string kernels in regression prediction and have achieved better accuracies on biological data sets than do existing string kernels or numerical kernels computed from efficacy rules proposed in the RNAi literature. To better capture variations of the data, we develop a multiple kernel regression technique that unifies multiple kernels via a quadratically constrained quadratic programming (QCQP) formulation. Results demonstrate that using more than one kernel simultaneously improves prediction accuracy and reduces the number of support vectors, turning a complicated problem (requiring more support vectors) into an easier one. For simplification of the multiple kernel framework, we propose several heuristics based on the fitness between a kernel and a label set. Applying the heuristics to the siRNA efficacy problem shows that combining string kernels with numerical kernels from efficacy rules significantly improves prediction accuracy. The multiple kernel framework also estimates the relative importance of the kernels representing different siRNA efficacy rules, enabling RNAi designers to focus on important rules.
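As a simplified illustration of the string-kernel idea, a k-spectrum kernel in Python; the thesis's RNA kernels additionally model mismatches, G-U wobble, and bulges, none of which appear in this sketch.

    from collections import Counter

    def spectrum(seq, k=3):
        # Count every length-k substring (k-mer) of the sequence.
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def spectrum_kernel(s, t, k=3):
        # Similarity = dot product of the two k-mer count vectors; sequences
        # sharing many k-mers score high, which is the core of string kernels.
        a, b = spectrum(s, k), spectrum(t, k)
        return sum(a[g] * b[g] for g in a.keys() & b.keys())

For example, spectrum_kernel("GATTACA", "ATTAC") counts shared 3-mers such as "ATT" and "TTA". Kernels like this can be plugged directly into support vector regression, as the abstract describes for siRNA efficacy prediction.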
Date: Friday, August 11th, 2006
Time: 9 am.
Place: FEC 141
James Foucar
A garbage collection trace (GC-trace) enumerates all the events that happen during program execution that affect the heap. Therefore, GC-tracing is an excellent way to examine the behavior of an object-oriented system. Knowing behavior expedites many forms of virtual machine research including garbage collection research. GC-traces were originally generated by the painfully slow brute-force technique. Subsequently, this technique was replaced by the Merlin algorithm which can generate GC-traces at a fraction of the cost of the brute-force algorithm.
Our work introduces a new GC-tracing system. Our system uses the Merlin algorithm within the context of a general shadow heap that is independent of the virtual machine. The shadow heap supports many optimizations to the Merlin algorithm and also supports analysis and verification. We examine the advantages and disadvantages of our approach, the viability of using a C-library for analysis in conjunction with a Java-in-Java virtual machine, the various costs of integrating our shadow heap with a virtual machine, and the effectiveness of our GC-tracing optimizations. Finally, we compare our GC-tracing system with the GC-tracing system that is currently bundled with JikesRVM.
Date: Thursday, July 20th, 2006
Time: noon
Place: FEC 141
Aaron Clauset
The discovery that the topology of real-world networks differs dramatically from the idealized structures of graph theory has, in recent years, produced a flood of interest in understanding and characterizing the complexity, origins and utility of these differences. However, linking this knowledge with practical ends, such as designing new Internet communication protocols, manipulating metabolic networks, or predicting future behavior depends quite strongly upon the quality of the tools and methods we use to characterize the underlying network structure.
These techniques, frequently borrowed from graph theory, statistical physics and sociology, tend to be statistical in nature, primarily because the configuration space is extremely large, even for a network of moderate size. As such, network structure is often summarized as a distribution or an average of some interesting quantity. It is now clear, however, that these simple characterizations can fail to capture much of the governing structure.
For instance, much effort has recently been spent indirectly characterizing the structure of the router-level Internet via the paths taken by packets crossing the network. The first two chapters of this thesis explore, through analytics and numerical simulation, the fundamental bias inherent in this technique, showing that it can produce power-law degree distributions where none actually exist. We further show that the severity of the bias is related to the excess of the underlying network and remains until nearly all vertices in the network participate in the mapping effort.
Unlike the case of the router-level Internet, we often have fairly high-quality maps of the pair-wise topological structure of real-world networks. In this case, the concise summarization of the large-scale topological structure remains an important, yet difficult and time-consuming task. In the middle two chapters, we describe two fast heuristics for the inference of community structure in real-world networks, and demonstrate that they can summarize, in an automated fashion, the large-scale structure of very large networks, i.e., n ≈ 10^6 or larger.
In the final two chapters, we describe a very general model of network organization, based on the observation that networks often exhibit a hierarchical structure, in which the vertices may be recursively grouped into coherent clusters. We show that this model, called a hierarchical random graph (HRG), can be used to automatically annotate the structure of a graph, assigning weights to edges, assigning group-association strengths to vertices, and extracting the minimal set of hierarchical features that explain the structure of the original graph. We further show how HRG models can be used to predict missing network data with a high degree of accuracy, show that HRG models can be used to generate sophisticated null-models for testing network hypotheses, and can naturally explain the presence of non-trivial structure such as heavy-tailed degree distributions, high clustering coefficients and short paths between pairs of vertices, all hallmarks of complex networks.
Date: Wednesday, July 12th, 2006
Time: 10 am
Place: EC&E 118
Ryan Custer
This research identifies how the extensible rights expression language (XrML) can be used to provide support for information critical to the protection of national security, also known as classified information. A list of requirements is developed that identifies the features XrML must provide to effectively express how classified content should be managed. The XrML elements that can be used to satisfy many of the requirements are identified, and an extension to XrML is developed that satisfies the remaining requirements. This provides a “roadmap” of how XrML can be used in a larger DRM environment that will provide enhanced security and support for the proper management of classified content on digital information systems.
Date: Friday, June 9th, 2006
Time: 9:30 am
Place: FEC 141
Sheela Vats
Location-aware services can be used for advertising, guiding tourists, locating a store or restaurant, finding the wait time at a restaurant, or checking the flight schedule of an airline at an airport. The main problem associated with getting location-specific service is that the provider obtains the client's profile and other details, which is detrimental to privacy. In this thesis, we present the design and implementation of an extensible architecture for privacy-preserving location-aware client-server interaction. Our solution is based on pseudonyms and SOAP technologies for client-server interactions. In this solution, we create a middleware between the client and the service providers, which works intelligently on behalf of its clients. We also describe a protocol for how the wireless client can interact with the middleware so that it can utilize the services of different providers.
Date: Wednesday, May 24th, 2006
Time: 9am
Place: FEC 141
Anuj Gupta
Dynamic Bayesian networks (DBNs) are used in a number of time series modeling applications, including user modeling, financial modeling, and fault detection. DBNs are constrained by their geometric state duration distribution. Many time series, including those generated by user activities, do not have a geometric distribution. By incorporating the semi-Markov property into a DBN, we can model any arbitrary state duration distribution. This thesis introduces an efficient inference algorithm for semi-Markov DBNs. The thesis proposes an extension to the junction tree inference algorithm to perform simultaneous belief and constraint processing. It uses this hybrid inference for DBNs with multi-valued categorical variables, specifically DBNs where determinism is present due to the semi-Markov property. Inference in a semi-Markov DBN is reduced from O(Tn^d) to O(Tn^(d-k)), where d is the total number of variables in the largest clique, n is the cardinality of each variable, T is the total number of time slices, and k is the number of variables constrained to take on deterministic values. Experiments conducted on multiple semi-Markov DBNs verify the theoretical result.
Date: Wednesday, May 10th, 2006
Time: 10am
Place: FEC 141
Keith Wiley
The state-of-the-art in computer drawing programs is based on a number of concepts that are over two decades old. One such concept is the use of layers for ordering the surfaces in a 2 1/2 D drawing from top to bottom. A 2 1/2 D drawing is a drawing that depicts surfaces in a fundamentally two-dimensional way, but also represents the relative depths of those surfaces in the third dimension. Unfortunately, the current approach based on layers unnecessarily imposes a partial ordering on the depths of the surfaces and prevents the user from creating a large class of potential drawings, e.g., of Celtic knots and interwoven surfaces.
The first half of this dissertation describes a novel approach which only requires local depth ordering of segments of the boundaries of surfaces in a drawing rather than a global depth relation between entire surfaces. Our program provides an intuitive user interface with a fast learning curve that allows a novice to create complex drawings of interwoven surfaces that would be extremely difficult and time-consuming to create with standard drawing programs.
The second half of this dissertation describes a previously unrealized topological property of 2 1/2 D scenes. Knowledge of this property makes possible the design of algorithms for manipulating 2 1/2 D representations in a way that is isomorphic to elemental 2 1/2 D scene changes. Furthermore, this property can be exploited to vastly improve the performance of a 2 1/2 D scene editor.
Date: Monday, May 8th, 2006
Time: 3pm
Place: ME 427
Erik Lee
Reverse engineering of executables is a task which demands significant intuition and experience of its practitioners. Reconstruction of source code from compiled binary code is in general an undecidable problem. However, with knowledge of the algorithms and tools used to generate the executable, a significant amount of information can be obtained from its analysis. This thesis discusses several techniques and novel combinations of known techniques to approach the problem of reverse engineering unknown executables from a data structure perspective rather than from the much more prevalent code structure perspective. This research enhances the ability of automated tools to discover and use information about binary executables.
Date: Friday, April 28, 2006
Time: 11am
Place: FEC 141
Maxwell Reeves Young
Chord is a distributed hash table (DHT) that requires only O(log n) links per node and performs searches with latency and message cost O(log n), where n is the number of peers in the network. Chord assumes all nodes behave according to protocol. We give a variant of Chord which is robust with high probability for any time period during which: 1) there are always at least z total peers in the network for some positive integer z; 2) there are never more than (1/4-\epsilon)z Byzantine peers in the network for a fixed \epsilon > 0; and 3) the number of correct peer insertions and deletions is no more than z^k for some tunable parameter k. We assume there is a computationally unbounded adversary controlling the Byzantine peers and that the IP addresses of all the Byzantine peers and the locations where they join the network are carefully selected by this adversary. Our notion of robustness is rather strong in that we not only guarantee that searches can be performed but also that we can enforce any set of "proper behavior" such as contributing new material, etc. In comparison to Chord, the resources required by this new variant are only a polylogarithmic factor greater in communication, messaging, and linking costs.
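For reference, a sketch of standard Chord finger-table routing, which is where the O(log n) bounds come from; the robust variant layers its Byzantine-tolerant machinery on top of this and is not shown. The parameter m and identifiers are illustrative.

    M = 2 ** 16  # identifier space size, 2^m with m = 16 (illustrative)

    def finger_targets(node_id, m=16):
        # A Chord node keeps links ("fingers") toward node_id + 2^i (mod 2^m),
        # which yields O(log n) links per node and O(log n)-hop searches.
        return [(node_id + 2 ** i) % M for i in range(m)]

    def closest_preceding(fingers, node_id, key):
        # Standard routing step: forward to the farthest finger that still
        # precedes the key on the ring; each hop at least halves the distance.
        for f in sorted(fingers, key=lambda f: (f - node_id) % M, reverse=True):
            if 0 < (f - node_id) % M < (key - node_id) % M:
                return f
        return node_id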
Date: Friday, April 7th 2006
Time: 3:30pm
Place: FEC 145
Moulik Kothari
Using gene order data for phylogeny reconstruction gives information that ordinarily is not available with DNA data. However, using gene order data greatly increases the complexity of phylogeny reconstruction programs and introduces questions much different from those for DNA data. While algorithms have been developed to solve some of the problems, many more need to be solved. Also, gene order data are not as commonly available as DNA data. Research has focused on using inversion (reversal) distances and breakpoint distances for reconstructing phylogenies. While inversions are commonly found in organisms, breakpoint distance is a mathematical abstraction of pairwise differences between two genomes. Transpositions have been proven to occur within organisms and are more commonly found in fast-evolving genomes such as mitochondria. However, no detailed study has been carried out using transpositions to reconstruct phylogenies. Our aim is to explore the use of transposition data to reconstruct phylogenies. Two new distance algorithms that can work with transposition data are implemented. We evaluate these algorithms, along with an inversion distance algorithm, on genome data generated entirely using transpositions. We compare the results on different measures to study the behaviour of each method under different circumstances.
Date: Friday, April 7th 2006
Time: 2:30pm
Place: FEC 141
I-Ching Boman
Connectivity, the existence of a path between every pair of nodes, is an important aspect of many networks. Some examples where connectivity is paramount are: communication networks like P2P systems and wireless sensor networks, utility networks like the power grid and the water supply, and infrastructure networks like roads.
Deletion of a node in a network can occur as a result of failures, sabotage, or planned shutdowns as part of energy conservation. As a result of node deletion, the network may become disconnected. One approach for maintaining connectivity is to structure the network in anticipation of these possible disruptions. Our approach is to fix it as we go. We do this by adding new edges to the network to maintain connectivity whenever they are needed.
In our model, the network is an undirected graph, G(V,E), consisting of a set of nodes and a set of edges, and we assume G is connected initially. At each time step, an adversary can remove nodes from the graph. We will describe two versions of an algorithm, which we dubbed the "line" algorithm, that adds edges to maintain connectivity. In addition, it is locality-aware, i.e., new edges are added only between vertices that were close to each other before the deletion.
Date: Friday, April 7th 2006
Time: 9:15am
Place: FEC 141
Ravi kiran Gorrepati
Of all 64-bit architectures, IA-64 is the newest, and it offers exciting new features for compiler developers. Its features place it at the higher end of the RISC architecture spectrum. Since this architecture is very different from other extant architectures, it offers a lot of room for research. IA-64 gains speed from efficient compilation, which for Java is costly because modules are compiled dynamically on the virtual machine. The lack of an open source Java virtual machine for IA-64 that allows for research on code generation techniques led us to pursue this work. This work presents a simple code generator for the IA-64 architecture. The code generator is a part of the Jikes Research Virtual Machine, a virtual machine for Java written in Java. To our knowledge, this work presents the first open source, just-in-time virtual machine for Java on the IA-64 architecture.
Date: Wednesday, April 5th 2006
Time: 11 am
Place: FEC 141
Santosh Bangera
The peridynamic model [1], introduced by Stuart Silling at Sandia National Laboratories, is a promising model from a conceptual as well as a computational point of view for simulating the behavior of plain and reinforced concrete structures [2]. The primary objective of this thesis is to predict and model the failure and deformation of two-dimensional linear elastic structures using a quasi-static peridynamic modeling method. In order to achieve this, a new computational framework is developed and implemented using Java for solving plane stress and plane strain structural analysis problems. A new modeling approach is implemented, which involves a topological method for independently mapping all domains in a problem, thereby enabling the application of peridynamic theory to a high fidelity structural model. This helps in improving the computational efficiency as well as in overcoming the shortcomings of earlier attempts. A system is implemented that allows the user to differentiate various boundary conditions, track development of cracks, and analyze the deformed structure behavior. Also implemented is a mechanism to give feedback on the computed displacement and internal force components acting on every subnode of a material structure.
[1] - Silling, S. A., “Reformulation of elasticity theory for discontinuities and long-range forces”, Tech. Report SAND98-2176, Sandia National Laboratories, Albuquerque, NM, 1998.
[2] - Gerstle, W., Sau, N., and Silling, S. A., "Peridynamic modeling of plain and reinforced concrete structures", 18th International Conference on Structural Mechanics in Reactor Technology (Beijing, China), August 7-12, 2005, 949-956.
Date: Tuesday, April 4th 2006
Time: 8 am
Place: FEC 141
Sreejith Vijayan
Currently, drug design is a time-consuming and expensive endeavor; the development of a single new drug requires about $800 million and between 10 and 20 years. It is estimated that for every 10,000 compounds, only 1 is successfully tested in clinical trials. The binding of Leukocyte Functional Antigen-1 (LFA-1) to Intercellular Adhesion Molecule-1 (ICAM-1) is important in a variety of diseases, and blocking their interaction may have therapeutic impact. Thus, we wanted to apply computational technologies to the problem of developing antagonists of LFA-1/ICAM-1 binding.
The goal is to determine the SAR (Structure Activity Relationship) of 12 peptides (4 each with high, intermediate, and low affinity) using a combination of experimental and computational techniques. The results generated from these models will be used to predict non-peptide organic structures that will be tested experimentally. Comparison of the computational predictions and experimental data will assist in the future development of novel computational technologies directed at drug discovery.
I tested the hypothesis that computer-based technology can more rapidly and accurately predict the structural relationship between peptides and ICAM/LFA-1 in a way that can be used in drug discovery. Studies of this sort are not performed extensively, since large sets of chemically related compounds are not available. I took advantage of a 31-member set of compounds, 12 of which had their 3-dimensional structures solved chemically. This chemical set is unique and took many man-years to develop.
Date: Friday, December 9th 2005
Time: 11am
Place: FEC 245
Jake Proctor
Intensity Modulated Radiation Therapy (IMRT) is a modern cancer treatment technique. IMRT is mainly implemented using a clinical linear accelerator (LINAC) equipped with a multileaf collimator (MLC). The LINAC is used to generate the radiation beams for the treatment. The MLC, which is composed of multiple pairs of configurable metal leaves that form an aperture, is used to control patient exposure to the radiation. An IMRT treatment plan is a set of MLC formed apertures. Because the total delivery time of a treatment plan is largely determined by the total number of MLC apertures in the plan, the key to IMRT delivery planning is to convert prescribed continuous or discrete radiation intensity distributions into a minimal number of MLC apertures.
In this thesis we develop a new algorithm to convert continuous intensity distributions into a minimal number of MLC apertures. Compared to the Pinnacle treatment planning system by Philips Corp., our new algorithm can reduce the number of apertures by as much as 75%.
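As an illustration of the underlying conversion problem, here is a sketch that decomposes one row of a discrete intensity map into unit-weight apertures using a simple sweep. This is not the thesis's algorithm, which minimizes the aperture count over the full 2D intensity distribution; it only shows what "converting an intensity profile into MLC apertures" means.

    def row_segments(row):
        # row: non-negative integer intensities along one leaf pair's travel.
        # Returns (left, right) leaf openings whose stacked unit-weight
        # exposures sum exactly to the prescribed row.
        segments = []
        values = list(row) + [0]   # sentinel so every aperture closes
        open_lefts = []            # left edges of currently open apertures
        prev = 0
        for i, v in enumerate(values):
            if v > prev:
                open_lefts.extend([i] * (v - prev))   # open new apertures
            elif v < prev:
                for _ in range(prev - v):
                    segments.append((open_lefts.pop(), i))  # close [left, i)
            prev = v
        return segments

For example, row_segments([1, 2, 1]) yields the two apertures (1, 2) and (0, 3), whose unit exposures sum back to the original profile.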
Date: November 7th 2005
Time: 1:30 - 3:30
Place: FEC 141
Galen M. Shipman
gshipman@cs.unm.edu
Infiniband is becoming an important interconnect technology in high performance computing. Recent efforts in large scale Infiniband deployments are raising scalability questions in the HPC community. This work in Open MPI provides several mechanisms to enhance Infiniband scalability. Initial comparisons with MVAPICH, the most widely used Infiniband MPI implementation, show similar performance but with much better scalability characteristics. Specifically: small message latency is improved by up to 10% in medium and large jobs; memory usage per host is reduced by as much as a factor of three; latency is close to optimal IB send/receive latency; and bandwidth performance is similar to that of MVAPICH.
Date: November 4th 2005
Time: 9:30 AM
Place: FEC 141
Fernando Esponda
fesponda@yahoo.com
In a negative representation, a set of elements (the positive representation) is depicted by its complement set (the negative representation). That is, the elements in the positive representation are not explicitly stored, while those in the negative representation are. The concept, feasibility, and properties of negative representations are explored in this dissertation. Two approaches are taken: first, one in which the negative representation is an inexact depiction of the positive set, i.e., not all possible items are characterized, with applications to machine learning and anomaly detection; second, one where the negative representation depicts exactly all the items not in the positive set. It is shown that a positive set consisting of n l-bit strings can be represented negatively using only O(l·n) strings, through the use of an additional symbol. It is also shown that membership queries for the positive representation can be processed against the negative representation in time no worse than linear in its size, while reconstructing the original positive set from its negative representation is an NP-hard problem. Some additional properties, such as the differences in inferring answers from either set, are discussed in depth, and applications that address some of the privacy concerns of the day are outlined.
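A minimal sketch of membership queries against a negative representation, where '*' plays the role of the additional symbol; the encoding is illustrative, and the hardness results concern how such entries are constructed, not this query step.

    def covers(entry, s):
        # An entry with '*' wildcards covers a binary string if every
        # specified position matches.
        return all(e == '*' or e == c for e, c in zip(entry, s))

    def in_positive_set(ndb, s):
        # A string belongs to the positive set exactly when no entry of the
        # negative representation covers it; this runs in time linear in the
        # size of the negative database, matching the query bound above.
        return not any(covers(entry, s) for entry in ndb)

For example, with ndb = ['1*', '01'] over 2-bit strings, only '00' is in the positive set.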
Date: October 18, 2005
Time: 9:00 a.m.
Location: High Performance Computing.
Christopher Davis
chris2d@cs.unm.edu
This paper outlines, discusses, and presents several artificial neural network architectures that are amenable to Graphics Processing Unit (GPU) execution. Traits are identified that lead to good or poor performance of the artificial neural network simulation.
Date: July 22
Time: 1:00 p.m.
Location: FEC 141.
Hajime Inoue
hinoue@cs.unm.edu
In the past few years, languages which run on virtual machines, like Java and C#, have become popular. These are platforms as well as languages: they are verifiable and garbage collected, and they include Just-In-Time compilers, large standard libraries, and runtime profilers. I call platforms with these features dynamic execution environments (DEEs).
The runtime infrastructure of DEEs allows access to features of execution that were previously difficult to observe. My research consists of a series of case studies in which I build systems to classify behavior of a particular feature into normal and abnormal and then use that classification for either security or optimization purposes. These systems are anomaly detectors.
I build anomaly detection systems for method invocations, method invocation sequences, and Permissions. When used to detect intrusions or system faults, I call these systems dynamic sandboxes. I also show that an anomaly detector can be used to predict object lifetimes, resulting in an improved garbage collector.
Date: July 8
Time: 11:00 am.
Location: FEC 145
Vishal Sanwalani
<vishal@cs.unm.edu>
In this talk we discuss two problems. The first problem concerns the probability that a sparse random graph or hypergraph is connected. While it is exponentially unlikely that a sparse random graph or hypergraph is connected, with probability 1-o(1) such a graph has a giant component that, given its numbers of edges and vertices, is a uniformly distributed connected graph. This simple observation allows us to estimate the number of connected graphs, and more generally the number of connected d-uniform hypergraphs, on n vertices with o(n ln n) edges.
The second problem concerns electing a leader from n candidates (players). Specifically, in the leader election problem, there are n players, of which some fraction are faulty. We assume that an omniscient and computationally unbounded adversary picks which players will be faulty before the game begins and that this adversary controls the actions of all faulty players. We present a leader election protocol which is scalable in the sense that it requires each player to send and process a number of messages which is polylogarithmic in n and requires a number of rounds which is polylogarithmic in n. Provided that the fraction of faulty players is less than 1/3, our protocol elects a non-faulty leader with constant probability and ensures that a 1-o(1) fraction of the players know this leader.
Date: July 5
Time: 11:00 am.
Location: FEC 141
Vamsi K Kalapala
<vamsi@spruance.cs.unm.edu>
We present two results on networks. First, we present a result on a phase transition in the 3-bit Exact Cover Problem (EC3), which can be viewed as a restricted form of 2-coloring of random 3-regular hypergraphs. Second, we present our results on scale invariance in road networks.
We study the 3-bit Exact Cover Problem (EC3), also known as Positive 1-in-3 SAT. Random instances of this problem have been used as a benchmark for simulations of an adiabatic quantum algorithm. Empirical results suggest that EC3 has a transition from satisfiability to unsatisfiability when the number of clauses per variable $r$ exceeds some threshold $r^*$. This problem is unusual in that numerical experiments can determine its threshold to a greater accuracy than for other variants of 3-SAT; by performing exhaustive search on problems of up to 1700 variables we estimate that $r^* \approx 0.62 \pm 0.01$. Recently it was shown that $r^* \le 0.644$. Using the method of differential equations, we place a lower bound on the location of $r^*$ by showing that when $r < 0.546$, a random instance of EC3 is satisfiable w.h.p.
We study road networks as complex networks with an underlying geography. We find that these networks have topological and geographical scale-invariant properties. We employ both primal and dual models of the network. In the primal model, intersections form the nodes and road segments form the edges. In the dual model, roads form the nodes and road intersections form the edges. In the primal model, we find that journeys of widely varying lengths have a similar structure. In the dual model, the degree distribution of the network is a power law, indicating a scale-free topology. We give a simple fractal model that produces these properties.
Date: July 5th
Time: 9:00 a.m.
Location: FEC 141
Meenakshi Balasubramanian
<meenkoot@cs.unm.edu>
Scalable Synthetic Compact Applications (SSCA) #2 is a graph theory benchmark developed under DARPA’s High Productivity Computing Systems (HPCS) project. SSCA #2 is part of the growing SSCA benchmark suite, which currently includes SSCA #1 (a bioinformatics benchmark), SSCA #2 (a graph theory benchmark), and SSCA #3 (a sensor processing and knowledge formation benchmark). The intent of SSCA #2 is to develop a compact application having multiple kernels accessing a single data structure that represents a directed multigraph with labeled edges. This synthetic benchmark consists of four kernels requiring irregular access to the graph’s data structure. We have developed a parallel implementation of this benchmark using Unified Parallel C (UPC), a parallel extension of ANSI C with a distributed shared-memory programming model. In this thesis we discuss the design and implementation of each of the four kernels and also present the benchmark’s execution time and validation results.
Master's Thesis Defense: A Per-Request Performance Monitoring Framework for the Linux Kernel
Date: July 1, 2005
Time: 10:00 a.m.
Location: FEC 141
Sushant Sharma <sushant@cs.unm.edu>
Department of Computer Science
Linux currently plays an important role in high-end computing systems, but recent work has shown that Linux-related processing costs and variability in network processing times can limit the scalability of HPC applications. Measuring and understanding these overheads is thus key for future use of Linux in large-scale HPC systems. Unfortunately, currently available performance monitoring systems introduce large overheads, performance data is generally not available online or from the operating system, and the data collected by such systems is generally coarse-grained. The work done here presents a low-overhead framework for solving one of these problems: making useful operating system performance data available to applications at runtime. Specifically, the Linux Trace Toolkit (LTT) is enhanced to monitor the performance characteristics of individual system calls and to make per-request performance data available to applications. The presented framework has the ability to monitor individual network and disk requests, and it has been demonstrated that the overhead of the per-request performance monitoring framework is minimal; benchmark results clearly show that the overhead of the framework varies from 2% to 9%.
Master's Thesis Defense: Games, Strategies, and Boolean Formula Manipulation
Date: June 30, 2005
Time: TBD
Location: FEC 141
Ben Andrews <bandrews@cs.unm.edu>
Department of Computer Science
This work describes the formalization of the process of converting strategies for two-player games of perfect information into Boolean formulas. This work also describes the development of some special-purpose algorithms for manipulating Boolean formulas. While many of the concepts in this thesis are explained with specific implementations in mind, the ideas and formalizations are implementation-independent. In particular, this document describes the application of many of the concepts to real-world problems arising from ongoing molecular computation research.
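As a flavor of such special-purpose manipulation, a small Python sketch (our own representation and routine, not the thesis's) that pushes negations inward to negation normal form:

    # Formulas as nested tuples: ("var", name), ("not", f),
    # ("and", f, g), ("or", f, g).
    def nnf(f):
        op = f[0]
        if op == "var":
            return f
        if op in ("and", "or"):
            return (op, nnf(f[1]), nnf(f[2]))
        g = f[1]                      # op == "not"
        if g[0] == "var":
            return f
        if g[0] == "not":
            return nnf(g[1])          # eliminate double negation
        dual = "or" if g[0] == "and" else "and"
        return (dual, nnf(("not", g[1])), nnf(("not", g[2])))  # De Morgan

    print(nnf(("not", ("and", ("var", "x"), ("not", ("var", "y"))))))
    # ('or', ('not', ('var', 'x')), ('var', 'y'))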
Master's Thesis Defense: RIOT: A Responsive System for Mitigating Computer Network Epidemics and Attacks
Date: June 27, 2005
Time: 10:00 a.m.
Location: FEC 141
Justin Balthrop <juddhuck@gmail.com>
Department of Computer Science, UNM
Current strategies for the prevention of computer network epidemics and attacks fall into two general categories and have two main shortcomings. The two most popular commercial security measures are firewalls and antivirus software. These systems can be effective at stopping known viruses and attacks, but they are not adaptive and they require human intervention to set up firewall rules or create virus signatures. Thus, they are unable to respond to new methods of attack and infections from viruses that have never been seen before. A promising solution to this problem has been recent work on systems that limit the rate at which computers can make new network connections, known as rate limiters. However, prior work in this area has largely focused on preventing specific viruses or worms, and it has not been applied more generally to all attacks that involve high connection rates.
In this work, I address a wider variety of attacks that are characterized by high connection rates. A worm generally involves one machine attempting to connect to many different machines, but many high rate attacks have a different connectivity pattern from worms. A brute force password cracker, for example, involves many connections between just two machines, while a distributed denial-of-service attack involves many machines connecting to a single machine at the same time. The effectiveness of the majority of these attacks would be greatly reduced if their connection rates were slowed, but current rate limiting methods are not sufficiently general to do this.
In this thesis, I explain why a general system for limiting connection rates is our best chance for preventing widespread network epidemics and attacks, and then I describe the design of such a system. This system is called RIOT, and it limits the rate of all network interactions on the protected computer, allowing it to counter a wide range of attacks, both incoming and outgoing. The system groups different connections using detectors, learns the normal connection rates for each group of connections, and automatically drops packets if the rates exceed the learned level. This proves to be effective at dropping packets associated with attacks while not interfering with normal usage. I have implemented this system on both Linux and Mac OS X and provide the details of each implementation. I also report experimental results and users' experiences from running the system on multiple workstations.
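The learned-threshold idea can be suggested with a token-bucket-style Python sketch (ours; RIOT's actual detectors and kernel hooks are not shown):

    import time

    # Illustrative: each connection group gets a learned rate budget;
    # packets beyond it are dropped.
    class GroupRateLimiter:
        def __init__(self, learned_rate, burst):
            self.rate = learned_rate      # allowed events per second
            self.tokens = burst
            self.burst = burst
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False                  # exceeds learned level: drop

    outgoing = GroupRateLimiter(learned_rate=2.0, burst=4)
    print([outgoing.allow() for _ in range(10)])
    # roughly the first four allowed, the rest dropped until tokens refill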
Master's Thesis Defense: Reduced Latency Searches for Unstructured Peer-to-Peer Networks
Date: June 17, 2005
Time: 10:00 a.m.
Location: FEC 145
John Alphonse <johnalp@cs.unm.edu>
Department of Computer Science, UNM
Over the last few years, peer-to-peer networks have developed explosively. A peer-to-peer network is a network in which each node plays the role of both server and client. The primary service in most peer-to-peer networks is storing and looking up data items. Optimizing search operations has been one of the focal points of research in this area. In this thesis, we present algorithms for storing and looking up data in unstructured peer-to-peer networks. Our algorithms are efficient in terms of both latency and bandwidth. Our algorithm for storing a data item establishes references to the data item in a subset of sqrt(n ln n) peers selected uniformly at random from all the peers in the network. Our algorithm for searching contacts an arbitrary subset of O(sqrt(n ln n)) peers. We show both analytically and empirically that with high probability even very rare data items can be found with our algorithms. Our algorithm for storing the data item makes use of a new technique for choosing several nodes uniformly at random from an unstructured network.
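A small Python simulation (ours) of the birthday-paradox argument behind these bounds:

    import math, random

    # If an item is replicated on ~sqrt(n ln n) random peers and a query
    # probes another ~sqrt(n ln n) random peers, the two sets intersect
    # with high probability.
    def rendezvous_success(n, c=1.0):
        k = int(c * math.sqrt(n * math.log(n)))
        storers = set(random.sample(range(n), k))
        probes = set(random.sample(range(n), k))
        return not storers.isdisjoint(probes)

    n, trials = 10_000, 1000
    hits = sum(rendezvous_success(n) for _ in range(trials))
    print(f"{hits}/{trials} searches found the item")  # nearly all succeed

With k probes and k storers among n peers, the miss probability is roughly exp(-k^2/n), which is about 1/n when k = sqrt(n ln n).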
Master's Thesis Defense: Approximate Bottom Line DEE: A Hybrid of Bottom Line DEE and Split DEE
Date: June 10, 2005
Time: 10:00 a.m.
Location: FEC 145
Sung-hee Lee <slee1005@unm.edu>
Department of Computer Science, UNM
Side-chain structure prediction is an essential component of protein modeling and is an important subtask in the protein-folding problem. Dead End Elimination (DEE) is a powerful method for determining the global minimum energy conformation (GMEC) of a set of side chains by removing candidate conformations, called rotamers, that cannot be part of the GMEC.
Rotamers are side-chain conformations defined by discrete chi angles \cite{Ghiglia:1991}. DEE reduces the size and computation time of the overall side-chain prediction problem by shrinking its combinatorial search space \cite{Desmet:1992}. The algorithm has been studied since the early 1990s, and several different versions of DEE have been developed.
The most recent method is the Split DEE method by Desmet and Mayo \cite{Pierce:2002}. It divides the search area into partitions and applies Goldstein's DEE to each partition.
Bottom Line DEE is theoretically the most powerful method, but its computational cost is high, prohibiting effective implementation \cite{Pierce:2002}. DEE methods approach the global energy minimum by identifying rotamers with consistently lowest interaction energies with the surrounding side chains, comparing each rotamer against alternatives using lowest possible pairwise interaction energy calculations.
The main idea behind Approximate Bottom Line DEE is to scan all possible testing positions, and to retain only those rotamers with the lowest pairwise interaction potential energy. At the end of a single iteration, \emph{a set} of rotamers is removed, retaining the rotamer with the lowest potential energy.
Approximate Bottom Line DEE branches from Bottom Line DEE and Split DEE. Experiments were run on both simulated data and real protein data. Results on simulated data show the speed improvement, and results on real protein data demonstrate the accuracy of Approximate Bottom Line DEE.
Approximate Bottom Line DEE performs better optimization than other established methods including Original DEE and Goldstein DEE. The cost of Bottom Line DEE is theoretically larger than $O(n^p)$ \cite{Pierce:2000}, with $n$ representing the number of residues and $p$ the average number of rotamers per residue. The cost of Approximate Bottom Line DEE is $O(n^p)$ in the worst case. Through the experiments presented in this thesis, I seek to show that Approximate Bottom Line DEE is a viable approximation of that method, one that still runs efficiently.
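For concreteness, a Python sketch (ours, with toy energies) of the classical Goldstein elimination condition that the DEE family builds on: rotamer r at position i is dead-ending if some alternative t does better even against the partners most favorable to r.

    # E_self[i][r] is a self energy; E_pair[i][j][r][s] a pairwise energy.
    def goldstein_eliminable(i, r, E_self, E_pair, rotamers):
        for t in rotamers[i]:
            if t == r:
                continue
            gap = E_self[i][r] - E_self[i][t]
            for j in rotamers:
                if j == i:
                    continue
                gap += min(E_pair[i][j][r][s] - E_pair[i][j][t][s]
                           for s in rotamers[j])
            if gap > 0:        # t beats r in every completion: remove r
                return True
        return False

    # Toy instance: two positions, two rotamers each.
    rotamers = {0: [0, 1], 1: [0, 1]}
    E_self = {0: [5.0, 1.0], 1: [0.0, 0.0]}
    E_pair = {0: {1: [[1.0, 2.0], [0.5, 0.5]]},
              1: {0: [[1.0, 0.5], [2.0, 0.5]]}}
    print(goldstein_eliminable(0, 0, E_self, E_pair, rotamers))  # True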
Master's Thesis Defense: Distplot: Web-based Graphical Tool for Sequence Distance Analysis
Date: June 3, 2005
Time: 2:30 p.m.
Location: FEC 141
Ashish Agarwal <ashisha@cs.unm.edu>
Department of Computer Science, UNM
Many variable viruses are extensively studied because they are responsible
for causing major epidemics. These studies generate extremely complex and richly
interconnected data. Visualization of such biological data is an important
aspect of the research performed by biologists. Efficient and accurate visualization
tools can help biologists identify patterns and
relationships in this data.
Distplot was developed to study the Hepatitis C Virus (HCV), but can be used to analyze sequence data from any organism. It can help researchers study the evolution of gene sequences by visualizing data. No existing graphing tool provides the flexibility and ease of analysis that Distplot does.
Distplot collects gene sequence data and information interactively in multiple steps through a web browser. The input gene sequence data can be grouped on several levels, such as patient number, sample number, tissue type, genotype, subtype, sample year, etc.
The tool can compute pairwise distances between all sequences within each group,
or between all sequences of pairs of groups, and generate a graph representing
these distances. Users can also nest on one of the levels and indicate any level
as the numeric parameter. The tool can plot the arithmetic mean of distances
for each group or group-pair.
Also, users can customize the visual appearance of the graph by modifying the
page dimensions, symbol size, symbol color, symbol type, etc.
Distplot eliminates the need to transform data into multiple forms, providing ease of experimentation. It is very flexible and provides a high degree of customization. The results of the analysis are generated in PDF, PostScript, PNG, and plain-text formats.
Distplot aims to be a resource for scientists working on genetics, evolution, variability, and vaccine and drug design.
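The core computation can be suggested by a short Python sketch (ours, with hypothetical sequences and groups):

    from itertools import combinations, product

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    groups = {"patient1": ["ACGT", "ACGA", "ACTA"],
              "patient2": ["TCGT", "TCGA"]}

    # Mean pairwise distance within each group ...
    within = {g: sum(hamming(a, b) for a, b in combinations(seqs, 2))
                 / max(1, len(seqs) * (len(seqs) - 1) // 2)
              for g, seqs in groups.items()}
    # ... and between a pair of groups.
    between = sum(hamming(a, b) for a, b in
                  product(groups["patient1"], groups["patient2"])) \
              / (len(groups["patient1"]) * len(groups["patient2"]))
    print(within, between)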
Master's Thesis Defense: Machine Learning Approaches for siRNA Efficacy Prediction
Date: Monday, May 23, 2005
Time: 2:00 p.m.
Location: FEC 141
Sahar Abubucker <sahar@cs.unm.edu>
Department of Computer Science, UNM
RNA interference (RNAi) studies are being widely used to study gene expression
and gene regulation via selective gene knockdown experiments, which are important
for functional genomic studies. RNAi refers to the biological process by which
short interfering RNA (siRNA) degrades complementary messenger RNA (mRNA) sequences.
This knockdown of mRNA prevents it from producing amino acid sequences that
are responsible for gene expression. Recent studies indicate that not all siRNAs
produce the same knockdown effects. Due to the high cost of synthesizing
siRNAs, rational siRNA design---a priori prediction of functionality for specific
siRNAs---is a critical task for practical RNAi studies. Therefore, a key component
of RNAi applications is the selection of effective siRNA sequences---ones that
are highly functional in degrading more than 80\% of the targeted mRNA. The
goal of siRNA efficacy prediction is to aid in designing siRNA sequences that
are highly efficient in degrading target mRNA sequences.
Most of the current algorithms use positional features, thermodynamic properties
and secondary structure predictions to predict the knockdown efficacy of siRNAs.
In this work, frequently occurring patterns in siRNAs are used to make predictions
about their efficacy. Time after transfection is shown to be an important factor
in siRNA efficacy prediction---a feature that has been ignored in previous efficacy
studies. The relevance of these
extracted patterns to previous results and their biological significance are
analyzed. The ability of random feature sets to predict efficacy is studied
and the results are discussed. Our algorithm consistently performs better than
other publicly available efficacy prediction algorithms and does not require
any specialized hardware.
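One simple form of pattern feature extraction can be sketched in Python (ours; the thesis's frequent-pattern mining is more involved, and the sequences below are hypothetical):

    from collections import Counter
    from itertools import product

    # Turn an siRNA sequence into a vector of k-mer (pattern) frequencies.
    def kmer_features(seq, k=2):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        return [counts[''.join(p)] for p in product("ACGU", repeat=k)]

    functional    = "UUCGAAGUACUCAGCGUAA"   # hypothetical 19-mer guides
    nonfunctional = "GGGGCCCCGGGGCCCCGGG"
    print(kmer_features(functional)[:8])    # first 8 of 16 dinucleotide counts
    print(kmer_features(nonfunctional)[:8])

Such vectors can then be fed to any standard classifier, with time after transfection added as an extra feature.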
Master's Thesis Defense: "FINDMODEL: A Tool to Select the Best-Fit Model of Nucleotide Substitution"
Date: Friday, May 20, 2005
Time: 10:30 a.m.
Location: FEC 141
Ning Tao <ntao@unm.edu>
Department of Computer Science, UNM
Motivation: Choosing a model of sequence evolution is a crucial step when using DNA sequence data to reconstruct phylogenies: using a mismatched model will reduce accuracy and may lead to erroneous conclusions. FINDMODEL is a web-based tool for selecting a model of DNA (nucleotide) evolution; it is designed to be easy to use by researchers who do some sequencing and may not have access to phylogenetic packages.
Approach: FINDMODEL can analyze 28 models or a restricted subset of 12 models. It creates a guide tree using Weighbor, optimizes branch lengths, calculates the likelihood for every chosen model (using baseml from the PAML package), and computes the Akaike information criterion (AIC). The model with the smallest AIC score is considered to be the best-fit model. Because of server limitations, the FINDMODEL web server processes inputs above a certain size in non-interactive mode, sending an email to the user when it has analyzed that user's data and providing a downloadable file with the results.
Results: To test the performance of FINDMODEL, we generated simulated DNA sequences with Seq-Gen under four different models of nucleotide substitution of different complexity and compared the inferred model with the true model. We used 17 different configurations, with 5 instances for each set of parameter values. FINDMODEL returned the correct model in 75% of the instances, and a model differing only in its assumption of homogeneous rates in another 8%. Moreover, on all tests where FINDMODEL did not return the correct model, the normalized AIC error between the correct and the predicted models was below 0.002 (and the actual AIC difference was below 7).
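The AIC selection step itself is simple; a minimal Python sketch (ours, with hypothetical log-likelihoods standing in for baseml's output, AIC = 2k - 2 ln L):

    candidates = {
        "JC69":  {"params": 0, "lnL": -5230.1},
        "K80":   {"params": 1, "lnL": -5101.7},
        "HKY85": {"params": 4, "lnL": -5085.2},
        "GTR":   {"params": 8, "lnL": -5083.9},
    }
    aic = {name: 2 * m["params"] - 2 * m["lnL"]
           for name, m in candidates.items()}
    best = min(aic, key=aic.get)          # smallest AIC wins
    print(best, round(aic[best], 1))

Note how the parameter penalty works: in this toy data GTR has the best likelihood but HKY85 wins on AIC because it uses fewer free parameters.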
Ph.D. Dissertation Defense: "Splintering Commodity Network Protocols in High-Performance Computing"
Date: April 13, 2005
Time: 1:30 p.m.
Location: FEC 141
Patricia Gilfeather pfeather@cs.unm.edu
Department of Computer Science, UNM
Systems research is about creating a performance environment for applications within the constraints of the underlying components. This dissertation is specifically about creating a performance environment for high-end applications within the constraints of commodity components (protocols, operating systems and hardware).
This research analyzes the bottlenecks of TCP/IP in today's clusters and finds that while MPI implementations need to be improved, decreasing the overhead associated with TCP/IP will increase the performance of MPI. First, this dissertation outlines offloading IP fragmentation and reassembly onto a programmable NIC. This drastically reduces the overhead associated with the interrupt pressure of Ethernet frames and leads to a general method for offloading: splintering. Splintering means isolating a small piece of a protocol's functionality, distributing this functionality, and constraining its implementation.
The splintering method is applied to TCP by splintering the demultiplexing of the connection and offloading the demultiplexing onto the NIC. This dissertation describes splintering and its relation to other offload methods including OS-bypass and zero-copy. In order for Splintered TCP to be a success, it must be constrained by the resource limitations of a commodity NIC.
This dissertation also outlines the Extensible Message-oriented Offload model, created to model offload techniques with respect to CPU speed. The model illustrates offload hiding. Splintered TCP performs offload hiding by allowing the application the flexibility of choosing when protocol headers are processed. The model also illustrates protocol bypass. Latency is lowered because most of the protocol processing is performed after the application receives the data.
Finally, the dissertation outlines the creation of Connection-less TCP which drastically decreases the memory usage of TCP connections allowing them to be offloaded in a memory-constrained environment. Connection-less TCP allows connections and demultiplexing to be offloaded onto the NIC even when there are a large number of connections.
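The splintered functionality itself is small, which is the point; a Python sketch (ours, not the NIC firmware) of connection demultiplexing as a 4-tuple lookup:

    # TCP demultiplexing is a lookup from a packet's 4-tuple to connection
    # state, which is why it is a small, easily constrained offload candidate.
    connections = {}   # (src ip, src port, dst ip, dst port) -> state

    def demux(src_ip, src_port, dst_ip, dst_port):
        return connections.get((src_ip, src_port, dst_ip, dst_port))

    connections[("10.0.0.2", 40000, "10.0.0.9", 80)] = {"next_seq": 1}
    print(demux("10.0.0.2", 40000, "10.0.0.9", 80))  # hit: handled on NIC
    print(demux("10.0.0.3", 40000, "10.0.0.9", 80))  # None: pass to host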
Master's Thesis Defense: A Framework for DRM as a Layered System
Date: Friday, April 8, 2005
Time: 1:00 p.m.
Location: ECE 118
Pramod A. Jamkhedkar <pramod54@cs.unm.edu>
Department of Computer Science, UNM
The current landscape for digital rights management (DRM) consists of various ad hoc technologies and platforms that largely focus on copy protection. The fragmented nature of the DRM industry in 2005 is somewhat reminiscent of the telecommunications industry in the late 1980's. At that time various networking technologies were available, and what was needed was a technology that could integrate existing networks and provide various services to users. The OSI layered framework and the TCP/IP communications protocol suite provided a solution to this situation. The OSI model divides the process of digital data communications into layers. Likewise, we divide the process of DRM into layers in which various services are offered to the users of digital content at each layer. Three blocks of layers have been identified. The upper layers deal with the end-to-end functions of the application, the middle layers deal with rights expression and interpretation, and the lower layers ensure rights enforcement. We describe how responsibilities will be distributed among the various layers, and consider where in these layers it would be appropriate to define protocols and standards. A layered DRM framework will help identify, separate and manage tussles in the DRM industry and also facilitate interoperability.
Master's Thesis Defense: Evaluating Augmented Bayesian Classifiers on Gene Expression Data
Date: Friday, April 8, 2005
Time: 1:00 p.m.
Location: FEC 141
Vikas Hamine <vikas@cs.unm.edu>
Department of Computer Science, UNM
We evaluate Bayesian network classifiers on classification of gene expression
data. Specifically, we evaluate the \emph{Tree-Augmented Naive Bayes} (TAN)
model using two different scoring metrics. The TAN model has been known to perform
better than the naive Bayes model in terms of classification on many small-sized
datasets. However, its
behavior on gene expression data is not known to the author. We evaluate the
TAN model on two publicly available gene expression datasets. A polynomial time
algorithm for learning optimal TAN models using \emph{Minimum Description Length}
(MDL) as the scoring metric is known in the literature. We also present a polynomial
time algorithm for learning optimal TAN with respect to the \emph{Bayesian-Dirichlet}
(BD) metric. Other significant contributions include polynomial time algorithms
for learning optimal \emph{Forest-Augmented Naive Bayes} (FAN) - a generalization
of the TAN model, with respect to both the MDL and the BD metric. A comparison
of the TAN and FAN classifiers based on the number of correct classifications
is presented. We also
compare the performance of the MDL and BD metrics with respect to the TAN and
FAN classifiers. The authors of \cite{helman:bayesian} have presented a Bayesian network classifier
which we refer to as the Helman-Veroff model. The Helman-Veroff model is shown
to perform well on gene expression domains. One of our goals is also to compare
the TAN and FAN classifiers with the Helman-Veroff model on gene expression
datasets.
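The classical TAN structure-learning step (due to Friedman, Geiger, and Goldszmidt) can be sketched in Python (ours; the thesis's MDL and BD scoring variants are not shown): weight attribute pairs by conditional mutual information given the class and keep a maximum-weight spanning tree.

    import numpy as np
    from itertools import combinations

    def cond_mutual_info(x, y, c):
        # I(X; Y | C) estimated from counts.
        mi = 0.0
        for cv in np.unique(c):
            m = c == cv
            pc = m.mean()
            for xv in np.unique(x):
                for yv in np.unique(y):
                    pxyc = np.mean(m & (x == xv) & (y == yv))
                    pxc = np.mean(m & (x == xv))
                    pyc = np.mean(m & (y == yv))
                    if pxyc > 0:
                        mi += pxyc * np.log(pc * pxyc / (pxc * pyc))
        return mi

    def tan_edges(X, c):
        # Maximum-weight spanning tree (Kruskal) over CMI edge weights.
        d = X.shape[1]
        edges = sorted(((cond_mutual_info(X[:, i], X[:, j], c), i, j)
                        for i, j in combinations(range(d), 2)), reverse=True)
        comp, chosen = list(range(d)), []
        for w, i, j in edges:
            if comp[i] != comp[j]:
                chosen.append((i, j, round(w, 3)))
                old, new = comp[i], comp[j]
                comp = [new if k == old else k for k in comp]
        return chosen

    rng = np.random.default_rng(0)
    c = rng.integers(0, 2, 1000)
    x0 = (c + (rng.random(1000) < 0.1)) % 2   # noisy copy of the class
    x1 = (x0 + (rng.random(1000) < 0.1)) % 2  # depends on x0 beyond the class
    x2 = rng.integers(0, 2, 1000)             # noise attribute
    print(tan_edges(np.column_stack([x0, x1, x2]), c))  # (0, 1) gets top weight

Each attribute then receives the class plus at most one attribute parent from the tree; dropping low-weight edges instead of keeping a full tree yields the FAN generalization.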
Master's Thesis Defense: Using Edge Weight Stability as a Tool in Computational Phylogenetic Reconstruction
Date: Thursday, March 31, 2005
Time: 1:00 p.m.
Location: FEC 141
Nicholas Pattengale <npcomplete@gmail.com>
Department of Computer Science, UNM
After a parsimony search there are often many most-parsimonious (or nearly so) trees. Commonly utilized metric distances for comparing two phylogenetic trees (e.g., Robinson-Foulds, Nearest Neighbor Interchange) are based solely on tree topology. Likewise, most consensus methods ignore edge weights. Since edge weights constitute the parsimony score, we believe that summary methods (metric distances, consensus methods) should be devised to take edge weights into account.
Robinson and Foulds extended their well known distance metric to accommodate weighted trees. We derive a generalized family of metrics (that includes weighted RF as a member) between weighted trees. The family arises from embedding the trees into a high-dimensional normed vector space.
Our embedding technique also enables the attribution of probability distributions to edges such that variance corresponds to edge weight stability. The stability concept gives rise to a new optimization based consensus method in which we include edges in the consensus tree based upon their stability.
Finally, there are axiomatic limitations on the fidelity with which a consensus tree can exhaustively summarize the input set. Thus we present a non-traditional summary algorithm that identifies (potentially) more than one representative tree for the set. The vector representation aids us yet again, as the computational complexity of the algorithm is improved by a randomized re-embedding of the input trees so as to reduce dimensionality.
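A minimal Python sketch (ours, with toy trees and weights) of the embedding idea: each edge of a tree bipartitions the leaves, so a tree becomes a sparse vector indexed by bipartitions, and norms on that space give the family of metrics.

    # Trees as {bipartition: edge weight} vectors over leaves {A,...,E};
    # each bipartition is named by one of its sides.
    t1 = {frozenset({"A", "B"}): 2.0, frozenset({"A", "B", "C"}): 1.0}
    t2 = {frozenset({"A", "B"}): 1.5, frozenset({"A", "C"}): 0.5}

    def weighted_rf(u, v, p=1):
        keys = set(u) | set(v)
        return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) ** p
                   for k in keys) ** (1 / p)

    print(weighted_rf(t1, t2))        # L1 norm: 0.5 + 1.0 + 0.5 = 2.0
    print(weighted_rf(t1, t2, p=2))   # an L2 member of the same family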
Master's Thesis Defense: A Machine Learning Approach for Biological Information Extraction
Date: Wednesday, March 30, 2005
Location: FEC 145
Time: 10:00 a.m.
Sushmita Roy <sroy@cs.unm.edu>
Department of Computer Science, UNM
Biological literature is an extremely valuable source of information. However,
due to the absence of efficient data mining techniques, this rich information
source remains largely unused. The fact that this data is in unstructured
natural-language form makes the problem of information extraction hard but
compelling. We apply
a machine learning approach to the problem of information extraction from
scientific abstracts. We analyze existing discriminative and generative probabilistic
frameworks to model the information source and find that temporal dependencies
and contextual information are extremely useful for robust and reliable results.
Based on existing approaches, we develop our own generative model, a naive Bayes
hidden Markov model, which performs on par with state-of-the-art algorithms.
Finally, we develop a rule-based data mining system that predicts relationships
between genes based on the tagged output of the probabilistic models.
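The role of temporal dependencies can be suggested by a tiny Viterbi decoder in Python (ours, with hypothetical states and probabilities, not the thesis's model): the Markov chain over tags lets context pull an ambiguous word toward the right label.

    import numpy as np

    states = ["GENE", "OTHER"]            # index 0 = GENE, 1 = OTHER
    start = np.log([0.3, 0.7])
    trans = np.log([[0.5, 0.5],           # from GENE
                    [0.3, 0.7]])          # from OTHER

    def emit_logp(word):
        # Stand-in for a naive Bayes emission model over word features.
        return np.log([0.9, 0.02]) if word.isupper() else np.log([0.2, 0.8])

    def viterbi(words):
        v = start + emit_logp(words[0])
        back = []
        for w in words[1:]:
            scores = v[:, None] + trans   # (previous state, current state)
            back.append(scores.argmax(axis=0))
            v = scores.max(axis=0) + emit_logp(w)
        path = [int(v.argmax())]
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        return [states[i] for i in reversed(path)]

    print(viterbi(["the", "BRCA1", "protein"]))  # ['OTHER', 'GENE', 'OTHER']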
MS Thesis Defense: A New Tool For Option Valuation and Analysis
Date: March 24, 2005
Location: FEC 141
Time: 2:15 p.m.
Anshul Shekhon <anshul@cs.unm.edu>
Department of Computer Science, UNM
Existing applications for option valuation are limited in their ability to value more than a few basic types of options. There seems to be no generic tool that can be used to value non-standard types of options. This work presents a new tool to value and analyze standard options and many types of non-standard options, including exotic path-dependent options. The parameters used for the models in this tool are carefully selected and are proposed as heuristics. Another useful feature of this tool is its ability to provide better visual representations of the sensitivities of option value, especially for exotic options. This application can help reduce errors in valuations and will also improve risk analysis and management.
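As an example of the kind of generic lattice model such a tool can apply, a Cox-Ross-Rubinstein binomial pricer in Python (ours, with hypothetical parameters; the thesis's tool and models are not shown):

    import math

    def crr_price(S0, K, r, sigma, T, steps, payoff):
        dt = T / steps
        u = math.exp(sigma * math.sqrt(dt))
        d = 1 / u
        p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
        disc = math.exp(-r * dt)
        # Terminal payoffs, then discounted expectation rolled backward.
        values = [payoff(S0 * u**j * d**(steps - j)) for j in range(steps + 1)]
        for step in range(steps, 0, -1):
            values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                      for j in range(step)]
        return values[0]

    call = lambda S: max(S - 100.0, 0.0)       # European call, strike 100
    print(round(crr_price(100.0, 100.0, 0.05, 0.2, 1.0, 200, call), 2))
    # close to the Black-Scholes value of about 10.45

The same backward induction extends naturally to early exercise, and path-dependent payoffs can be handled by augmenting the lattice state, which is the flexibility a generic valuation tool needs.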
Date: February 28, 2005
Location: FEC 141
Time: 3:00 p.m.
The second part of this thesis is an investigation of what is usually called
the minimum-bends travelling-salesman problem: given a finite set of points,
find a piecewise-linear tour through all of these points, minimizing either the
number of corners or the total angle of turn. This problem has been much less
studied than most other elementary geometric-combinatorial questions, in spite
of its wide variety of potential applications in robotics, sensor placement,
and machining. Our approach here is less heuristic and more rigorous, with proofs
of upper and lower bounds for a variety of instances. We have obtained considerable
improvements over previous results in this area, and have found new evidence
for a long-standing conjecture.
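The two objectives are easy to state computationally; a Python sketch (ours) that evaluates a given tour under both:

    import math

    def turn_angles(tour):
        # Turn angle at each vertex of a closed piecewise-linear tour.
        angles = []
        n = len(tour)
        for i in range(n):
            (x0, y0), (x1, y1), (x2, y2) = (tour[i - 1], tour[i],
                                            tour[(i + 1) % n])
            a1 = math.atan2(y1 - y0, x1 - x0)
            a2 = math.atan2(y2 - y1, x2 - x1)
            angles.append(abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi))
        return angles

    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    angles = turn_angles(square)
    corners = sum(a > 1e-9 for a in angles)    # collinear points are not bends
    print(corners, math.degrees(sum(angles)))  # 4 corners, 360.0 degrees total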
Ph.D. Dissertation Defense: Automated Methods for Creating Diversity in Computer Systems
Date: February
18, 2005
Location: FEC 141
Time: 2:00 p.m.
Gabriela Barrantes <gbarrant@cs.unm.edu>
Department of Computer Science, UNM
The ease of attacking multiple identical systems once one machine is compromised, combined with the pervasive homogeneity of computer systems attached to the Internet, represents a serious security threat. Biological diversity provides a defense against unpredictable threats by maximizing the probability that some individuals will survive and replenish the population with a defense against that particular threat. Using natural diversity as inspiration, diversity in computer systems could confer security benefits by protecting against attacks that are new, or that have been modified to avoid signature detection, but that rely on known system invariants.
A diversity defense can render a standard attack ineffective or slow it down, depending on its placement and implementation. Diminishing the uniformity in existing systems is, however, a non-trivial task, as standardization must be maintained at many interface points in any given system. This dissertation intends to assess the costs and benefits of adding diversity to existing computer systems by implementing diversity at different levels. Diversification in computer systems can be done at the interface or the implementation level. Three techniques to introduce automated diversity in existing systems are presented: one at the interface level and two at the implementation level, and their effectiveness at stopping or slowing down attacks is studied.
The first diversity scheme presented is an interface diversification: a machine language randomization, named Randomized Instruction Set Emulation (RISE). RISE is intended as a protection against the threat of code-injection attacks, which insert malicious machine-language code into programs using different weaknesses during input processing. Code-injection attacks constitute one of the most prevalent threats on current networks, and their most famous examples are the so-called buffer-overflow attacks. RISE protects against all code-injection attacks regardless of their point of entry by creating a unique machine code per process. The current RISE implementation runs over an emulator, and maps all executable bytes of the process to a random mask created for the purpose of diversification. When injected code attempts to execute, its code is also mapped to the mask, but given that it was not correctly encoded, it is "decoded" to random bytes, which attempt to execute on the real processor and most of the time end in a fatal exception. Even though the attack will not execute as intended, there is a small probability that the attack will manage to execute some random instructions. This work also offers an analysis of the risks associated with the execution of random instructions.
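The randomization principle can be suggested by a Python sketch (ours; RISE itself works inside an emulator on native code, not on toy byte strings):

    import os

    # Code bytes are XOR-masked at load time and unmasked only at fetch, so
    # injected bytes that never saw the mask decode to noise.
    mask = os.urandom(16)

    def scramble(code, offset=0):
        return bytes(b ^ mask[(offset + i) % len(mask)]
                     for i, b in enumerate(code))

    legit = b"\x55\x48\x89\xe5"       # bytes the loader scrambled
    stored = scramble(legit)          # what sits in "memory"
    print(scramble(stored) == legit)  # True: fetch-time unmask recovers it

    injected = b"\x90\x90\xcc\xcc"    # attacker-injected, never masked
    print(scramble(injected))         # random garbage reaches the CPU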
Many Denial of Service (DoS) attacks exploit a system's implementation rather than its interface. Two approaches to diversify an implementation are explored. The first one randomizes internal protocol parameters within acceptable ranges. It is tested against an attack targeting one of the TCP congestion control parameters, where it achieves the objective of keeping a portion of the hosts in the attacked network operating at larger bandwidths than if they were all using standard parameter values.
The second implementation diversification attempts to go deeper and modify the implementation of a task, with the intention of providing different execution behaviors. The target threats are denial-of-service attacks by resource exhaustion. These attacks usually anonymize the requests, making them hard targets for detection. The proposed diversity solution gives each host a different attack-legitimate classifier filter. Each filter uses a distribution of legitimate and attack requests, and blocks requests considered illegitimate based on this distribution. Under attack, those filters with distributions closer to the real one created by the combination of attack and legitimate traffic will be less affected by the DoS. The filters are created using Genetic Programming trained with different attack-legitimate patterns. Current results suggest that the approach could be effective when used as a front-end for resource managers that are non-preemptive in nature.
MS Thesis Defense: Modeling Pathways of Cell Differentiation in Genetic Regulatory Networks With Random Boolean Networks
Date: Wednesday, February 16, 2005
Location: FEC 141
Time: 2:00 p.m.
Sheldon Dealy <sdealy@cs.unm.edu>
Department of Computer Science, UNM
It has long been known that some cells differentiate into other cell types. Under the assumption that living cells can be modeled as non-linear dynamical systems, cell types are attractors. If a cell type is an attractor, then a series of steps from one attractor to another is a pathway of differentiation. As non-linear dynamical systems, random Boolean networks were analyzed for their suitability in a first attempt to model pathways of differentiation in model genetic regulatory networks. We hypothesize that perturbations of Boolean networks share some key behaviors of genetic regulatory networks in the process of differentiation. Furthermore, several solutions to the problem of mapping the discrete-time nature of Boolean networks to the continuous nature of genetic regulatory networks are presented. The findings suggest that Boolean networks are able to support predictions about the pathways of differentiation in living cells. The predictions should be testable using gene array techniques.
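A minimal random Boolean network in Python (ours), iterated until it falls into an attractor, the model's stand-in for a cell type:

    import random

    random.seed(1)
    N, K = 8, 2                       # N genes, K regulatory inputs each
    inputs = [random.sample(range(N), K) for _ in range(N)]
    tables = [{bits: random.randint(0, 1)
               for bits in [(a, b) for a in (0, 1) for b in (0, 1)]}
              for _ in range(N)]      # one random Boolean rule per gene

    def step(state):
        return tuple(tables[i][tuple(state[j] for j in inputs[i])]
                     for i in range(N))

    state = tuple(random.randint(0, 1) for _ in range(N))
    seen, t = {}, 0
    while state not in seen:          # iterate until a state repeats
        seen[state] = t
        state = step(state)
        t += 1
    print(f"attractor of length {t - seen[state]} "
          f"reached after {seen[state]} steps")

A perturbation (flipping one gene's state) that moves the system into a different attractor's basin is the analogue of a differentiation step.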
MS Thesis Defense: Parameterized Phase Screen Modeling for the Very Large Array
Date: Thursday, February 10, 2005
Location: FEC 141
Time: 10:30 a.m.
Chunai Cai <ccai@cs.unm.edu>
Department of Computer Science, UNM
This thesis is about modeling the tropospheric phase error of radio astronomical
data obtained from the Very Large Array (VLA), a radio astronomical instrument
of 27 antennas designed to collect radio signals from celestial sources. The
astronomical data we are interested in are called visibilities. From these
visibility data, which have amplitude and phase, scientists can make images of
the celestial sources they study. To get a good image, one must have good
visibility data. But visibility data are produced from the electrical output
signals of the antennas, and the inputs to the antennas are radio signals that
have traveled a long way, their amplitude attenuated and their phase distorted
along the way. Besides, the antennas themselves are not perfect instruments;
they introduce errors while processing the radio signals. Algorithms and tools
are developed and made available to astronomers and scientists to restore,
ideally, the true face of the radio signals by cleaning the visibility data.
Phase error correction is critical for good imaging, especially for high-frequency
observations. There are not many existing algorithms for phase error correction.
In AIPS++, a least-squares fitting routine is used to solve for the phase error
of each antenna. In the case of the VLA, if all 27 antennas are used, the number
of parameters to be solved for in the least-squares fit is 27. According to Tim
Cornwell's work [TC81], assuming all visibility data have the same variance, the
variance in the visibility data, $\sigma_v^2$, and the variance in the total
gain, $\sigma_G^2$, are related through the flux density $S$ of a point
calibration source and the number $N$ of gains to be solved for. It is clear
that if we reduce the number $N$ of unknowns to be solved, we can also reduce
the variance in the visibility data, assuming the gain variance stays stable.
In this thesis we propose a physical model of the antenna tropospheric phase
error by associating the phase error with the antennas' positions. First- and
second-order Taylor expansions along the antenna x, y coordinates are used for
least-squares fitting to the tropospheric phase error obtained from the existing
phase-correction software package AIPS++ for the VLA. In this modeling we reduce
the number $N$ of unknowns to be solved for to a maximum of 10. Visibility data
from different VLA array configurations are used to test the model. The results
show the proposed model works well for the most compact array configuration,
but is less ideal for extended configurations like BnA.
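The fitting step can be sketched in Python (ours, with synthetic positions and phases; we show a second-order expansion with 6 coefficients, while the thesis uses up to 10):

    import numpy as np

    # Model each antenna's tropospheric phase as a low-order 2D polynomial in
    # its (x, y) position, so the unknowns drop from one per antenna (27) to
    # a handful of coefficients.
    rng = np.random.default_rng(0)
    x, y = rng.uniform(-1, 1, 27), rng.uniform(-1, 1, 27)  # 27 VLA antennas
    A = np.column_stack([np.ones(27), x, y, x**2, x * y, y**2])
    true_coeffs = [0.3, 1.2, -0.7, 0.4, 0.2, -0.1]
    phase = A @ true_coeffs + rng.normal(0, 0.05, 27)      # noisy screen

    coeffs, *_ = np.linalg.lstsq(A, phase, rcond=None)
    print(np.round(coeffs, 2))  # recovers roughly the true coefficients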
MS Thesis Defense: Generating hard satisfiable 3-SAT instances
Date: Wednesday, February 9, 2005
Location: FEC 141
Time: 2:00 p.m.
Haixia Jia <hjia@cs.unm.edu>
Department of Computer Science, UNM
The evaluation of incomplete satisfiability solvers depends critically on the
availability of hard satisfiable instances. A plausible source of such instances
consists of random k-SAT formulas whose clauses are chosen uniformly from among
all clauses satisfying some randomly chosen truth assignment A. Unfortunately,
instances generated in this manner tend to be relatively easy and can be solved
efficiently by practical heuristics. Roughly speaking, as the formula's density
increases, for a number of different algorithms, A acts as a stronger and stronger
attractor. First, we introduce a simple twist on this basic model, which appears
to dramatically increase its hardness. Namely, in addition to forbidding the
clauses violated by the hidden assignment A, we also forbid the clauses violated
by its complement, so that both A and the complement of A are satisfying. It appears
that under this ``symmetrization'' the effects of the two attractors largely
cancel out, making it much harder for algorithms to find any truth assignment.
We give theoretical and experimental evidence supporting this assertion. We
then introduce a highly structured family of hard satisfiable 3-SAT formulas
corresponding to an ordered spin-glass model from statistical physics. This
model has provably ``glassy'' behavior; that is, it has many local optima with
large energy barriers between them, so that local search algorithms get stuck
and have difficulty finding the true ground state, i.e., the unique satisfying
assignment. We give experimental evidence supporting its hardness.
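A Python sketch (ours) of the symmetrized generator described above: a clause survives only if it has at least one true and at least one false literal under the hidden assignment A, so both A and its complement satisfy the formula.

    import random

    def symmetric_3sat(n_vars, n_clauses, seed=0):
        rnd = random.Random(seed)
        A = [rnd.choice([False, True]) for _ in range(n_vars)]
        clauses = []
        while len(clauses) < n_clauses:
            vars_ = rnd.sample(range(n_vars), 3)
            clause = [(v, rnd.choice([False, True])) for v in vars_]
            truth = [A[v] != neg for v, neg in clause]  # (var, negated?)
            if any(truth) and not all(truth):   # mixed: kept by both attractors
                clauses.append(clause)
        return A, clauses

    A, F = symmetric_3sat(10, 42)
    assert all(any(A[v] != neg for v, neg in c) for c in F)  # A satisfies F
    assert all(any(A[v] == neg for v, neg in c) for c in F)  # so does ~A
    print(F[0])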
MS Thesis Defense: First-Order Stochastic Systems for Diagnosis and Prognosis
Date: Friday, December 10, 2004
Location: FEC 141
Time: 9:00 am - 11:00 am
Chayan Chakrabarti <cc@cs.unm.edu>
Department of Computer Science, UNM
First-order systems are an important tool for knowledge representation in Artificial Intelligence research. The representational power of first-order systems is very broad. The inference algorithms for first-order systems are simple and tractable. Hence, first-order systems provide a compact yet powerful tool for modeling complex environments.
Pless and Luger (2002) explored a new logic-based first-order stochastic modeling language named Loopy Logic. This language aims at solving problems with repetitive, or recursive, structures. It can represent product distributions effectively and can be translated to a well-known and effective inference algorithm: loopy belief propagation. Parameter learning is naturally supported in this framework.
This thesis explores the representational power of Loopy Logic. Three applications of Loopy Logic in the domain of diagnosis and prognosis are analyzed in detail. These applications are diagnosis of digital circuits, fault detection in mechanical systems and parameter learning in a life support simulation.
Date: Wednesday November 17, 2004
Location: FEC 141
Time: 3:00pm - 5:00 pm
Ke Ye <keye@cs.unm.edu>
EST clustering algorithms have many applications to biological data, such as clustering ESTs that are from the same genes but from different gene products, identifying new genes, and discovering alternative splicing. In these applications, clustering performance and quality are the most critical issues. Computer scientists have enhanced computational efficiency by improving both the distance and clustering algorithms, but the quality of clustering is hard to assess, as we never know the correct answers for real biological data. In this project, I utilized an EST generator tool (SeqGen) to generate artificial ESTs and made use of the simulated datasets to evaluate four EST clustering algorithms (Smith-Waterman, ESTate, N2tool, and Blastclust) with regard to their clustering quality.
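Once the true clusters are known from simulation, quality can be scored with pair-counting measures; a Python sketch (ours, one such measure, not necessarily the thesis's) of the Rand index:

    from itertools import combinations

    # Fraction of EST pairs on which a clustering agrees with ground truth
    # (both in the same cluster, or both in different clusters).
    def rand_index(truth, predicted):
        agree = total = 0
        for i, j in combinations(range(len(truth)), 2):
            same_t = truth[i] == truth[j]
            same_p = predicted[i] == predicted[j]
            agree += same_t == same_p
            total += 1
        return agree / total

    truth     = [0, 0, 0, 1, 1, 2]
    predicted = [0, 0, 1, 1, 1, 2]
    print(round(rand_index(truth, predicted), 3))  # 0.733: 11 of 15 pairs agree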
Date: Monday September 13, 2004
Time: 1pm-3pm
Location: FEC 141
Christy Warrender <christy@cs.unm.edu>
Department of Computer Science, UNM
Abstract:
Peripheral tissue microenvironments play an important role in determining
responses to infection, but these small-scale interactions are usually
not included in immune system models. An immune response to a
pathogen involves numerous cells of several different types, each with
a limited ability to sense its environment and to act. These cells
produce many kinds of signalling molecules called cytokines that
act locally to affect cell behavior. The dynamical interactions among
this heterogeneous collection of cells and molecules are extremely
complex. To begin to understand such a system, we need modeling
approaches that can relate population-level dynamics to individual
cell interactions and responses to cytokines. I will describe a spatially
explicit simulator that tracks individual cell responses to local molecular
signals, and the use of that simulator to study models of peripheral tissue
dynamics in normal and disease conditions.
Date: Wednesday August 11, 2004
Time: 1:00 pm
Location: FEC 141
Terry Van Belle <vanbelle@aruna.ca>
Computer Science Department, UNM
Drawing on models of the evolution of living systems, this dissertation
explores the principle of modularity, both biological and in software,
and its role in creating structures that are easy to change.
These ideas are captured in the Software Evolvability Change Optimization
(SECO) model, a framework for investigating how modularity can enhance
evolvability in software.
SECO abstracts software history by dividing the code into non-overlapping elements that are linked together by a series of changes. These changes are either gathered from the recorded histories of real software, or modeled using evolutionary computation, change propagation among elements, or correlations in changes between elements.
The dissertation uses SECO in both an analytic and synthetic role, investigating aspects of modularity such as encapsulation and code factoring, and using automatic techniques to optimize the modular structure of real code.
The dissertation contributes to the further understanding of modularity as a means of improving software evolvability by adding the dimension of time to the analysis. In this way, it can discover dependency links between software elements that are not evident from a static analysis of the program.
Date: Monday, March 8, 2004
Time: 10am-11am
Location: Farris Engineering Center 141
Jared Samuel Dreicer <jared@cs.unm.edu>
Computer Science Department, UNM
Abstract:
The use of spare replacement hardware and checkpoint rollback software fault
tolerance on Multiple-Instruction-Multiple-Data (MIMD) architecture was investigated.
New performance results are presented for spare node replacement after simulated
failure and for migration onto a spare node prior to simulated failure. Spare replacement
and migration onto spare were implemented for application parameter characterization
runs on 32 nodes and scaling runs from 8 - 128 nodes on a MIMD cluster. The
CUMULVS system was used for fault tolerant and control features. We evaluated
the spare node replacement and migration onto spare node approaches using runtime
to quantify performance and demonstrate viability of the approaches.
The principal new results of this dissertation are that:
1) Spare node replacement provides good performance at a small cost in runtime;
2) Migration onto spare provides even better performance at a small cost in
runtime; and
3) A runtime breakeven point dependent on system scale is identified for both
approaches relative to traditional approaches.
Results were quantified for empirical studies on 8 - 128 nodes. These studies investigated applications characterized by various computation-communication ratios, work patterns (steady, accumulate, disperse, hill, and hole), and various topologies (ring, one-to-all, and near neighbor). The decrease in the cost of commodity hardware enables strategies that can efficiently use a spare as a general means of dynamic redundancy. The gains resulting from these approaches are due to decreased recovery time: given immediate access to a spare, the Mean Time To Repair (MTTR) is reduced. Checkpoint and rollback overhead is still incurred, but for migration onto a spare the checkpoint overhead can be dramatically reduced. The scale of distributed memory MIMD architectures continues to grow, driven by user requests for greater performance and increased computational requirements involving finer resolution, and enabled by the decreasing cost of commodity hardware. However, these larger architectures experience an increasing frequency of component failures and subsequent loss of availability. Fault tolerance and availability are therefore important issues for high performance computing systems executing long-running applications. Our research indicates that utilizing spare replacement enhances the scalability and availability of MIMD architectures and that further research will pay important dividends.
Date: Friday, January 30, 2004
Time: 11am-12:15pm
Location: Woodward 149
Ye Cong, <yecong@cs.unm.edu>
Department of Computer Science, UNM
Abstract:
Many modern graphics systems are built with commodity components such as personal
computers, graphics accelerators, and projection devices. This
architecture results in a low price-to-performance ratio, yet a potential problem
arises: the graphics accelerators can deliver very high rendering speed, while
graphics distribution relies on the existing network. This thesis describes
how the network affects the performance of parallel rendering on a PC cluster.
First I show that parallel rendering is more sensitive to network bandwidth
than network latency and overhead. Then, I introduce a specific cluster-based
graphics system and some experiments done on this system to reveal the effects
of the network on parallel rendering from different aspects. In order to show
that the software layers add considerable overhead, some performance comparisons
using MPICH, LAM/MPI, and Chromium are made.