Thesis and Dissertation Defenses
Ph.D. Dissertation Defense: Structural Inference and the Statistics of Networks
Date: Thursday, July 20th, 2006
Time: noon
Place: FEC 141
Aaron Clauset
The discovery that the topology of real-world networks differs dramatically from the idealized structures of graph theory has, in recent years, produced a flood of interest in understanding and characterizing the complexity, origins and utility of these differences. However, linking this knowledge with practical ends, such as designing new Internet communication protocols, manipulating metabolic networks, or predicting future behavior depends quite strongly upon the quality of the tools and methods we use to characterize the underlying network structure.
These techniques, frequently borrowed from graph theory, statistical physics and sociology, tend to be statistical in nature, primarily because the configuration space is extremely larger, even a network of moderately size. As such, network structure is often summarized as a distribution or an average of some interesting quantity. It is now clear, however, that these simple characterizations can fail to capture much of the governing structure.
For instance, much effort has recently been spent indirectly characterizing the structure of the router-level Internet via the paths taken by packets crossing the network. The first two chapters of this thesis explore, through analytics and numerical simulation, the fundamental bias inherent in this technique, showing that it can produce power-law degree distributions where none actually exist. We further show that the severity of the bias is related to the excess of the underlying network and remains until nearly all vertices in the network participate in the mapping effort.
Unlike the case of the router-level Internet, we often have fairly high-quality maps of the pair-wise topological structure of real-world networks. In this case, the concise summarization of the large-scale topological structure remains an important, yet difficult and time-consuming task. In the middle two chapters, we describe two fast heuristics for the inference of community structure in real-world networks, and demonstrate that they can summarize, in an automated fashion, the large-scale structure of very large networks, i.e., n ≈ 106 or larger.
In the final two chapters, we describe a very general model of network organization, based on the observation that networks often exhibit a hierarchical structure, in which the vertices may be recursively grouped into coherent clusters. We show that this model, called a hierarchical random graph (HRG), can be used to automatically annotate the structure of a graph, assigning weights to edges, assigning group-association strengths to vertices, and extracting the minimal set of hierarchical features that explain the structure of the original graph. We further show how HRG models can be used to predict missing network data with a high degree of accuracy, show that HRG models can be used to generate sophisticated null-models for testing network hypotheses, and can naturally explain the presence of non-trivial structure such as heavy-tailed degree distributions, high clustering coefficients and short paths between pairs of vertices, all hallmarks of complex networks.
MS Thesis Defense: Identifying Necessary Rights Expression Language Elements for Digital Protection of Classified Information
Date: Wednesday, July 12th, 2006
Time: 10 am
Place: EC&E 118
Ryan Custer
This research identifies how the extensible rights expression language (XrML) can be used to provide support for information critical to the protection of national security, also know as classified information. A list of requirements is developed that identify features that XrML must provide to effectively express how classified content should be managed. The XrML elements that can be used to satisfy many of the requirements are identified and an extension to XrML is developed that satisfies the remaining requirements. This provides a “roadmap” of how XrML can be used in a larger DRM environment that will provide enhanced security and support for the proper management of classified content on digital information systems.
MS Thesis Defense: Framework for Client Anonymity in Location Aware Wireless Services
Date: Friday, June 9th, 2006
Time: 9:30 am
Place: FEC 141
Sheela Vats
Location aware services can be used for advertising, guiding tourists, locate a store or restaurant, finding the wait time of a restaurant, or checking the flight schedule of an airline in an airport. The main problem associated with getting location specific service is that the provider gets the client's profile and other details and thus detrimental privacy. In this thesis, we present the design and implementation of an extensible architecture for privacy-preserving location-aware client-server interaction. Our solution is based on pseudonyms and SOAP technologies for client server interactions. In this solution, we create a middleware between the client and service providers, which works intelligently for its clients. We also describe a protocol for how the wireless client can interact with the middleware so that it can utilize the services of different providers.
MS Thesis Defense: Inference in Semi-Markov Dynamic Bayesian Networks
Date: Wednesday, May 24th, 2006
Time: 9am
Place: FEC 141
Anuj Gupta
Dynamic Bayesian networks (DBN) are used in a number of time series modeling applications including user modeling, financial modeling, and fault detection. DBNs are constrained by their geometric state duration distribution. Many time series, including those generated by user activities, do not have a geometric distribution. By incorporating semi-Markov property into a DBN, we can model any arbitrary state duration distribution. This thesis introduces an efficient inference algorithm for semi-Markov DBNs. The thesis proposes an extension to the junction tree inference algorithm, to perform simultaneous belief and constraint processing. It uses this hybrid inference for DBNs with multi-valued categorical variables, specifically DBNs where determinism is present due to the semi-Markov property. Inference in a semi-Markov DBN is reduced from O(Tnd) to O(Tnd-k), where d is the total number of variables in the largest clique, n is the cardinality of each variable, T is the total number of time slices, and k is the number of variables constrained to take on deterministic values. Experiments conducted on multiple semi-Markov DBNs verify the theoretical result.
Ph.D. Dissertation Defense: Druid: Representation of Interwoven Surfaces in 2 1/2 D Drawing
Date: Wednesday, May 10th, 2006
Time: 10am
Place: FEC 141
Keith Wiley
The state-of-the-art in computer drawing programs is based on a number of concepts that are over two decades old. One such concept is the use of layers for ordering the surfaces in a 2 1/2 D drawing from top to bottom. A 2 1/2 D drawing is a drawing that depicts surfaces in a fundamentally two-dimensional way, but also represents the relative depths of those surfaces in the third dimension. Unfortunately, the current approach based on layers unnecessarily imposes a partial ordering on the depths of the surfaces and prevents the user from creating a large class of potential drawings, e.g., of Celtic knots and interwoven surfaces.
The first half of this dissertation describes a novel approach which only requires local depth ordering of segments of the boundaries of surfaces in a drawing rather than a global depth relation between entire surfaces. Our program provides an intuitive user interface with a fast learning curve that allows a novice to create complex drawings of interwoven surfaces that would be extremely difficult and time-consuming to create with standard drawing programs.
The second half of this dissertation describes a previously unrealized topological property of 2 1/2 D scenes. Knowledge of this property makes possible the design of algorithms for manipulating 2 1/2 D representations in a way that is isomorphic to elemental 2 1/2 D scene changes. Furthermore, this property can be exploited to vastly improve the performance of a 2 1/2 D scene editor.
Master's Thesis Defense: Incremental Program Data Type Discovery in Binary Executables
Date: Monday, May 8th, 2006
Time: 3pm
Place: ME 427
Erik Lee
Reverse engineering of executables is a task which demands significant intuition and experience of its practitioners. Reconstruction of source code from compiled binary code is in general an undecidable problem. However, with knowledge of the algorithms and tools used to generate the executable, a significant amount of information can be obtained from its analysis. This thesis discusses several techniques and novel combinations of known techniques to approach the problem of reverse engineering unknown executables from a data structure perspective rather than from the much more prevalent code structure perspective. This research enhances the ability of automated tools to discover and use information about binary executables.
Master's Thesis Defense: Making Chord Robust to Byzantine Attack
Date: Friday, April 28, 2006
Time: 11am
Place: FEC 141
Maxwell Reeves Young
Chord is a distributed hash table (DHT) that requires only O(log n) links per node and performs searches with latency and message cost O(log n), where n is the number of peers in the network. Chord assumes all nodes behave according to protocol. We give a variant of Chord which is robust with high probability for any time period during which: 1) there are always at least z total peers in the network for some positive integer z; 2) there are never more than (1/4-\epsilon)z Byzantine peers in the network for a fixed epsilon>0; and 3) the number of correct peer insertions and deletions is no more than z^k for some tunable parameter k. We assume there is an computationally unbounded adversary controlling the Byzantine peers and that the IP-addresses of all the Byzantine peers and the locations where they join the network are carefully selected by this adversary. Our notion of robustness is rather strong in that we not only guarantee that searches can be performed but also that we can enforce any set of "proper behavior" such as contributing new material, etc. In comparison to Chord, the resources required by this new variant are only a polylogarithmic factor greater in communication, messaging, and linking costs.
Master's Thesis Defense: Phylogeny Reconstruction using Transpositions
Date: Friday, April 7th 2006
Time: 3:30pm
Place: FEC 145
Moulik Kothari
Using gene order data for phylogeny reconstruction gives information that ordinarily is not available with DNA data. However, using gene order data highly increases the complexity of phylogeny reconstruction programs and introduces questions much different from DNA data. While algorithms have been developed to solve some of the problems, many more need to be solved. Also, gene order data are not as commonly available as DNA data. Research has focused on using inversion (reversals) distances and breakpoint distances for reconstructing phylogenies. While inversions are commonly found in organisms, breakpoint distance is a mathematical abstraction of pairwise differences between two genomes. Transpositions have been proven to occur within orgranisms and more commonly found in fast evolving organisms like Mitochondria. However, no detailed study has been carried out using transpositions to reconstruct phylogenies. Our aim is to explore use of transposition data to reconstruct phylogenies. Two new distance algorithms that can work with transposition data are implemented. We evaluate these algorithms alongwith an inversion distance algorithm on genome data generated entirely using transpositions to reconstruct phylogenies. We compare the results on different measures to study the behaviour of each method under different circumstances.
Master's Thesis Defense: Self-healing networks
Date: Friday, April 7th 2006
Time: 2:30pm
Place: FEC 141
I-Ching Boman
Connectivity, the existence of a path between every pair of nodes, is an important aspect of many networks. Some examples where connectivity is paramount are: communication networks like P2P systems and wireless sensor networks, utilities networks like the power grid and the water supply, and infrastructures networks like roads.
Deletion of a node in a network can occur as a result of failures, sabotage, or planned shutdowns as part of energy conservation. As a result of node deletion, the network may become disconnected. One approach for maintaining connectivity is to structure the network in anticipation of these possible disruptions. Our approach is to fix it as we go. We do this by adding new edges to the network to maintain connectivity whenever they are needed.
In our model, the network is an undirected graph, G(V,E), consisting of a set of nodes and m edges. Assume G is connected initially. At each time step, an adversary can remove nodes from the graph. We will describe two versions of an algorithm, which we dubbed the "line" algorithm. The "line" algorithm adds edges to maintain connectivity. In addition, it is locality-aware, i.e. new edges are added only to those vertices that were not far apart before the deletion.
Master's Thesis Defense: Design and Implementation of a Baseline Port of the Jikes Research Virtual Machine to the Intel Itanium Architecture
Date: Friday, April 7th 2006
Time: 9:15am
Place: FEC 141
Ravi kiran Gorrepati
Of all 64-bit architectures, IA-64 is the newest, and it offers exciting new features for compiler developers. Its features place it in the higher end of the RISC architecture spectrum. Since this architecture is very different from other extant architectures, it offers a lot of room for research. IA-64 gains speed from efficient compilation, which for Java is costly for the compilation of modules happens dynamically on the virtual machine. Lack of an open source Java virtual machine for the IA-64 which allows for research on code generation techniques lead us to pursue this work. This work presents a simple code generator for the IA-64 architecture. This code generator is a part of the Jikes Research Virtual Machine for Java written in Java. To our knowledge, this work presents the first open source, just in time virtual machine for Java, on the IA-64 architecture.
Master's Thesis Defense: Computational Peridynamic Modeling of Concrete Structures
Date: Wednesday, April 5th 2006
Time: 11 am
Place: FEC 141
Santosh Bangera
The peridynamic model [1], introduced by Stuart Silling at Sandia National Laboratories, is a promising model from a conceptual as well as a computational point of view for simulating the behavior of plain and reinforced concrete structures [2]. The primary objective of this thesis is to predict and model the failure and deformation of two-dimensional linear elastic structures using a quasi-static peridynamic modeling method. In order to achieve this, a new computational framework is developed and implemented using Java for solving plane stress and plane strain structural analysis problems. A new modeling approach is implemented, which involves a topological method for independently mapping all domains in a problem, thereby enabling the application of peridynamic theory to a high fidelity structural model. This helps in improving the computational efficiency as well as in overcoming the shortcomings of earlier attempts. A system is implemented that allows the user to differentiate various boundary conditions, track development of cracks, and analyze the deformed structure behavior. Also implemented is a mechanism to give feedback on the computed displacement and internal force components acting on every subnode of a material structure.
[1] - Silling S. A., “Reformation of elasticity theory for discontinuous and long range forces”, Tech. Report SAND98-2176, Sandia National Laboratories, Albuquerque, NM, 1998.
[2] - Walter Gerstle, Sau N., and Silling S. A., "Prediynamic modeling of plain and reinforced concrete structure", 18th International conference on Structural Mechanics in Reactor Technology (Beijing, China), August 7-12 2005, 949-956.
Master's Thesis Defense: Computational Approaches to Determine Structure Activity Relationship (SAR) of Small Peptides and Non-Peptide Organics
Date: Tuesday, April 4th 2006
Time: 8 am
Place: FEC 141
Sreejith Vijayan
Currently, drug design is a time consuming and expensive endeavor; the development of a single new drug requires about $800 million and between 10 and 20 years. It is estimated that for every 10,000 compounds, only 1 is successfully tested in clinical trials. The binding of Leukocyte Functional Antigen-1(LFA-1) to Intercellular Adhesion Molecule (ICAM-1) is important in a variety of diseases and blocking their interaction may have therapeutic impact. Thus, we wanted to employ computational technologies to the problems of developing antagonists of (LFA-1)/ (ICAM-1) binding.
The goal is to determine the SAR (Structure Activity Relationship) of 12 peptides (4 each with high affinity, intermediate and low affinity) using a combination of experimental and computational techniques. The results generated from these models will be used to predict non-peptide organic structures that will be tested experimentally. Comparison of the computational predictions and experimental data will assist in the development of novel computational technologies in the future directed at drug discovery.
I tested the hypothesis that computer-based technology can more rapidly and accurately predict the structural relationship between peptides and ICAM/LFA-1 that can be used in drug discovery. These sorts of studies are not performed extensively since large sets of chemically-related compounds are not available. I took advantage of a 31 member set of compounds, 12 of which had their 3-dimensional structure solved chemically. This set of chemical set is unique and takes many man-years to develop.
A New Algorithm for Reducing Delivery Time of Radiation Therapy
Date: Friday, December 9th 2005
Time: 11am
Place: FEC 245
Jake Proctor
Intensity Modulated Radiation Therapy (IMRT) is a modern cancer treatment technique. IMRT is mainly implemented using a clinical linear accelerator (LINAC) equipped with a multileaf collimator (MLC). The LINAC is used to generate the radiation beams for the treatment. The MLC, which is composed of multiple pairs of configurable metal leaves that form an aperture, is used to control patient exposure to the radiation. An IMRT treatment plan is a set of MLC formed apertures. Because the total delivery time of a treatment plan is largely determined by the total number of MLC apertures in the plan, the key to IMRT delivery planning is to convert prescribed continuous or discrete radiation intensity distributions into a minimal number of MLC apertures.
In this thesis we develop a new algorithm to convert continuous intensity distributions into a minimal number of MLC apertures. Compared to the Pinnacle treatment planning system by Philips Corp., our new algorithm can reduce the number of apertures by as much as 75%.
Master's Thesis Defense: Infiniband Scalability in Open MPI
Date: November 7th 2005
Time: 1:30 - 3:30
Place: FEC 141
Galen M. Shipman
gshipman@cs.unm.edu
Infiniband is becoming an important interconnect technology in high performance computing. Recent efforts in large scale Infiniband deployments are raising scalability questions in the HPC community. This work in Open MPI provides several mechanisms to enhance Infiniband scalability. Initial comparisons with MVAPICH, the most widely used Infiniband MPI implementation, show similar performance but with much better scalability characteristics. Specifically: small message latency is improved by up to 10% in medium and large jobs; memory usage per host is reduced by as much as 300%; latency is close to optimal IB send/receive latency and bandwidth performance is similar to that of MVAPICH.
Ph.D. Dissertation: Negative Representations of Information
Date: November 4th 2005
Time: 9:30 AM
Place: FEC 141
Fernando Esponda
fesponda@yahoo.com
In a negative representation, a set of elements (the positive representation) is depicted by its complement set (the negative representation). That is, the elements in the positive representation are not explicitly stored, and those in the negative representation are. The concept, feasibility, and properties of negative representations are explored in this dissertation. Two approaches are taken: First, one in which the negative representation is an inexact depiction of the positive set i.e. not all possible items are characterized, with applications to machine learning and anomaly detection. Second, one where the negative representation depicts exactly all the items not in the positive set. It is shown that a positive set consisting of n, l-bit strings can be represented negatively using only O(ln) strings, through the use of an additional symbol. It is also shown that membership queries for the positive representation can be processed against the negative representation in time no worse than linear in its size, while reconstructing the original positive set from its negative representation is an NP-hard problem. Some additional properties, such as the differences in inferring answers from either set, are discussed in depth, and applications that address some of the privacy concerns of the day are outlined.
Master's Thesis Defense: Graphics Processing Unit Computation of Neural Networks
Date: October 18, 2005
Time: 9:00 a.m.
Location: High Performance Computing.
Christopher Davis
chris2d@cs.unm.edu
This paper outlines, discusses, and presents several artificial
neural network architectures that are amenable to
Graphical Processing Unit execution. Traits are identified that lead to
good or poor performance of the artificial neural network simulation.
Ph.D. Dissertation: Anomaly Detection in Dynamic Execution Environments
Date: July 22
Time: 1:00 p.m.
Location: FEC 141.
Hajime Inoue
hinoue@cs.unm.edu
In the past few years languages which run on virtual machines, like Java and C#, have become popular. These are platforms as well a languages, and they are characterized by being verifiable and garbage collected, and include Just-In-Time compilers, large standard libraries, and runtime profilers. I call platforms with these features dynamic execution environments (DEEs).
The runtime infrastructure of DEEs allows access to features of execution that were previously difficult to observe. My research consists of a series of case studies in which I build systems to classify behavior of a particular feature into normal and abnormal and then use that classification for either security or optimization purposes. These systems are anomaly detectors.
I build anomaly detection systems for method invocations, method invocation sequences, and Permissions. Used to detect intrusions or system faults, I call them dynamic sandboxes. I also show that an anomaly detector can be used to predict object lifetimes resulting in an improved garbage collector.
Ph.D. Dissertation Defense: Applications of the Probabilistic Method
Date: July 8
Time: 11:00 am.
Location: FEC 145
Vishal Sanwalani
<vishal@cs.unm.edu>
In this talk we discuss two problems. The first problem concerns the probability
a sparse random graph or hypergraph is connected. While it is exponentially
unlikely that a sparse random graph or hypergraph is connected, with probability
1-o(1) such a graph has a giant component that, given its numbers of edges and
vertices, is an uniformly distributed connected graph. This simple observation
allows us to estimate the number of connected graphs, and more generally the
number of connected d-uniform hypergraphs, on n vertices with o(n ln n) edges.
The second problem concerns electing a leader from n candidates (players). Specifically, in the leader election problem, there are n players, of which some fraction are faulty. We assume that an omniscient and computationally unbounded adversary picks which players will be faulty before the game begins and that this adversary controls the actions of all faulty players. We present a leader election protocol which is scalable in the sense that it requires each player to send and process a number of messages which is polylogarithmic in n and requires a number of rounds which is polylogarithmic in n. Provided that the fraction of faulty players is less than 1/3, our protocol elects a non-faulty leader with constant probability and ensures that a 1-o(1) fraction of the players know this leader.
Master's Thesis Defense:Some Results on Networks: Phase Transitions, Scale Invariance.
Date: July 5
Time: 11:00 am.
Location: FEC 141
Vamsi K Kalapala
vamsi@spruance.cs.unm.edu>
We present two results on networks, first, we present a result on a phase
transition in the 3-bit Exact Cover Problem (EC3), which can be viewed as a
restricted form of 2-coloring of random 3-regular hypergraphs. Second, we present
our results on scale invariance in road networks.
We study 3-bit Exact Cover Problem (EC3), also known as Positive 1-in-3 SAT.
Random instances of this problem have been used as a benchmark for simulations
of an adiabatic quantum algorithm. Empirical results suggest that EC3 has a
transition from satisfiability to unsatisfiability when the number of clauses
per variable $r$ exceeds some threshold $r^*$. This problem is unusual
in that numerical experiments can determine its threshold to a greater accuracy
than for other variants of 3-SAT; by performing exhaustive search on problems
of up to 1700 variables we estimate that $r^* \approx 0.62 \pm 0.01$. Recently
it was shown that $r^* \le 0.644$. Using the method of differential equations,
we place a lower bound on $r^*$'s location by showing when $ r < 0.546$ w.h.p.
a random instance of EC3 is satisfiable.
We study road networks as complex networks with an underlying geography. We
find that these networks have topological and geographical scale invariant properties.
We employ both primal and dual models of the network. In the primal model, intersections
form the nodes and road segments form the edges. In the dual model, roads form
the nodes and road intersections form the edges. In the primal model, we find
that journeys of widely varying lengths had a similar structure. In the dual
model, the degree distribution of the network is a power law indicating a scale
free topology. We give a simple fractal model that produces these properties.
Master's Thesis Defense: Design and Implementation of Scalable Synthetic Compact Application (SSCA) #2 Benchmark using Unified Parallel C (UPC)
Date: July 5th
Time: 9:00 a.m.
Location: FEC 141
Meenakshi Balasubramanian
<meenkoot@cs.unm.edu>
Scalable Synthetic Compact Applications (SSCA) #2 is a graph theory benchmark developed under DARPA’s High Productivity Computing Systems (HPCS) project. SSCA #2 is a part of the growing set of SSCA benchmark suite which currently includes SSCA #1-A Bio Informatics benchmark, SSCA #2-A Graph theory benchmark and SSCA #3-A Sensor Processing and Knowledge Formation benchmark. The intent of SSCA #2 is to develop a compact application having multiple kernels and accessing a single data structure that represents a directed multigraph with labeled edges. This synthetic benchmark consists of four kernels requiring irregular access to the graph’s data structure. We have developed a parallel implementation of this benchmark using Unified Parallel C (UPC), a parallel version of ANSI C having a distributed shared memory programming model layout. In this thesis we will discuss the design and implementation of each of the four kernels and also present the benchmark’s execution time and the validation results.
Master's Thesis Defense: A Per-Request Performance Monitoring
Framework for the Linux Kernel
Date: July 1, 2005
Time: 10:00 a.m.
Location: FEC 141
Sushant Sharma <sushant@cs.unm.edu>
Department of Computer Science
Linux currently plays an important role in high-end computing systems, but
recent work has shown that Linux-related processing costs and variability
in network processing times can limit the scalability of HPC applications.
Measuring and understanding these overheads is thus key for future use of
Linux in large scale HPC systems. Unfortunately, currently available performance
monitoring systems introduce large overheads, performance data is generally
not available on-line or from the operating system, and the data collected
by such systems is generally coarse-grained. The work done here presents a
low-overhead framework for solving one of these problems: making useful operating
system performance data available to applications at runtime. Specifically,
Linux Trace Toolkit(LTT) is enhanced to monitor the performance characteristics
of individual system calls and to make per-request performance data available
to applications. The presented framework has the ability to monitor individual
network and disk requests, and it has been demonstrated that the overhead
of the per-request performance monitoring framework is minimal. We have demonstrated
this by providing various benchmark results, which clearly shows that the
overhead of the framework varies from 2-9%.
Master's Thesis Defense: Games, Strategies, and Boolean Formula Manipulation
Date: June 30, 2005
Time: TBD
Location: FEC 141
Ben Andrews <bandrews@cs.unm.edu>
Department of Computer Science
This work describes the formalization of the process of converting strategies for two-player games of perfect information into Boolea formulas. This work also describes the development of some special purpose algorithms for manipulating Boolean formulas. While many of the concepts in this thesis are explained with specific implementations in mind, the ideas and formalizations are implementation-independent. In particular, this document describes the application of many of the concepts to real-world problems arising from ongoing molecular computation research.
Master's Thesis Defense: RIOT: A Responsive System for Mitigating Computer Network Epidemics and Attacks
Date: June 27, 2005
Time: 10:00 a.m.
Location: FEC 141
Justin Balthrop <juddhuck@gmail.com>
Department of Computer Science, UNM
Current strategies for the prevention of computer network epidemics and attacks fall into two general categories and have two main shortcomings. The two most popular commercial security measures are firewalls and antivirus software. These systems can be effective at stopping known viruses and attacks, but they are not adaptive and they require human intervention to setup firewall rules or create virus signatures. Thus, they are unable to respond to new methods of attack and infections from viruses that have never been seen before. A promising solution to this problem has been recent work on systems that limit the rate at which computers can make new network connections, known as rate limiters. However, prior work in this area has largely focused on preventing specific viruses or worms, and it has not been applied more generally to all attacks that involve high connection rates.
In this work, I address a wider variety of attacks that are characterized by high connection rates. A worm generally involves one machine attempting to connect to many different machines, but many high rate attacks have a different connectivity pattern from worms. A brute force password cracker, for example, involves many connections between just two machines, while a distributed denial-of-service attack involves many machines connecting to a single machine at the same time. The effectiveness of the majority of these attacks would be greatly reduced if their connection rates were slowed, but current rate limiting methods are not sufficiently general to do this.
In this thesis, I explain why a general system for limiting connection rates is our best chance for preventing widespread network epidemics and attacks, and then I describe the design of such a system. This system is called RIOT, and it limits the rate of all network interactions on the protected computer, allowing it to counter a wide range of attacks, both incoming and outgoing. The system groups different connections using detectors, learns the normal connection rates for each group of connections, and automatically drops packets if the rates exceed the learned level. This proves to be effective at dropping packets associated with attacks while not interfering with normal usage. I have implemented this system on both Linux and Mac OS X and provide the details of each implementation. I also report experimental results and users' experiences from running the system on multiple workstations.
Master's Thesis Defense: Reduced Latency Searches for Unstructured Peer-to-Peer Networks
Date: June 17, 2005
Time: 10:00 a.m.
Location: FEC 145
John Alphonse <johnalp@cs.unm.edu>
Department of Computer Science, UNM
Over the last few years, peer-to-peer networks have been developing explosively. A peer-to-peer network is a network in which each node plays the role of both server and client. The primary service in most peer-to-peer networks is storing and looking up data items. Optimizing search operations has been one of the focal points of research in this area. In this thesis, we present algorithms for storing and looking up data in unstructured peer-to-peer networks. Our algorithms are efficient in terms of both latency and bandwidth. Our algorithm for storing a data item establishes references to the data item in a subset of sqrt(nlnn) peers selected uniformly at random from all the peers in the network. Our algorithm for searching contacts an arbitrary subset of O(sqrt(nlnn)) peers. We show both analytically and empirically that with high probability even very rare data items can be found with our algorithms. Our algorithm for storing the data item, makes use of a new technique for choosing several nodes uniformly at random from an unstructured network.
Master's Thesis Defense: Approximate Bottom Line DEE: Hybrid Bottom Line Dee and Split Dee Committees
Date: June 10, 2005
Time: 10:00 a.m.
Location: FEC 145
Sung-hee Lee <slee1005@unm.edu>
Department of Computer Science, UNM
Side-chain structure prediction is an essential component of protein modeling and is an important subtask in the protein-folding problem. Dead End Elimination (DEE) is a powerful method to determine the global minimum energy conformation (GMEC) of a set of side chains, and to remove a subset of the possible combinations called rotamer.
Rotamers are set of side-chains defined by discrete Chi angles \cite{Ghiglia:1991}. The DEE process reduces the problem of size and computation time of overall side-chain prediction by reducing the combinatorial size of the problem \cite{Desmet:1992}.The algorithm has been studied since the early 1990's, and several different versions of DEE have been developed.
The most recent method is the Split DEE method by Desmert and Mayo \cite{Pierce:2002}. It divides the search area into partitions and applies Goldsteins DEE to each partition.
Bottom line DEE is theoretically the most powerful method but its computational time cost is high, prohibiting effective implementation \cite{Pierce:2002}. DEE methods determine the global energy minimum by defining those rotamers that are of consistent lowest-interaction energy with the side-chains. Then it is making a comparison to another rotamer using a lowest possible pairwise interaction energy calculation.
The main idea behind Approximate Bottom Line DEE is to scan all possible testing positions, and to retain only those rotamers with the lowest pairwise interaction potential energy. At the end of a single iteration, \emph{a set} of side chains is removed except a rotamer that contains lowest potential energy.
Approximate Bottom Line DEE branch from Bottom Line DEE and Split DEE. Experiment ran on both simulated data and real protein data. Result from using simulate data shows the speed improvement and result from using real protein data proves the accuracy of Approximate Bottom Line DEE.
Approximate Bottom Line DEE performs better optimization than other established methods including Original DEE and Goldstein DEE. The cost of Bottom Line DEE is theoretically larger than $O(n^p)$ \cite{Pierce:2000}, with $n$ representing the number of residue and $p$ the average number of rotamers on residues. The cost of Approximate Bottom Line DEE is $O(n^p)$, at worst case. My experiments represented in this thesis, I have sought to show that Approximate Bottom Line DEE is a viable approximate to that method, one that still runs efficiently.
Master's Thesis Defense: Distplot: Web-based Graphical Tool for Sequence Distance Analysis
Date: June 3, 2005
Time: 2:30 p.m.
Location: FEC 141
Ashish Agarwal <ashisha@cs.unm.edu>
Department of Computer Science, UNM
Many variable viruses are extensively studied because they are responsible
for causing major epidemics. These studies generate extremely complex and richly
interconnected data. Visualization of such biological data is an important
aspect of the research performed by biologists. Efficient and accurate visualization
tools can help biologists identify patterns and
relationships in this data.
Distplot was developed to study the Hepatitis C Virus (HCV), but can be used to analyze sequence data from any organism. It can help researchers study the evolution of gene sequences by visualizing data. None of the existing graphing tools provide the flexibility and ease of conducting an analysis similar to that provided by Distplot.
Distplot collects gene sequence data and information interactively in multiple steps through a web-browser. The input gene sequence data can be be grouped on several levels, such as patient number, sample number, tissue type, genotype, subtype, sample year, etc
The tool can compute pairwise distances between all sequences within each group,
or between all sequences of pairs of groups, and generate a graph representing
these distances. Users can also nest on one of the levels and indicate any level
as the numeric parameter. The tool can plot the arithmetic mean of distances
for each group or group-pair.
Also, users can customize the visual appearance of the graph by modifying the
page dimensions, symbol size, symbol color, symbol type, etc.
Distplot eliminates the need to transform data into multiple forms, providing ease of experimentation. It is very flexible and provides a high degree of customization. The results of the analysis are generated in pdf, postscript, png and plain text format.
Distplot aims to be a resource for scientists working on genetics, evolution, variability, and vaccine and drug design.
Master's Thesis Defense: Machine Learning Approaches for SIRNA Efficacy Prediction
Date: Monday, May 23, 2005
Time: 2:00 p.m.
Location: FEC 141
Sahar Abubucker <sahar@cs.unm.edu>
Department of Computer Science, UNM
RNA interference (RNAi) studies are being widely used to study gene expression
and gene regulation via selective gene knockdown experiments, which are important
for functional genomic studies. RNAi refers to the biological process by which
short interfering RNA (siRNA) degrades complementary messenger RNA (mRNA) sequences.
This knockdown of mRNA prevents it from producing amino acid sequences that
are responsible for gene expression. Recent studies indicate that all siRNAs
do not produce the same knockdown effects. Due to the high cost of synthesizing
siRNAs, rational siRNA design---a priori prediction of functionality for specific
siRNAs---is a critical task for practical RNAi studies. Therefore, a key component
of RNAi applications is the selection of effective siRNA sequences---ones that
are highly functional in degrading more than 80\% of the targeted mRNA. The
goal of siRNA efficacy prediction is to aid indesigning siRNA sequences that
are highly efficient in degrading target mRNA sequences.
Most of the current algorithms use positional features, thermodynamic properties
and secondary structure predictions to predict the knockdown efficacy of siRNAs.
In this work, frequently occurring patterns in siRNAs are used to make predictions
about their efficacy. Time after transfection is shown to be an important factor
in siRNA efficacy prediction---a feature that has been ignored in previous efficacy
studies. The relevance of these
extracted patterns to previous results and their biological significance are
analyzed. The ability of random feature sets to predict efficacy are studied
and the results are discussed. Our algorithm consistently performs better than
other publicly available efficacy prediction algorithms and does not require
any specialized hardware.
Master's Thesis Defense: "FINDMODEL: A Tool to Select the Best-Fit Model of Nucleotide Substitution"
Date: Friday, May 20, 2005
Time: 10:30 a.m.
Location: FEC 141
Ning Tao <ntao@unm.edu>
Department of Computer Science, UNM
Motivation: Choosing a model of sequence evolution is a crucial step when using DNA sequence data to reconstruct phylogenies: using a mismatched model will reduce accuracy and may lead to erroneous conclusions. FINDMODEL is a web-based tool for selecting a model of DNA (nucleotide) evolution; it is designed to be easy to use by researchers who do some sequencing and may not have access to phylogenetic packages.
Approach: FINDMODEL can analyze 28 models or a restricted subset of 12 models. It creates a guide tree using Weighbor, optimizes branch lengths, calculates the likelihood for every chosen model (using baseml from the PAML package), and computes the Akaike information criterion (AIC). The model with the smallest AIC score is considered to be the best-fit model. Because of server limitations, the FINDMODEL web server processes inputs above a certain size in non-interactive mode, sending an email to the user when it has analyzed that user's data and providing a down-loadable file with the results.
Results: To test the performance of FINDMODEL, we generated simulated DNA sequences with Seq-Gen under four different models of nucleotide substitution of different complexity and compared the inferred model with the true model. We used 17 different configurations, with 5 instances for each set of parameter values. FINDMODEL returned the correct model for 75% instead of homogeneous rates for another 8%. Moreover, on all tests where FINDMODEL did not return the correct model, the normalized AIC error between the correct and the predicted models was below 0.002 (and the actual AIC difference was below 7).
Ph.D. Dissertation Defense: "Splintering Commodity Network Protocols in High-Performance Computing"
Date: April 13, 2005
Time: 1:30 p.m.
Location: FEC 141
Patricia Gilfeather pfeather@cs.unm.edu
Department of Computer Science, UNM
Systems research is about creating a performance environment for applications within the constraints of the underlying components. This dissertation is specifically about creating a performance environment for high-end applications within the constraints of commodity components (protocols, operating systems and hardware).
This research analyzes the bottlenecks of TCP/IP in today's clusters and finds that while the MPI implementations need to be improved, decreasing the overhead associated with TCP/IP will increase the performance of MPI. First, this dissertation outlines offloading IP fragmentation and reassembly onto a programmable NIC. This drastically reduces the overhead associated with the interrupt pressure of Ethernet frames and leads to a general method for offloading, splintering. Splintering means isolating a small functionality of a protocol, distributing this functionality, and constraining the implementation of the functionality.
The splintering method is applied to TCP by splintering the demultiplexing of the connection and offloading the demultiplexing onto the NIC. This dissertation describes splintering and its relation to other offload methods including OS-bypass and zero-copy. In order for Splintered TCP to be a success, it must be constrained by the resource limitations of a commodity NIC.
This paper outlines the Extensible Message-oriented Offload model created to model offload techniques with respect to CPU speed. The model illustrates offload hiding. Splintered TCP performs offload hiding by allowing the application the flexibility of choosing when protocol headers are processed. The model also illustrates protocol bypass. Latency is lowered because most of the protocol processing is performed after the application receives the data.
Finally, the dissertation outlines the creation of Connection-less TCP which drastically decreases the memory usage of TCP connections allowing them to be offloaded in a memory-constrained environment. Connection-less TCP allows connections and demultiplexing to be offloaded onto the NIC even when there are a large number of connections.
Master's Thesis Defense: A Framework for DRM as a Layered System
Date: Friday, April 8, 2005
Time: 1:00 p.m.
Location: ECE 118
Pramod A. Jamkhedkar <pramod54@cs.unm.edu>
Department of Computer Science, UNM
The current landscape for digital rights management (DRM) consists of various ad hoc technologies and platforms that largely focus on copy protection. The fragmented nature of the DRM industry in 2005 is somewhat reminiscent of the telecommunications industry in the late 1980's. At that time various networking technologies were available, and what was needed was a technology that could integrate existing networks and provide various services to users. The OSI layered framework and the TCP/IP communications protocol suite provided a solution to this situation. The OSI model divides the process of digital data communications into layers. Likewise, we divide the process of DRM into layers in which various services are offered to the users of digital content at each layer. Three blocks of layers have been identified. The upper layers deal with the end-to-end functions of the application, the middle layers deal with rights expression and interpretation, and the lower layers ensure rights enforcement. We describe how responsibilities will be distributed among the various layers, and consider where in these layers it would be appropriate to define protocols and standards. A layered DRM framework will help identify, separate and manage tussles in the DRM industry and also facilitate interoperability.
Master's Thesis Defense: Evaluating Augmented Bayesian Classifiers on Gene Expression Data
Date: Friday, April 8, 2005
Time: 1:00 p.m.
Location: FEC 141
Vikas Hamine <vikas@cs.unm.edu>
Department of Computer Science, UNM
We evaluate Bayesian network classifiers on classification of gene expression
data. Specifically, we evaluate the \emph{Tree-Augmented Naive Bayes} (TAN)
model using two different scoring metrics. The TAN model has been known to perform
better than the naive Bayes model in terms of classification on many small-sized
datasets. However, its
behavior on Gene Expression Data is not known to the author. We evaluate the
TAN model on two publicly available gene expression datasets. A polynomial time
algorithm for learning optimal TAN models using \emph{Minimum Description Length}
(MDL) as the scoring metric is known in the literature. We also present a polynomial
time algorithm for learning optimal TAN with respect to the \emph{Bayesian-Dirichlet}
(BD) metric. Other significant contributions include polynomial time algorithms
for learning optimal \emph{Forest-Augmented Naive Bayes} (FAN) - a generalization
of the TAN model, with respect to both the MDL and the BD metric. A comparison
of the TAN and FAN classifiers based on the number of correct classifications
is presented. We also
compare the performance of the MDL and BD metrics with respect to the TAN and
FAN classifiers. \cite{helman:bayesian} have presented a Bayesian network classifier
which we refer to as the Helman-Veroff model. The Helman-Veroff model is shown
to perform well on gene expression domains. One of our goals is also to compare
the TAN and FAN classifiers with the Helman-Veroff model on gene expression
datasets.
Master's Thesis Defense: Using Edge Weight Stability as a Tool in Computational Phylogenetic Reconstruction
Date: Thursday, March 31, 2005
Time: 1:00 p.m.
Location: FEC 141
Nicholas Pattengale <npcomplete@gmail.com>
Department of Computer Science, UNM
After a parsimony search there are often many most-parsimonious (or nearly so) trees. Commonly utilized metric distances to compare two phylogenetic trees (e.g. Robinson Foulds, Nearest Neighbor Interchange) are based solely on tree topology. Likewise, most consensus methods ignore edge weights. Since edge weights constitute parsimony score, we believe that summary methods (metric distances, consensus methods) should be invented so as to consider edge weight.
Robinson and Foulds extended their well known distance metric to accommodate weighted trees. We derive a generalized family of metrics (that includes weighted RF as a member) between weighted trees. The family arises from embedding the trees into a high-dimensional normed vector space.
Our embedding technique also enables the attribution of probability distributions to edges such that variance corresponds to edge weight stability. The stability concept gives rise to a new optimization based consensus method in which we include edges in the consensus tree based upon their stability.
Finally, there are axiomatic limitations on the fidelity with which a consensus tree can exhaustively summarize the input set. Thus we present a non-traditional summary algorithm that identifies (potentially) more than one representative tree for the set. The vector representation aids us yet again, as the computational complexity of the algorithm is improved by a randomized re-embedding of the input trees so as to reduce dimensionality.
Master's Thesis Defense: A Machine Learning Approach for Biological Information Extraction
Date: Wednesday, March 30, 2005
Location: FEC 145
Time: 10:00 a.m.
Sushmita Roy <sroy@cs.unm.edu>
Department of Computer Science, UNM
Biological literature is an extremely valuable source of information. However,
due to the absence of an efficient data mining technique, this rich information
source
remains fairly unused. The fact that this data mine is in unstructured natural
language
form makes the problem of information extraction hard but challenging. We apply
a machine learning approach to the problem of information extraction from
scientific abstracts. We analyze existing discriminative and generative probabilistic
frameworks to model the information source and find that temporal dependencies
and contextual information are extremely useful for robust and reliable results.
Based on existing approaches, we develop our own generative model, naive Bayes
hidden Markov model that performs at par with state of the art algorithms.
Finally, we develop a rule-based datamining system that predicts relationships
between genes based on the tagged output of the probabilistic models.
MS Thesis Defense: A New Tool For Option Valuation and Analysis
Date: March 24, 2005
Location: FEC 141
Time: 2:15 p.m.
Anshul Shekhon <anshul@cs.unm.edu>
Department of Computer Science, UNM
Existing applications for option valuation are limited in their ability to value more than a few basic types of options. There seems to be no generic tool that can be used to value non-standard types of options. This work presents a new tool to value and analyze standard and many types of non-standard options, including exotic path dependent options. The parameters used for the models in this tool are carefully selected and are proposed as heuristics. Another useful features of this tool is its ability to provide better visual representation of sensitivities of option value, especially for exotic options. This application can help reduce the errors in valuations and will also improve risk analysis and management.
Ph.D. Dissertation Defense: Two Studies in Combinatorial Optimization
Date: February 28, 2005
Location: FEC 141
Time: 3:00 p.m.
Department of Computer Science, UNM
This thesis presents a pair of quite different investigations in combinatorial optimization. The first section is a study of "Contact-Map Overlap" (CMO), a graph-theoretic problem derived from computational biology, more specifically from protein structure comparison. This problem has a natural formulation as an integer linear program; using the AMPL modeling language, we have developed a robust, efficient, flexible implementation of the ILP. Our approach to CMO is primarily heuristic and empirical, focused on computational techniques that are useful for obtaining approximate solutions to problems large enough to be of biological interest. In particular, we present the first algorithm which can efficiently find provably near-optimal alignments of highly dissimilar proteins with about two hundred amino acids. We also present a simple but effective heuristic for approximating all-to-all comparison on a large collection of structurally similar proteins. In addition, we make a detailed investigation of the sensitivity of the underlying LP to numerical round-off errors.
The second part of this thesis is an investigation of what is usually called
the minimum-bends travelling-salesman problem; given a finite set of points,
find a piecewise-linear tour through all these points, minimizing either the
number of corners or the total angle of turn. This problem has been much less
studied than most other elementary geometric-combinatorial questions, in spite
of its wide variety of potential applications in robotics, sensor placement,
and machining. Our approach here is less heuristic and more rigorous, with proofs
of upper and lower bounds for a variety of instances. We have obtained considerable
improvements over previous results in this area, and have found new evidence
for a long-standing conjecture.
Ph.D. Dissertation Defense: Automated Methods for Creating Diversity in Computer Systems
Date: February
18, 2005
Location: FEC 141
Time: 2:00 p.m.
Gabriela Barrantes <gbarrant@cs.unm.edu>
Department of Computer Science, UNM
The ease of attacking multiple identical systems once one machine is compromised, combined with the pervasive homogeneity of computer system attached to the Internet, represents a serious security threat. Biological diversity provides a defense against unpredictable threats by maximizing the probability that some individuals will survive and replenish the population with a defense against that particular threat. Using natural diversity as inspiration, diversity in computer systems could confer security benefits by protecting against attacks that are new, or that have been modified to avoid signature detection, but that rely on known system invariants.
A diversity defense can render a standard attack ineffective or slow it down, depending on its placement and implementation. Diminishing the uniformity in existing systems is, however, a non-trivial task, as standardization must be maintained at many interface points in any given system. This dissertation intends to assess the costs and benefits benefits of adding diversity to existing computer systems by implementing diversity at different levels. Diversification in computer systems can be done at the interface or the implementation level. Three techniques to introduce automated diversity in existing systems are presented: one at the interface, and two at the implementation levels, and their effectiveness at stopping or slowing down attacks is studied.
The first diversity scheme presented is an interface diversification: a machine language randomization, named Randomized Instruction Set Emulation (RISE). RISE is intended as a protection against the threat of code-injection attacks, which insert malicious machine-language code into programs using different weaknesses during input processing. Code-injection attacks constitute one of the most prevalent threats on current networks, and its most famous example are the so-called buffer overflow attacks. RISE protects against all code-injection attacks regardless of their point of entry by creating a unique machine code per process. The current RISE implementation runs over an emulator, and maps all executable bytes of the process to a random mask created for the purpose of diversification. When injected code attempts to execute, its code is also mapped to the mask, but given that it was not correctly encoded, it is "decoded" to random bytes, which attempt to execute on the real processor and most of the time end up in a fatal exception. Even though the attack will not execute as intended, there is a small probability that the attack will manage to execute some random instructions. This work also offers an analysis of the risks associated with the execution of random instructions.
Many Denial of Service (DoS) attacks exploit a system's implementation rather than its interface. Two approaches to diversify an implementation are explored. The first one randomizes internal protocol parameters within acceptable ranges. It is tested against an attack targeting one of the TCP congestion control parameters, where it achieves the objective of keeping a portion of the hosts in the attacked network operating at larger bandwidths than if they were all using standard parameter values.
The second implementation diversification attempts to go deeper and modify the implementation of a task, with the intention of providing different execution behaviors. The target threat are Denial of Service attacks by resource exhaustion. These attacks usually anonimize the requests making them hard targets for detection. The proposed diversity solution gives each host a different attack-legitimate classifier filter. Each filter uses a distribution of legitimate and attack requests, and blocks requests considered illegitimate based on this distribution. Under attack, those filters with distributions closer to the real one created by the combination of attack and legitimate traffic, will be less affected by the DoS. The filters are created by using Genetic Programming trained with different attack-legitimate patterns. Current results suggest that it could be effective when used as a front-end for resource managers that are non-preemptive in nature.
MS Thesis Defense: Modeling Pathways of Cell Differentiation in Genetic Regulatory Networks With Random Boolean Networks
Date: Wednesday, February 16, 2005
Location: FEC 141
Time: 2:00 p.m.
Sheldon Dealy <sdealy@cs.unm.edu>
Department of Computer Science, UNM
It has long been known that some cells differentiate into other cell types. Under the assumption that living cells can be modeled using non-linear dynamical systems, then cell types are attractors. If a cell type is an attractor, then a series of steps from one attractor to another is a pathway of differentiation. As non-linear dynamical systems, random Boolean networks were analyzed for their suitability in a first attempt to model pathways of differentiation in model genetic regulatory networks. We hypothesize that perturbations of Boolean networks share some key behaviors of genetic regulatory networks in the process of differentiation. Furthermore, several solutions to the problem of mapping the discrete time nature of Boolean networks to the continuous nature of genetic regulatory networks are presented. The findings suggest that Boolean networks are able to support predictions about the pathways of differentiation in living cells. The predictions should be testable using gene array techniques.
MS Thesis Defense: Parameterized Phase Screen Modeling for Very Large Array
Date: Thursday, February 10, 2005
Location: FEC 141
Time: 10:30 a.m.
Chunai Cai <ccai@cs.unm.edu>
Department of Computer Science, UNM
This thesis is about modeling the tropospheric phase error of radio astronomical
data obtained from radio astronomical instrument, VLA. VLA is a Very Large Array
of 27 antennas designed to collect radio signals from celestial sources. The
astronomical data we are interested is called Visibility. Basically scientists
can make images of celestial sources they are interested in from these Visibility
data, which has amplitude and phase. To get a good image, one has to have good
Visibility data. But Visibility data is produced from the output of electrical
signals from the antennas. And the inputs to antennas are the radio signals,
which travel a long way and its amplitude is attenuated and its phase distorted
along the way. Besides, antennas themselves are not perfect instruments. They
introduce errors during their processing of the radio signals. Algorithms and
tools are developed and made available to astronomers and scientist to restore
ideally the true face of the radio signals by cleaning the Visibility data.
Phase error correction is critical for good imaging especially for high frequency
observations. There are not many existing algorithms for phase error correction.
In AIPS++ a least square fitting routine is used solve phase errors for each
antenna. In the case of VLA, if all 27 antennas are used, the parameters to
be solved for the least square fitting is 27. According to Tim Cornwell's work
[TC81], assuming all visibility data has the same variance, then the variance
in visibility data sv2 and the variance in total gain sG2 has the following
relationship: where S is the flux density of a point calibration source where
N is the number of gains to be solved. It is clear that if we reduce the number
N of unknowns to be solved, we can also reduce the variance in visibility data,
assuming the gain variance keeps stable.
In this thesis we propose a physical model of antenna tropospheric phase error
by associating the phase error with the antenna's positions. First order, second
order Taylor expansions along the antenna x, y coordinates are used for least
square fitting to the tropospheric phased error obtained from existing phase
correction software package AIPS++ for VLA. In this modeling we reduced the
number N of unknowns to be solved to a maximum of 10. Visibility data from different
VLA array configurations are used to test the model. The result shows the proposed
model works well for the most compact array configuration, but less ideal for
extended configuration like BnA.
MS Thesis Defense: Generating hard satisfiable 3-SAT instances
Date: Wednesday, February 9, 2005
Location: FEC 141
Time: 2:00 p.m.
Haixia Jia <hjia@cs.unm.edu>
Department of Computer Science, UNM
The evaluation of incomplete satisfiability solvers depends critically on the
availability of hard satisfiable instances. A plausible source of such instances
consists of random k -SAT formulas whose clauses are chosen uniformly from among
all clauses satisfying some randomly chosen truth assignment A . Unfortunately,
instances generated in this manner tend to be relatively easy and can be solved
efficiently by practical heuristics. Roughly speaking, as the formula's density
increases, for a number of different algorithms, A acts as a stronger and stronger
attractor. First, we introduce a simple twist on this basic model, which appears
to dramatically increase its hardness. Namely, in addition to forbidding the
clauses violated by the hidden assignment A , we also forbid the clauses violated
by its complement, so that both A and complement of A are satisfying. It appears
that under this ``symmetrization'' the effects of the two attractors largely
cancel out, making it much harder for algorithms to find any truth assignment.
We give theoretical and experimental evidence supporting this assertion. We
then introduce a highly structured family of hard satisfiable 3-SAT formulas
corresponding to an ordered spin-glass model from statistical physics. This
model has provably ``glassy'' behavior; that is, it has many local optima with
large energy barriers between them, so that local search algorithms get stuck
and have difficulty finding the true ground state, i.e., the unique satisfying
assignment. We give experimental evidence supporting its hardness.
MS Thesis Defense: First-Order Stochastic Systems for Diagnosis and Prognosis
Date: Friday, December 10, 2004
Location: FEC 141
Time: 9:00 am - 11:00 am
Chayan Chakrabarti <cc@cs.unm.edu>
Department of Computer Science, UNM
First-order systems are an important tool for knowledge representation in Artificial Intelligence research. The representational power of first-order systems is very broad. The inference algorithms for first-order systems are simple and tractable. Hence, first-order systems provide a compact yet powerful tool for modeling complex environments.
Pless and Luger (2002) explored a new logic-based first-order stochastic modeling language named Loopy Logic. This language aims at solving problems with repetitive, or recursive, structures. It can represent product distribution effectively and can be translated to a well-known and effective inference algorithm: loopy belief propagation. Parameter learning is naturally supported in this framework.
This thesis explores the representational power of Loopy Logic. Three applications of Loopy Logic in the domain of diagnosis and prognosis are analyzed in detail. These applications are diagnosis of digital circuits, fault detection in mechanical systems and parameter learning in a life support simulation.
MS Thesis Defense: Applying Simulated Data to Evaluation of EST Clustering Algorithms
Date: Wednesday November 17, 2004
Location: FEC 141
Time: 3:00pm - 5:00 pm
Ke Ye <keye@cs.unm.edu>
EST clustering algorithms have many applications on biological data such as clustering genes that are from the same cells but from the different gene products, identifying new genes, and discovering alternative splicing etc. In these applications, clustering performance and quality are the most critical issues. Computer scientists have enhanced the computational efficiency by improving both the distance and clustering algorithms, but the quality of clustering is hard to assess, as we never know the correct answers for the real biological data. In this project, I utilized an EST generator tool (SeqGen) to generate artificial ESTs and made use of the simulated datasets to evaluate four EST clustering algorithms-SmithWaterman, ESTate, N2tool and Blastclust with regard to the clustering qualities
Ph.D. Dissertation Defense: Modeling Intercellular Interactions in the Peripheral Immune System
Date: Monday September 13, 2004
Time: 1pm-3pm
Location: FEC 141
Christy Warrender <christy@cs.unm.edu>
Department of Computer Science, UNM
Abstract:
Peripheral tissue microenvironments play an important role in determining
responses to infection, but these small-scale interactions are usually
not included in immune system models. An immune response to a
pathogen involves numerous cells of several different types, each with
a limited ability to sense its environment and to act. These cells
produce many kinds of signalling molecules called cytokines that
act locally to affect cell behavior. The dynamical interactions among
this heterogeneous collection of cells and molecules are extremely
complex. To begin to understand such a system, we need modeling
approaches that can relate population-level dynamics to individual
cell interactions and responses to cytokines. I will describe a spatially
explicit simulator that tracks individual cell responses to local molecular
signals, and the use of that simulator to study models of peripheral tissue
dynamics in normal and disease conditions.
Ph.D. Dissertation Defense: "Modularity and the Evolution of Software Evolvability"
Date: Wendesday August 11, 2004
Time: 1:00 pm
Location: FEC 141
Terry Van Belle <vanbelle@aruna.ca>
Computer Science Department, UNM
Drawing on models of the evolution of living systems, this dissertation
explores the principle of modularity, both biological and in software,
and its role in creating structures that are easy to change.
These ideas are captured in the Software Evolvability Change Optimization
(SECO) model, a framework for investigating how modularity can enhance
evolvability in software.
SECO abstracts software history by dividing the code into non-overlapping elements that are linked together by a series of changes. These changes are either gathered from the recorded histories of real software, or modeled using evolutionary computation, change propagation among elements, or correlations in changes between elements.
The dissertation uses SECO in both an analytic and synthetic role, investigating aspects of modularity such as encapsulation and code factoring, and using automatic techniques to optimize the modular structure of real code.
The dissertation contributes to the further understanding of modularity as a means of improving software evolvability by adding the dimension of time to the analysis. In this way, it can discover dependency links between software elements that are not evident from a static analysis of the program.
Ph.D. Dissertation Defense: High Performance Computing Spare Replacement Hardware Fault Tolerance
Date: Monday, March 8, 2004
Time: 10am-11am
Location: Farris Engineering Center 141
Jared Samuel Dreicer, <jared@cs.unm.edu>
Computer Science Department, UNM
Abstract:
The use of spare replacement hardware and checkpoint rollback software fault
tolerance on Multiple-Instruction-Multiple-Data (MIMD) architecture was investigated.
New performance results are presented for spare node replacement after simulated
failure and migration onto spare node prior to simulated failure. Spare replacement
and migration onto spare were implemented for application parameter characterization
runs on 32 nodes and scaling runs from 8 - 128 nodes on a MIMD cluster. The
CUMULVS system was used for fault tolerant and control features. We evaluated
the spare node replacement and migration onto spare node approaches using runtime
to quantify performance and demonstrate viability of the approaches.
The principle new results of this dissertation are that:
1) Spare node replacement provides good performance at a small cost in runtime;
2) Migration onto spare provides even better performance at a small cost in
runtime; and
3) A runtime breakeven point dependent on system scale is identified for both
approaches relative to traditional approaches.
Results were quantified for empirical studies on 8 - 128 nodes. These studies investigated applications characterized by various computation-communication ratios, work patterns (steady, accumulate, disperse, hill, and hole), and various topologies (ring, one-to-all, and near neighbor). The decrease in the cost of commodity hardware enables strategies that can efficiently use a spare as a general means of dynamic redundancy. The gain resulting from these approaches are due to decreased recovery time given immediate access to a spare, the Mean Time To Repair (MTTR) is reduced. Checkpoint and rollback overhead is still incurred, but for migration onto spare checkpoint overhead can be dramatically reduced. The scale of distributed memory MIMD architectures continue to grow due to user requests for greater performance, their increased computational requirements involving finer resolution, and enabled by the decreasing cost of commodity hardware. However, these larger architectures experience an increasing frequency of component failures and subsequent loss of availability. Fault tolerance and availability are therefore important issues for high performance computing systems executing long running applications. Our research indicates that utilizing spare replacement enhances scalability and availability of MIMD architectures and that further research will pay important dividends.
Master's Thesis Defense: The Effects of the Network on Parallel Rendering
Date: Friday, January 30, 2004
Time: 11am-12:15pm
Location: Woodward 149
Ye Cong, <yecong@cs.unm.edu>
Department of Computer Science, UNM
Abstract:
Many modern graphics systems are built with commodity components such as personal
computers, graphics accelerators, and projection devices. This
architecture results in low price-to-performance ratio, yet a potential problem
arises: the graphics accelerators can deliver very high rendering speed, however
the graphic distribution relies on the existing network. This thesis describes
how the network affects the performance of parallel rendering on a PC cluster.
First I show that parallel rendering is more sensitive to network bandwidth
than network latency and overhead. Then, I introduce a specific cluster-based
graphics system and some experiments done on this system to reveal the effects
of the network on parallel rendering from different aspects. In order to show
that the software layers add considerable overhead, some performance comparisons
using MPICH, LAMMPI and Chromium are made.
