UNM Computer Science

Technical Reports

Found 10 results, listed from newest to oldest.



TR-CS-2009-09

Palacios and Kitten: High Performance Operating Systems For Scalable Virtualized and Native Supercomputing
John Lange, Kevin Pedretti, Trammell Hudson, Peter Dinda, Zheng Cui, Lei Xia, Patrick Bridges, Steven Jaconette, Mike Levenhagen, Ron Brightwell, Patrick Widener

Palacios and Kitten are new open-source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native codebases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems through virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near-native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that does not compromise performance.

PDF



TR-CS-2008-10

A High-Level Implementation of Non-Deterministic, Unrestricted, Independent And-Parallelism
Amadeo Casas, Manuel Carro, Manuel V. Hermenegildo

The growing popularity of multicore architectures has renewed interest in language-based approaches to the exploitation of parallelism. Logic programming has proved an interesting framework to this end, and there are parallel implementations that have achieved significant speedups, but at the cost of quite sophisticated low-level machinery. This machinery has proved challenging to code and, especially, to maintain and extend. In this paper, we follow a different approach that adopts a higher-level view by raising some of the core components of the implementation to the level of the source language. We briefly present an implementation model for independent and-parallelism which fully supports non-determinism through backtracking and provides extensible solutions for some of the main problems found in previous and-parallel implementations. Our proposal is able to optimize the execution of deterministic programs and to exploit unrestricted and-parallelism, which exposes more parallelism among clause literals than fork-join-based proposals. We present performance results for an implementation, including data for benchmarks in which and-parallelism is exploited in non-deterministic programs.

PDF



TR-CS-2007-17

Structured Streams: Data Services for Petascale Science Environments
Patrick M. Widener, Matthew Wolf, Hasan Abbasi, Matthew Barrick, Jay Lofstead, Jack Pullikottil, Greg Eisenhauer, Ada Gavrilovska, Scott Klasky, Ron Oldfield, Patrick G. Bridges, Arthur B. Maccabe, Karsten Schwan

The challenge of meeting the I/O needs of petascale applications is exacerbated by an emerging class of data-intensive HPC applications that requires annotation, reorganization, or even conversion of their data as it moves between HPC computations and data end users or producers. For instance, data visualization can present data at different levels of detail. Further, actions on data are often dynamic, as new end-user requirements introduce data manipulations unforeseen by original application developers. These factors are driving a rich set of requirements for future petascale I/O systems: (1) high levels of performance and, therefore, flexibility in how data is extracted from petascale codes; (2) the need to support "on demand" data annotation (metadata creation and management) outside application codes; (3) support for concurrent use of data by multiple applications, like visualization and storage, including associated consistency management and scheduling; and (4) the ability to flexibly access and reorganize physical data storage.

We introduce an end-to-end approach to meeting these requirements: Structured Streams, streams of structured data with which methods for data management can be associated whenever and wherever needed. These methods can execute synchronously or asynchronously with data extraction and streaming, they can run on the petascale machine or on associated machines (such as storage or visualization engines), and they can implement arbitrary data annotations, reorganization, or conversions. The Structured Streaming Data System (SSDS) enables high-performance data movement or manipulation between the compute and service nodes of the petascale machine and between/on service nodes and ancillary machines; it enables the metadata creation and management associated with these movements through specification instead of application coding; and it ensures data consistency in the presence of anticipated or unanticipated data consumers. Two key abstractions implemented in SSDS, I/O graphs and Metabots, provide developers with high-level tools for structuring data movement as dynamically-composed topologies. A lightweight storage system avoids traditional sources of I/O overhead while enforcing protected access to data.
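
To make the I/O graph idea concrete, the following Python sketch shows data-handling stages composed into a pipeline at run time. It is purely illustrative, with hypothetical stage names; it is not the SSDS API.

    # Illustrative sketch only -- stage names are hypothetical, not the SSDS API.
    def annotate(record):
        record["units"] = "kelvin"                # attach metadata in transit
        return record

    def downsample(record):
        record["values"] = record["values"][::4]  # level-of-detail reduction
        return record

    def compose(*stages):
        """Chain stages into a single pipeline, built at run time."""
        def run(record):
            for stage in stages:
                record = stage(record)
            return record
        return run

    io_graph = compose(annotate, downsample)      # topology chosen dynamically
    out = io_graph({"values": list(range(16))})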

This paper describes the SSDS architecture, motivating its design decisions and intended application uses. The utility of the I/O graph and Metabot abstractions is illustrated with examples from existing HPC codes executing on Linux Infiniband clusters and Cray XT3 supercomputers. Performance claims are supported with experiments benchmarking the underlying software layers of SSDS, as well as application-specific usage scenarios.

PDF



TR-CS-2007-13

Dominance: Modeling Heap Structures with Sharing
Mark Marron, Rupak Majumdar, Darko Stefanovic, Deepak Kapur

A number of papers have used predicate languages over sets of abstract locations to model the heap (decorating a heap graph with the predicates, or in conjunction with an access path abstraction). In this work we introduce a new predicate, dominance, which is a generalization of aliasing and is used to model how objects are shared in the heap (e.g. do two lists contain the same set of objects?) and how sharing influences the results of destructive updates (e.g. if all the objects in one list are modified does that imply that all the objects in another list are modified?). The dominance relation is introduced in the context of a graph-based heap model but the concept is general and can be incorporated in other frameworks as a high-level primitive.
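
The sharing questions above can be made concrete with a small example. The paper analyzes Java programs; the Python sketch below is ours, and simply shows why "do two lists contain the same objects?" matters for destructive updates.

    class Node:
        def __init__(self, val):
            self.val = val

    xs = [Node(i) for i in range(3)]
    ys = list(xs)                     # a new list, but the SAME Node objects
    zs = [Node(n.val) for n in xs]    # fresh Node objects with equal values

    for n in xs:                      # destructive update through xs
        n.val += 100

    print(ys[0].val)                  # 100: ys shares every element with xs
    print(zs[0].val)                  # 0: zs is unaffected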

The motivation for introducing a higher-order predicate to model sharing is based on the success of existing approaches which use connectivity, reachability, and sharing predicates to perform shape analysis using high-level graph-based models. Our approach provides an analysis technique that is efficient and scalable while retaining a high level of accuracy. This is in contrast to proposals in the literature based on using logical languages (usually built on top of first-order predicate logic) which either are highly expressive and very general (resulting in a computationally expensive analysis) or utilize only fragments of the full logic (which reduces the computational cost but results in a much less powerful analysis).

The approach presented in this paper has been implemented and run on a variation of the Jolden benchmarks. The information extracted using the analysis has been used to parallelize these programs with a near optimal speedup for 7 of the 9 benchmarks. The runtimes are small enough to make the analysis practical (a few seconds for each program).

PDF



TR-CS-2005-10

Generating Hard Satisfiable Formulas by Hiding Solutions Deceptively
Haixia Jia, Cristopher Moore, and Doug Strain

To test incomplete search algorithms for constraint satisfaction problems such as 3-SAT, we need a source of hard, but satisfiable, benchmark instances. A simple way to do this is to choose a random truth assignment A, and then choose clauses randomly from among those satisfied by A. However, this method tends to produce easy problems, since the majority of literals point toward the "hidden" assignment A. Last year, Achlioptas, Jia and Moore proposed a problem generator that cancels this effect by hiding both A and its complement. While the resulting formulas appear to be just as hard for DPLL algorithms as random 3-SAT formulas with no hidden assignment, they can be solved by WalkSAT in only polynomial time.

Here we propose a new method to cancel the attraction to A, by choosing a clause with t > 0 literals satisfied by A with probability proportional to q^t for some q < 1. By varying q, we can generate formulas whose variables have no bias, i.e., which are equally likely to be true or false; we can even cause the formula to "deceptively" point away from A. We present theoretical and experimental results suggesting that these formulas are exponentially hard both for DPLL algorithms and for incomplete algorithms such as WalkSAT.
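
The clause-selection rule lends itself to rejection sampling. The sketch below is a minimal Python rendering of the rule as stated (accept a candidate clause with probability q^t, discarding clauses that A violates); the paper's actual generator may differ in its details.

    import random

    def deceptive_formula(n, m, q, A=None):
        """Generate m 3-clauses over n variables while hiding assignment A.

        A candidate clause with t > 0 literals satisfied by A is accepted
        with probability q**t (rejection sampling); clauses violated by A
        (t == 0) are discarded, so A satisfies the whole formula.
        """
        if A is None:
            A = [random.choice([True, False]) for _ in range(n)]
        clauses = []
        while len(clauses) < m:
            vs = random.sample(range(n), 3)                    # 3 distinct variables
            clause = [(v, random.choice([True, False])) for v in vs]
            t = sum(1 for v, sign in clause if A[v] == sign)   # literals A satisfies
            if t > 0 and random.random() < q ** t:
                clauses.append(clause)
        return A, clauses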

PDF



TR-CS-2003-47

A Parallel State Assignment Algorithm for Finite State Machines
David A. Bader and Kamesh Madduri

This paper summarizes the design and implementation of a parallel algorithm for state assignment of large Finite State Machines. High-performance CAD tools are necessary to overcome the computational complexity involved in the optimization of large sequential circuits. FSMs constitute an important class of logic circuits, and state assignment is one of the key steps in combinational logic optimization. The SMP-based parallel algorithm -- based on the sequential program JEDI and targeting multilevel logic implementation -- scales nearly linearly with the number of processors for FSMs of varying problem sizes chosen from standard benchmark suites, while attaining quality of results comparable to the best sequential algorithms. The source code for our parallel implementation of JEDI is freely available from our web site.

PDF



TR-CS-2003-40

A Tool for Monitoring and Recording Heap-Allocated Object Behavior
Qingfeng Duan

This report presents the design, implementation, evaluation, and documentation of a tool for monitoring and recording heap-allocated object behavior. Object-oriented programming languages such as Java require the support of automatic memory management (garbage collection) because of their intensive heap allocation. Modern garbage collection techniques rely on exploiting object behavior: the ages, types, and sizes of objects, and the references among them. For example, the Appel generational copying collector and the Older-First collector are built on the distribution of object ages. To understand these behaviors, and thus be able to improve garbage collection techniques, we need a simulation tool. The tool described here correlates low-level read-write data accesses with high-level object characteristics. When an object is allocated, modified, moved, or deallocated, the tool monitors and records this information; further analysis of this information yields the data needed to understand the behaviors of interest.

The tool consists of three components: IBM's Jikes RVM, Dynamic SimpleScalar, and an off-line Analyzer. Jikes RVM is a Java virtual machine that is itself written in Java; Dynamic SimpleScalar is a machine-level emulator that simulates a PowerPC model running the AIX operating system; and the Analyzer postprocesses the results generated by the first two components. At run time, the tool maintains a layered structure: the Java program, Jikes RVM, Dynamic SimpleScalar, and the host machine, with the Java program at the top layer. We evaluate the tool using three SPECjvm98 Java benchmarks and illustrate how it can be used to analyze write statistics for the fields of objects of a given class. By providing accurate and flexible analyses at the level of individual heap-allocated objects, the tool can benefit garbage collection research.
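
As a rough illustration of the kind of per-object event record such a tool correlates (the field names below are our assumption, not the tool's actual format):

    from dataclasses import dataclass

    @dataclass
    class ObjectEvent:
        kind: str        # "alloc" | "write" | "move" | "dealloc"
        obj_id: int      # identity of the heap object
        obj_type: str    # high-level class of the object
        size: int        # object size in bytes
        address: int     # low-level heap address at event time
        timestamp: int   # e.g., bytes allocated so far

    trace = [ObjectEvent("alloc", 1, "java.lang.String", 24, 0x1000, 0)]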

gzipped postscript



TR-CS-2003-28

Object Lifetime Prediction in Java
Hajime Inoue, Darko Stefanovic, and Stephanie Forrest

Accurately predicting the lifetimes of objects in object-oriented and functional languages is important because such predictions can be used to improve memory management performance at run-time. A typical approach to this prediction problem is first to observe object lifetimes by tracing a sample program execution, then to construct a predictor function based on these observations, and finally to test the predictions on reference program executions. Four quantitative measures characterize this approach: coverage, the proportion of objects in the reference program execution for which the predictor function is defined; accuracy, the fraction of predicted lifetimes that are correct; precision, the granularity of the predictions; and size, the size of the predictor itself. These four properties are not independent; for example, increased precision often leads to less coverage and accuracy.
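
For instance, coverage and accuracy can be computed by replaying a reference trace against the predictor. The Python sketch below assumes a predictor keyed by allocation site; the keys and representation are our illustration, not necessarily the paper's construction.

    def evaluate(predictor, reference_trace):
        """Coverage and accuracy of a lifetime predictor.

        predictor: dict mapping an allocation-site key to a predicted lifetime.
        reference_trace: iterable of (site, actual_lifetime) pairs.
        """
        trace = list(reference_trace)
        covered = correct = 0
        for site, actual in trace:
            if site in predictor:                # predictor is defined here
                covered += 1
                if predictor[site] == actual:    # fully precise: exact match
                    correct += 1
        coverage = covered / len(trace)
        accuracy = correct / covered if covered else 0.0
        return coverage, accuracy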

We describe a fully precise prediction method and report experimental results on its performance. By "fully precise" we mean that the granularity of predictions is equal to the smallest unit of allocation. We show that for a number of benchmark programs in the Java programming language, fully precise prediction can be achieved, together with high coverage and accuracy. Our results also show that a significant proportion of objects have a measured lifetime of zero, a fact which a dynamic compiler could use to avoid explicit allocation. The method described here is the first to combine high precision and efficiency in a single lifetime predictor.

gzipped postscript



TR-CS-2003-24

An MPI Tool to Measure Application Sensitivity to Variation in Communication Parameters
Edgar A. Leon, Arthur B. Maccabe and Ron Brightwell

This work describes an apparatus that can be used to vary communication performance parameters for MPI applications, providing a tool to analyze the impact of communication performance on parallel applications. Our tool is based on Myrinet and its GM message-passing layer. We use an extension of the LogP model to allow greater flexibility in determining the parameter(s) to which parallel applications may be sensitive. We show that individual communication parameters can be independently controlled to within a small percentage error. We also present the results of using our tool on a suite of parallel benchmarks.
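
For reference, the classic LogP model characterizes a network by the latency L, the per-message send/receive overhead o, the gap g between message injections, and the processor count P. The sketch below computes the point-to-point time the basic model predicts, which shows what "varying a communication parameter" perturbs; it is a textbook simplification, not the paper's extended model.

    def logp_ptp_time(num_msgs, L, o, g):
        """Predicted time to deliver num_msgs small messages under basic LogP.

        One message costs o (send overhead) + L (network latency) + o
        (receive overhead); successive injections are spaced by at least
        max(g, o). A simplification of the classic model, not the paper's
        extension.
        """
        pipeline = max(g, o) * (num_msgs - 1)   # spacing between injections
        return pipeline + 2 * o + L             # full cost of the last message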

gzipped postscript | PDF



TR-CS-2002-13

COMB: A Portable Benchmark Suite for Assessing MPI Overlap
William Lawry, Christopher Wilson, Arthur B. Maccabe, and Ron Brightwell

This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communication Offload MPI-based Benchmark, or COMB, uses two different methods to characterize the ability of messages to make progress concurrently with computational processing on the host processor(s). COMB measures the relationship between overall MPI communication bandwidth and host CPU availability. In this paper, we describe the two different approaches used by the benchmark suite, and we present results from several systems. We demonstrate the utility of the suite by examining the results and comparing and contrasting different systems.
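
One common way to probe such overlap is a post-work-wait cycle: post a non-blocking transfer, do host computation, then wait for completion. The mpi4py sketch below illustrates that idea and a derived availability figure; it is our sketch, not COMB's code or necessarily its exact methodology.

    from mpi4py import MPI   # run with an MPI launcher and exactly 2 ranks

    comm = MPI.COMM_WORLD
    buf = bytearray(4 << 20)                   # 4 MB message buffer
    t0 = MPI.Wtime()
    if comm.rank == 0:
        req = comm.Irecv(buf, source=1)        # post the transfer up front
        work = sum(i * i for i in range(5_000_000))   # host computation
        t_work = MPI.Wtime()
        req.Wait()                             # complete the transfer
        t_end = MPI.Wtime()
        # availability: fraction of wall time the host CPU spent computing
        print("availability:", (t_work - t0) / (t_end - t0))
    else:
        comm.Send(buf, dest=0)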

gzipped postscript | PDF