UNM Computer Science

Colloquia



Okay, we can compute it. Now what?

Date: Friday, December 9, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Andrew Wilson
Sandia National Laboratory

As data sizes grow, so too does the cognitive load on the scientists who want to use the data and the computational load for running their analyses and queries. The paradigm common in visualization of "show everything and let the analyst sort through it" is already failing on medium-large data sets (tens of terabytes) because of the difficulty of identifying exactly which parts of the data are 'interesting'.

I will argue that the separation of computation and analysis is improper when working with large data. The process of identifying and labeling higher-order structure in the data -- the fundamental goal of analysis -- must begin in the computation itself. Moreover, the metaphors and abstractions used for analysis must preserve and summarize meaning at some desired scale so that a high-level overview will give immediate clues to small-scale features of interest.
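
To make the in-situ idea concrete, here is a minimal, hypothetical sketch (Python, with invented names; not the speaker's software) of moving analysis into the computation: each simulation step emits a small statistical summary and a bounded list of candidate features instead of dumping the raw state for later inspection.

```python
import numpy as np

def simulate_step(state, rng):
    """Hypothetical stand-in for one timestep of a large simulation."""
    return state + 0.1 * rng.standard_normal(state.shape)

def run_with_in_situ_summary(steps=100, cells=10_000, threshold=2.5, seed=0):
    """Instead of writing the raw state every step, record a small summary:
    per-step statistics plus the locations of 'interesting' cells."""
    rng = np.random.default_rng(seed)
    state = rng.standard_normal(cells)
    summaries = []
    for t in range(steps):
        state = simulate_step(state, rng)
        hot = np.flatnonzero(np.abs(state) > threshold)  # candidate features
        summaries.append({
            "step": t,
            "mean": float(state.mean()),
            "std": float(state.std()),
            "n_hot": int(hot.size),
            "hot_cells": hot[:20].tolist(),  # keep only a bounded sample
        })
    return summaries

if __name__ == "__main__":
    summary = run_with_in_situ_summary()
    print(summary[-1])
```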

Bio:
Andrew Wilson is a senior member of the technical staff at Sandia National Laboratories in Albuquerque, New Mexico. The problem of computing with large data descended upon him during his first week of graduate school and has occupied his attention since then.

While orbiting the issue he has worked on facets of the end-to-end processing of large data, starting with data import and ending with visual representations, with excursions into cybersecurity, information visualization, graph algorithms, statistical analysis of ensembles of simulation runs, parallel topic modeling and system architectures for data-intensive computing. He received his Ph.D. from the University of North Carolina in 2002.

Sparse Matrix Transform for Hyperspectral Image Processing

Date: Friday, December 2, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

James Theiler
Los Alamos National Laboratory

Many problems in image processing require that a covariance matrix be accurately estimated, often from a limited number of data samples. This is particularly challenging for hyperspectral imagery, where the number of spectral channels can run into the hundreds. The Sparse Matrix Transform (SMT) provides a parsimonious, computation-friendly, and full-rank estimator of covariance matrices. But unlike other covariance regularization schemes, which deal with the eigenvalues of a sample covariance, the SMT works with the eigenvectors. This talk will describe the SMT and its utility for a range of problems that arise in hyperspectral data analysis, including weak signal detection, dimension reduction, anomaly detection, and anomalous change detection.
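
As a rough illustration of the flavor of the method (a simplified greedy Jacobi-rotation variant, not the exact likelihood-driven SMT of the talk), the sketch below builds a full-rank covariance estimate by accumulating Givens rotations that decorrelate the most strongly coupled pair of channels and keeping the rotated diagonal as the eigenvalue estimate.

```python
import numpy as np

def givens(p, q, theta, d):
    """Rotation in the (p, q) plane embedded in a d x d identity matrix."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p] = c
    G[q, q] = c
    G[p, q] = -s
    G[q, p] = s
    return G

def smt_covariance(X, n_rotations=100):
    """SMT-flavored estimate: greedily apply Givens rotations that decorrelate
    the most strongly coupled pair of channels, then keep the rotated diagonal
    as the eigenvalue estimate. (The real SMT selects rotations by a
    likelihood criterion; this is a simplified sketch.)"""
    S = np.cov(X, rowvar=False)
    d = S.shape[0]
    E = np.eye(d)                           # accumulated eigenvector estimate
    for _ in range(n_rotations):
        C = np.abs(S - np.diag(np.diag(S)))
        p, q = np.unravel_index(np.argmax(C), C.shape)
        if C[p, q] < 1e-12:
            break
        theta = 0.5 * np.arctan2(2 * S[p, q], S[p, p] - S[q, q])
        G = givens(p, q, theta, d)
        S = G.T @ S @ G                     # zeroes out S[p, q]
        E = E @ G
    lam = np.maximum(np.diag(S), 1e-12)     # keep the estimate full rank
    return E @ np.diag(lam) @ E.T, E, lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 50, 20                           # more channels than samples
    X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
    cov_est, E, lam = smt_covariance(X, n_rotations=200)
    print("eigenvector estimate is orthogonal:", np.allclose(E @ E.T, np.eye(d)))
```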

Bio:
James Theiler finished his doctoral dissertation at Caltech in 1987, with a thesis on statistical and computational aspects of identifying chaos in time series. He followed a nonlinear trajectory to UCSD, MIT Lincoln Laboratory, Los Alamos National Laboratory, and the Santa Fe Institute. His interests in statistical data analysis and in having a real job were combined in 1994, when he joined the Space and Remote Sensing Sciences Group at Los Alamos. In 2005, he was named a Los Alamos Laboratory Fellow. His professional interests include statistical modeling, image processing, remote sensing, and machine learning. Also, covariance matrices.

Complex causal learning

Date: Friday, November 11, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

David Danks
Carnegie Mellon University

In the past twenty years, multiple machine learning algorithms have been developed that learn causal structure from observational or experimental data. Most of the algorithms were designed, however, for relatively "clean" data from linear systems, and so are often not applicable to real-world problems. In this talk, I will first outline the principles underlying this type of causal learning, and then examine three new algorithms developed for more complex causal learning: specifically, for non-linear and/or non-Gaussian data, and for learning from multiple, overlapping datasets. Time permitting, I will provide case studies (e.g., from oceanography) showing these algorithms in action.
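
For readers unfamiliar with constraint-based causal learning, the toy sketch below shows the classical linear-Gaussian baseline that the talk's algorithms generalize: an edge is dropped whenever some small conditioning set renders two variables (partially) uncorrelated. It is illustrative only and is not one of the new algorithms described above.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def partial_corr(data, i, j, cond):
    """Correlation of variables i and j after regressing out the variables
    in `cond`; returns (r, p-value)."""
    def residual(k):
        if not cond:
            return data[:, k] - data[:, k].mean()
        Z = np.column_stack([data[:, list(cond)], np.ones(len(data))])
        beta, *_ = np.linalg.lstsq(Z, data[:, k], rcond=None)
        return data[:, k] - Z @ beta
    return stats.pearsonr(residual(i), residual(j))

def pc_skeleton(data, alpha=0.05, max_cond=2):
    """Toy PC-style skeleton search: drop edge i--j whenever some small
    conditioning set makes i and j (partially) uncorrelated."""
    n_vars = data.shape[1]
    edges = {(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)}
    for i, j in sorted(edges):
        others = [k for k in range(n_vars) if k not in (i, j)]
        done = False
        for size in range(max_cond + 1):
            for cond in combinations(others, size):
                _, p_value = partial_corr(data, i, j, cond)
                if p_value > alpha:          # cannot reject independence
                    edges.discard((i, j))
                    done = True
                    break
            if done:
                break
    return edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.standard_normal(n)
    y = 2.0 * x + rng.standard_normal(n)     # x causes y
    z = -1.5 * y + rng.standard_normal(n)    # y causes z
    data = np.column_stack([x, y, z])
    # The indirect 0-2 edge is usually removed once variable 1 is conditioned on.
    print("skeleton edges:", sorted(pc_skeleton(data)))
```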

Bio:
David Danks is an Associate Professor of Philosophy & Psychology at Carnegie Mellon University, and a Research Scientist at the Institute for Human & Machine Cognition. His research focuses on the interface of cognitive science and machine learning: using the tools of machine learning to better understand complex human cognition, and developing novel machine learning algorithms based on human cognitive capacities. His research has centered on causal learning and reasoning, category development and application, and decision-making.

Achieving High Read Performance from a Write Optimized File System

Date: Friday, November 4, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Adam Manzanares
Los Alamos National Laboratory

This talk will focus on the Parallel Log-Structured File System (PLFS), which was developed at Los Alamos National Laboratory (LANL) to improve shared-file write performance. Write performance is improved because PLFS transparently transforms the writes such that each process, while logically writing to a shared file, is physically writing to a unique file. By removing this concurrency, PLFS improved the write performance of many applications by multiple orders of magnitude. However, reconstructing the logical file from the multitude of physical files has proven difficult. To alleviate this issue we developed several collective techniques to aggregate information from multiple component pieces. This enables PLFS to maintain its large write improvements without sacrificing read performance for many workloads. There are other workloads, however, which remain challenging. Currently, Los Alamos is developing a scalable HPC key-value store to address these remaining challenges. Additionally, the transformative properties of PLFS have recently also been leveraged to improve the metadata performance of a production parallel file system.
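
The core transformation is easy to sketch. The toy Python class below (invented names, not the real PLFS API) has each writer append to its own physical log plus a small index of extents, and reconstructs the shared logical file by merging every index at read time.

```python
import json
import os

class ToyPLFS:
    """Toy illustration of the PLFS idea (not the real PLFS API): each writer
    appends to its own physical log plus an index of extents, and reads
    reconstruct the shared logical file by merging every index."""

    def __init__(self, container):
        self.container = container
        os.makedirs(container, exist_ok=True)

    def write(self, rank, logical_offset, data):
        data_path = os.path.join(self.container, f"data.{rank}")
        index_path = os.path.join(self.container, f"index.{rank}")
        with open(data_path, "ab") as f:
            f.seek(0, os.SEEK_END)
            physical_offset = f.tell()
            f.write(data)
        with open(index_path, "a") as f:
            f.write(json.dumps({"logical": logical_offset, "physical": physical_offset,
                                "length": len(data), "rank": rank}) + "\n")

    def read_all(self):
        # Aggregate every per-rank index, then replay extents in logical order.
        extents = []
        for name in os.listdir(self.container):
            if name.startswith("index."):
                with open(os.path.join(self.container, name)) as f:
                    extents += [json.loads(line) for line in f]
        out = bytearray()
        for e in sorted(extents, key=lambda e: e["logical"]):
            with open(os.path.join(self.container, f"data.{e['rank']}"), "rb") as f:
                f.seek(e["physical"])
                chunk = f.read(e["length"])
            end = e["logical"] + e["length"]
            if len(out) < end:
                out.extend(b"\0" * (end - len(out)))
            out[e["logical"]:end] = chunk
        return bytes(out)

if __name__ == "__main__":
    fs = ToyPLFS("plfs_container")
    fs.write(rank=0, logical_offset=0, data=b"hello ")   # "process" 0
    fs.write(rank=1, logical_offset=6, data=b"world")    # "process" 1
    print(fs.read_all())                                  # b'hello world'
```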

Bio:
Adam Manzanares is currently a Nicholas C. Metropolis postdoctoral fellow at Los Alamos National Laboratory (LANL). He was appointed to this position in November 2010 after joining LANL in July 2010 as a postdoctoral researcher. He received his Ph.D. from Auburn University in May 2010 with a focus on energy-efficient storage systems. His current work centers on storage systems for high-performance computing applications, and he develops middleware layers to improve the performance of HPC storage systems. He is also researching compression techniques and data formatting libraries for scientific data sets.

From Intrinsic to Designed Computation

Date: Friday, October 28, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Christof Teuscher
Portland State University, Department of Electrical and Computer Engineering

The computing disciplines face difficult challenges in scaling CMOS technology down further. One solution path is to use an innovative combination of novel devices, compute paradigms, and architectures to create new information processing technology. It is reasonable to expect that future devices will increasingly exhibit extreme physical variation, and thus have a partially or entirely unknown structure with limited functional control. Emerging devices are also expected to exhibit time-dependent, nonlinear behavior rather than simple, predictable responses. It is premature to say which computing model is the best match for such devices. To address this question, our research focuses on a design space exploration of building information processing technology with spatially extended, heterogeneous, disordered, dynamical, and probabilistic devices that we cannot fully control or understand. In this talk I will present recent results on computing with such systems. We draw inspiration from the field of reservoir computing to obtain a "designed" computation from the "intrinsic" computing capabilities of the underlying device networks. We study how the underlying devices, their network structure, and the associated costs influence task performance and robustness, with the goal of finding optima in the design space. Harnessing intrinsic computation has enormous potential for cheaper, faster, more robust, and more energy-efficient information processing technology.
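
The reservoir computing idea mentioned above can be illustrated with a tiny echo state network: a fixed, random, uncontrolled network stands in for the device substrate, and only a linear readout is trained. This is a generic sketch, not the speaker's experimental setup.

```python
import numpy as np

def echo_state_demo(n_reservoir=200, washout=100, n_train=1000, n_test=200, seed=0):
    """Tiny echo state network: a fixed random 'reservoir' (standing in for an
    uncontrolled device network) is driven by an input signal; only a linear
    readout is trained, here to predict the next value of a sine wave."""
    rng = np.random.default_rng(seed)
    total = washout + n_train + n_test
    u = np.sin(0.2 * np.arange(total + 1))                   # input signal
    W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))          # spectral radius < 1
    x = np.zeros(n_reservoir)
    states = np.empty((total, n_reservoir))
    for t in range(total):
        x = np.tanh(W @ x + W_in * u[t])                     # intrinsic dynamics
        states[t] = x
    # Train only the linear readout (ridge regression) on the training portion.
    X_tr = states[washout:washout + n_train]
    y_tr = u[washout + 1:washout + n_train + 1]
    ridge = 1e-6 * np.eye(n_reservoir)
    W_out = np.linalg.solve(X_tr.T @ X_tr + ridge, X_tr.T @ y_tr)
    pred = states[washout + n_train:] @ W_out
    truth = u[washout + n_train + 1:]
    return float(np.sqrt(np.mean((pred - truth) ** 2)))      # test RMSE

if __name__ == "__main__":
    print("one-step-ahead test RMSE:", echo_state_demo())
```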

Bio:
Christof Teuscher is an assistant professor in the Department of Electrical and Computer Engineering (ECE) with joint appointments in the Department of Computer Science and the Systems Science Graduate Program. He also holds an Adjunct Assistant Professor appointment in Computer Science at the University of New Mexico (UNM). Dr. Teuscher obtained his M.Sc. and Ph.D. degrees in computer science from the Swiss Federal Institute of Technology in Lausanne (EPFL) in 2000 and 2004, respectively. His main research focuses on emerging computing architectures and paradigms.

Real-Time Modeling and Rendering of Natural Phenomena

Date: Tuesday, October 25, 2011
Time: 11:00 am — 11:50 am
Place: Farris Engineering Center, Room 141

Robert Geist
School of Computing, Clemson University

Modeling and rendering natural phenomena, which includes all components of biophysical ecology, atmospherics, photon transport, and air and water flow, remains a challenging area for computer graphics research. Whether models are physically based or procedural, model processing is almost always characterized by substantial computational demands that have largely precluded real-time performance. Nevertheless, the recent development of new, highly parallel computational models, coupled with dramatic performance improvements in GPU-based execution platforms, has brought real-time modeling and rendering within reach. The talk will focus on the natural synergy between GPU-based computing and the so-called lattice-Boltzmann methods for solving PDEs. Examples will include photon transport for global illumination and the modeling and rendering of atmospheric clouds, forest ecosystems, and ocean waves.
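
For readers unfamiliar with lattice-Boltzmann methods, the sketch below implements one collide-and-stream step of a standard D2Q9 BGK scheme on a periodic grid (a generic textbook formulation, not the speaker's GPU implementation). Each grid cell updates from purely local data, which is what makes the method map so naturally onto GPUs.

```python
import numpy as np

# D2Q9 lattice: discrete velocity set and the corresponding weights.
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, ux, uy):
    """BGK equilibrium distribution for each of the nine directions."""
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return rho * w[:, None, None] * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)

def lbm_step(f, tau=0.6):
    """One collide-and-stream update on a fully periodic grid; every cell
    touches only its own and neighboring data."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f -= (f - equilibrium(rho, ux, uy)) / tau                 # BGK collision
    for i, (cx, cy) in enumerate(c):                          # streaming
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    return f

if __name__ == "__main__":
    nx = ny = 64
    rng = np.random.default_rng(0)
    rho0 = 1.0 + 0.01 * rng.standard_normal((nx, ny))
    zero = np.zeros((nx, ny))
    f = equilibrium(rho0, zero, zero)
    for _ in range(100):
        f = lbm_step(f)
    print("mass conserved:", np.isclose(f.sum(), rho0.sum()))
```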

Bio:
Robert Geist is a Professor in the School of Computing at Clemson University. He served as Interim Director of the School in 2007-2008, and he is co-founder of Clemson's Digital Production Arts Program. He received an M.A. in computer science from Duke University and a Ph.D. in mathematics from the University of Notre Dame. He was an Associate Professor of Mathematics at the University of North Carolina at Pembroke and an Associate Professor of Computer Science at Duke University before joining the faculty at Clemson University. He is a member of IFIP WG 7.3, a recipient of the Gunther Enderle Award (Best Paper, Eurographics), and a Distinguished Educator of the ACM.

The Case for Efficiency in High-Performance Computing

Date: Friday, October 21, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

David Lowenthal
Department of Computer Science, The University of Arizona

Traditionally, high-performance computing (HPC) has been performance-based, with speedup serving as the dominant metric. However, this is no longer the case; other metrics, such as power, energy, and total cost of ownership, have become important. Underlying all of these is the notion of efficiency.

In this talk we focus on two different areas in which efficiency in HPC is important: power efficiency and performance efficiency. First, we discuss the Adagio run-time system, which uses dynamic frequency and voltage scaling to improve power efficiency in HPC programs while sacrificing little performance. Adagio locates tasks off the critical path at run time and executes them at lower frequencies on subsequent timesteps. Second, we discuss our work to improve performance efficiency. We describe a regression-based technique to accurately predict program scalability. We have applied our technique to both strong scaling, where the problem size is fixed as the number of processors increases, and time-constrained scaling, where the problem size instead increases with the number of processors such that the total run time is constant. With the former, we avoid using processors that result in inefficiency, and with the latter, we enable accurate time-constrained scaling, which is commonly desired by application scientists yet nontrivial. We conclude with some ideas about where efficiency will be important in HPC in the future.
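
As an illustration of the regression idea (a generic stand-in, not the specific model from the talk), the sketch below fits a simple strong-scaling model to timings measured at small processor counts and uses it to predict run time and parallel efficiency at larger counts.

```python
import numpy as np

def fit_scaling_model(procs, times):
    """Fit a simple strong-scaling model T(p) ~ a + b/p + c*log2(p)
    (serial fraction, parallel work, communication) by least squares.
    This is only an illustrative stand-in for the talk's regression technique."""
    p = np.asarray(procs, dtype=float)
    A = np.column_stack([np.ones_like(p), 1.0 / p, np.log2(p)])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(times, dtype=float), rcond=None)
    return coeffs

def predict_time(coeffs, p):
    a, b, c = coeffs
    return a + b / p + c * np.log2(p)

if __name__ == "__main__":
    # Timings measured at small processor counts (synthetic here).
    procs = np.array([1, 2, 4, 8, 16])
    times = 2.0 + 120.0 / procs + 0.3 * np.log2(procs)
    coeffs = fit_scaling_model(procs, times)
    for p in [32, 64, 128, 256]:
        t = predict_time(coeffs, p)
        eff = predict_time(coeffs, 1) / (p * t)      # parallel efficiency
        print(f"p={p:4d}  predicted time={t:6.2f}s  efficiency={eff:.2f}")
```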

Bio:
David Lowenthal is a Professor of Computer Science at the University of Arizona. He received his Ph.D. at the University of Arizona in 1996 and was on the faculty at the University of Georgia from 1996 to 2008 before returning to Arizona in 2009. His research centers on addressing fundamental problems in parallel and distributed computing, such as scalability prediction and power/energy reduction, from a system software perspective. His current focus is on solving pressing power and energy problems that will allow exascale computing to become a reality within the decade.

Making Empathic Systems A Reality

Date: Friday, October 7, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Peter Dinda
Northwestern University

Although it is largely invisible to the user, systems software makes a wide range of decisions that directly impact the user's experience through their effects on performance. Most systems software assumes a canonical user. However, we have demonstrated that the measured user satisfaction with any given decision varies broadly across actual users. This effect appears consistently in areas as diverse as client-side CPU and display power management, server-side virtual machine scheduling, and networks. Empathic systems acknowledge this effect and employ direct global feedback from the individual end-user in the systems-level decision-making process. This makes it possible to (a) satisfy individual users despite this diversity in response, and (b) do so with low resource costs, typically far lower than under the assumption of a canonical user. However, it is challenging to build empathic systems because the user interface to the systems software must present minimal distractions. In this talk, I will expand on the empathic systems model and our results in applying it in the areas described above. I will also describe some of our current efforts to use biometrics to make the empathic systems user interface largely invisible. More information about this work can be found at empathicsystems.org.
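
A toy sketch of the general idea follows (invented policy and parameters, not the authors' controller): start at a low-cost setting and let explicit user feedback drive the system only as high as this particular user actually needs.

```python
import random

FREQ_LEVELS = (0.6, 0.8, 1.0, 1.2, 1.6, 2.4)   # hypothetical CPU frequencies (GHz)

def empathic_frequency_controller(satisfaction_prob, steps=50, seed=0):
    """Toy illustration of the empathic-systems idea (invented policy, not the
    authors' controller): start at the cheapest setting and let explicit user
    feedback push the frequency only as high as this user actually needs."""
    rng = random.Random(seed)
    level, history = 0, []
    for _ in range(steps):
        freq = FREQ_LEVELS[level]
        satisfied = rng.random() < satisfaction_prob(freq)    # user feedback
        if not satisfied and level < len(FREQ_LEVELS) - 1:
            level += 1                     # user unhappy: spend more resources
        elif satisfied and level > 0 and rng.random() < 0.1:
            level -= 1                     # occasionally probe a cheaper setting
        history.append((freq, satisfied))
    return history

if __name__ == "__main__":
    # A hypothetical user who is usually satisfied at or above 1.2 GHz.
    user = lambda f: 0.95 if f >= 1.2 else 0.2
    trace = empathic_frequency_controller(user)
    print("settled frequency:", trace[-1][0], "GHz")
```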

Bio:
Peter Dinda is a professor in the Department of Electrical Engineering and Computer Science at Northwestern University, and head of its Computer Engineering and Systems division, which includes 17 faculty members. He holds a B.S. in electrical and computer engineering from the University of Wisconsin and a Ph.D. in computer science from Carnegie Mellon University. He works in experimental computer systems, particularly parallel and distributed systems. His research currently involves virtualization for distributed and parallel computing, programming languages for parallel computing, programming languages for sensor networks, and empathic systems for bridging individual user satisfaction and systems-level decision-making. You can find out more about him at pdinda.org.

Visualizing Compiled Executables for Malware Analysis

Date: Friday, September 30, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Daniel Quist
Advanced Computing Solutions, Los Alamos National Laboratory

Reverse engineering malware is a vital skill that is in constant demand. The existing tools require extensive training and an understanding of computer architecture, software, compilers, and many other areas of computer science. Our work covers several areas intended to lower the barrier to entry for reverse engineering. First, we will introduce a hypervisor-based automatic malware analysis system. Second, we will showcase our binary instrumentation framework for analyzing commercial software. Finally, we will show our graph-based dynamic malware execution tracing system, named VERA. Each of these systems reduces the complexity of the reverse engineering process and enhances productivity.
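
The graph-based tracing idea can be sketched in a few lines: collect the sequence of executed basic-block addresses and turn adjacent pairs into a weighted transition graph for visualization. This is only the underlying idea, not the VERA implementation.

```python
from collections import Counter

def build_transition_graph(trace):
    """Build a weighted control-flow transition graph from a dynamic trace of
    basic-block addresses (the underlying idea behind graph-based execution
    visualization, not the VERA tool itself)."""
    return Counter(zip(trace, trace[1:]))

def to_dot(edges):
    """Emit Graphviz DOT so the graph can be rendered and inspected."""
    lines = ["digraph trace {"]
    for (src, dst), count in sorted(edges.items()):
        lines.append(f'  "{src:#x}" -> "{dst:#x}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical trace of executed basic-block start addresses.
    trace = [0x401000, 0x401020, 0x401050, 0x401020, 0x401050, 0x401080]
    print(to_dot(build_transition_graph(trace)))
```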

Bio:
Daniel Quist is a research scientist at Los Alamos National Laboratory, and founder of Offensive Computing, an open malware research site. His research is in automated analysis methods for malware with software and hardware assisted techniques. He has written several defensive systems to mitigate virus attacks on networks and developed a generic network quarantine technology. He consults with both private and public sectors on system and network security. His interests include malware defense, reverse engineering, exploitation methods, virtual machines, and automatic classification systems. Danny holds a Ph.D. from the New Mexico Institute of Mining and Technology. He has presented at several industry conferences including Blackhat, RSA, and Defcon.

An Overview of HPC Resilience and an Approach to Soft Error Fault Injection

Date: Friday, September 23, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Nathan DeBardeleben
Los Alamos National Laboratory

Over the next decade the field of high performance computing (supercomputing) will undoubtedly see major changes in the ways leadership-class machines are built, used, and maintained. There are numerous challenges, including operating systems, programming models and languages, power, and file systems, to name but a few. This talk will focus on one of those challenges, the cross-cutting goal of providing reliable computation on fundamentally unreliable components. Nathan will provide an overview of the field of resilience, point to the obstacles expected over the coming decade, look at potential solutions that appear promising, and discuss areas that appear to need more emphasis. Nathan's own new research on a soft error fault injection (SEFI) framework will be presented, along with some early results. SEFI is intended as a framework for determining the resilience of a target application to soft errors. The initial implementation, which uses a processor emulator virtual machine, will be discussed, as will the reasons SEFI may be moving to a dynamic instrumentation approach.
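
The essence of soft error fault injection is easy to illustrate: flip a random bit in an intermediate value and compare the corrupted run against a clean one. The sketch below is a generic illustration, not the SEFI framework itself.

```python
import random
import struct
import numpy as np

def flip_random_bit(x, rng):
    """Flip one random bit in the IEEE-754 representation of x, mimicking a
    transient soft error in a register or memory word."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits ^= 1 << rng.randrange(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def run_with_injection(n=256, steps=100, p_inject=0.05, seed=1):
    """Run a toy stencil kernel twice, randomly corrupting values in one copy,
    and report how far the faulty result drifts from the clean one."""
    rng = random.Random(seed)
    a = np.linspace(0.0, 1.0, n)
    clean, faulty = a.copy(), a.copy()
    for _ in range(steps):
        clean = 0.5 * (np.roll(clean, 1) + np.roll(clean, -1))
        faulty = 0.5 * (np.roll(faulty, 1) + np.roll(faulty, -1))
        if rng.random() < p_inject:                    # a transient upset strikes
            i = rng.randrange(n)
            faulty[i] = flip_random_bit(float(faulty[i]), rng)
    return float(np.max(np.abs(clean - faulty)))

if __name__ == "__main__":
    print("max deviation after injection:", run_with_injection())
```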

Bio:
Nathan DeBardeleben is a research scientist at Los Alamos National Laboratory leading the HPC Resilience effort in the Ultrascale Systems Research Center (USRC). He joined LANL in 2004 after receiving his PhD, Master's, and Bachelor's in computer engineering from Clemson University. At LANL, Nathan was an early developer and designer of the Eclipse Parallel Tools Platform (PTP) project, spent several years optimizing application codes, and has since turned his focus to resilient computation. Nathan is active in the resilience community and spent 2010 on an IPA assignment at the U.S. Department of Defense, where he led the Resilience Thrust of the Advanced Computing Systems Research Program. Active on several program committees, Nathan leads the Fault-Tolerance at Extreme Scale Workshop. His own research interests are in the field of reliable computation, particularly the area of HPC resilience. This includes, but is not limited to, fault-tolerance, resilient programming models, resilient application design, and soft errors (particularly those transient in nature).

(Students with interests in Dr. DeBardeleben's research wanting to meet with him over lunch should contact Dorian Arnold (darnold@cs.unm.edu) )

Constrained Relay Node Placement in Wireless Sensor Networks: Formulation and Approximations

Date: Friday, September 16, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Satyajayant Misra
New Mexico State University

The deployment characteristics of sensor nodes and their energy-limited nature affect the connectivity, lifetime, and fault-tolerance of wireless sensor networks (WSNs). One approach to address these issues is to deploy some relay nodes to communicate with the sensor nodes, other relay nodes, and the base stations in the network. The relay node placement problem for WSNs is concerned with placing a minimum number of relay nodes into a WSN to meet certain connectivity or survivability requirements. Previous studies have concentrated on the unconstrained version of the problem, in the sense that relay nodes can be placed anywhere. In practice, there may be physical constraints on the placement of relay nodes. To address this issue, we have studied constrained versions of the relay node placement problem, where relay nodes can only be placed at a set of candidate locations.

I will talk about relay node placement for connectivity and survivability, discuss the computational complexity of the problems, and present a framework of polynomial-time O(1)-approximation algorithms with small approximation ratios. I will also share our numerical results. We will conclude with some pertinent extensions of this work in the area of high performance computing.
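
To give a feel for the constrained placement problem (a simple greedy heuristic for illustration, not the O(1)-approximation algorithms from the talk), the sketch below repeatedly adds the candidate relay location that connects the most still-disconnected sensors to the base station's component.

```python
import itertools
import math

def within(a, b, r):
    return math.dist(a, b) <= r

def greedy_relay_placement(sensors, candidates, base, r):
    """Greedy heuristic for constrained relay placement: repeatedly add the
    candidate location that connects the most still-disconnected sensors to
    the component containing the base station. Illustrative only."""
    chosen = []

    def connected_sensors(relays):
        nodes = [base] + relays + sensors              # base station first
        parent = list(range(len(nodes)))               # union-find over range r
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i, j in itertools.combinations(range(len(nodes)), 2):
            if within(nodes[i], nodes[j], r):
                parent[find(i)] = find(j)
        root = find(0)
        offset = 1 + len(relays)
        return {k for k in range(len(sensors)) if find(offset + k) == root}

    covered = connected_sensors(chosen)
    remaining = list(candidates)
    while len(covered) < len(sensors) and remaining:
        best = max(remaining, key=lambda c: len(connected_sensors(chosen + [c])))
        gain = connected_sensors(chosen + [best])
        if len(gain) == len(covered):
            break                                      # no candidate helps further
        chosen.append(best)
        remaining.remove(best)
        covered = gain
    return chosen, covered

if __name__ == "__main__":
    sensors = [(0, 0), (10, 0), (0, 10)]
    candidates = [(5, 0), (0, 5), (5, 5)]
    base = (10, 10)
    relays, covered = greedy_relay_placement(sensors, candidates, base, r=7.5)
    print("relays used:", relays, "sensors connected:", sorted(covered))
```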

Bio:
Dr. Satyajayant Misra has been an assistant professor at New Mexico State University since fall 2009. His research interests include anonymity, security, and survivability in wireless sensor networks, wireless ad hoc networks, and vehicular networks, as well as optimized protocol designs for next-generation supercomputing architectures.

Dr. Misra serves on the editorial boards of IEEE Communications Surveys and Tutorials and IEEE Wireless Communications Magazine. He is the TPC Vice-Chair of Information Systems for IEEE INFOCOM 2012. He has served on the executive committees of IEEE SECON 2011 and IEEE IPCCC 2010. He is the recipient of New Mexico State University's 2011 University Research Council Early Career Award for Exceptional Achievement in Creative Scholastic Activity.

Exascale Computing and the Role of Co-design

Date: Friday, September 9, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Sudip Dosanjh
Sandia National Labs

Achieving a thousand-fold increase in supercomputing technology to reach exascale computing (10^18 operations per second) in this decade will revolutionize the way supercomputers are used. Predictive computer simulations will play a critical role in achieving energy security, developing climate change mitigation strategies, lowering CO2 emissions and ensuring a safe and reliable 21st century nuclear stockpile. Scientific discovery, national competitiveness, homeland security and quality of life issues will also greatly benefit from the next leap in supercomputing technology. This dramatic increase in computing power will be driven by a rapid escalation in the parallelism incorporated in microprocessors. The transition from massively parallel architectures to hierarchical systems (hundreds of processor cores per CPU chip) will be as profound and challenging as the change from vector architectures to massively parallel computers that occurred in the early 1990s. Through a collaborative effort between laboratories and key university and industrial partners, the architectural bottlenecks that limit supercomputer scalability and performance can be overcome. In addition, such an effort will help make petascale computing pervasive by lowering the costs for these systems and dramatically improving their power efficiency.

The U.S. Department of Energy's strategy for reaching exascale includes:


The last element, co-design, is a particularly important area of emphasis. Applications and system software will need to change as architectures evolve during the next decade. At the same time, there is an unprecedented opportunity for the applications and algorithms community to influence future computer architectures. A new co-design methodology is needed to make sure that exascale applications will work effectively on exascale supercomputers.

Bio:
Sudip Dosanjh heads the extreme-scale computing group at Sandia, which spans architectures, system software, scalable algorithms and disruptive computing technologies. He is also Sandia's exascale and platforms lead, co-director of the Alliance for Computing at the Extreme Scale (ACES) and the Science Partnership for Extreme-scale Computing (SPEC), and program manager for Sandia's Computer Systems and Software Environments (CSSE) effort under DOE's Advanced Simulation and Computing Program. In partnership with Cray, ACES has developed and is deploying the Cielo petascale capability platform. He and Jeff Nichols founded the ORNL/Sandia Institute for Advanced Architectures and Algorithms. His research interests include computational science, exascale computing, system simulation and co-design. He has a Ph.D. in Mechanical Engineering from U.C. Berkeley.

How to Take Responsibility for Your Own Computing

Date: Friday, September 2, 2011
Time: 12:00 pm — 12:50 pm
Place: Centennial Engineering Center 1041

Jed Crandall
Assistant Professor, University of New Mexico, Department of Computer Science

It used to be that using a computer on a college campus or in other places was something you didn't really need to think too much about. You worried about backups, antivirus, patches, and such---but most students, faculty, and staff didn't need to worry about being singled out as targets by governments and other organizations who would like to violate our privacy. In this talk I'll try to convince you that individual members of the University community can be singled out as targets by various organizations for different reasons, and tell you what you can do to protect yourself.

I'll also talk some about how computer security research is changing. When the United Nations has summits about "computer security" these days, the discussions are more about content such as blog posts or videos that threaten sovereignty or challenge social norms. Worms and viruses are something that only the Western countries seem to be concerned about. Computer security researchers will still worry about computational games with well-structured rules (ARP spoofing, asymmetric crypto authentication, password entropy, etc.), but increasingly human psychology and motivations mean more on the Internet than RFCs and assembly language do. I'll talk about the opportunities for research that this entails.

Bio:
Jed Crandall is an Assistant Professor and Qforma Lecturer in the UNM Computer Science department. He and his graduate students do research in computer and network security and privacy, including Internet censorship, forensics, privacy, advanced network reconnaissance, and natural language processing.

Resource Usage Analysis and Verification in the CiaoPP System

Date: Thursday, August 25, 2011
Time: 11:00 am — 11:50 am
Place: Centennial Engineering Center Stamm Room (next to the southeast entrance)

Pedro Lopez-Garcia, PhD,
Researcher, IMDEA Software Institute, Madrid, Spain

We present a general resource usage analysis framework that is parametric with respect to resources and the type of approximation (lower- and upper-bounds). The user defines the parameters of the analysis for a particular resource by means of assertions that associate basic cost functions with the elementary operations of programs, thus expressing how they affect the usage of that resource. A global static analysis can then infer bounds on the resource usage of all the procedures in the program, providing such usage bounds as functions of input data sizes. We show how to instantiate the framework for execution time analysis. Other examples of resources that can be analyzed by instantiating the framework are execution steps, energy consumption, and user-defined resources, such as the number of bits sent or received by an application over a socket, the number of calls to a procedure, or the number of accesses to a database.

Based on the general analysis, we also present a framework for (static) verification of general resource usage program properties. The framework extends the criterion of correctness as conformance of a program to a specification expressing upper and/or lower bounds on resource usage (given as functions of input data sizes). We have defined an abstract semantics for resource usage properties, together with operations to compare the (approximated) intended semantics of a program (i.e., the specification) with the approximated semantics inferred by static analysis. These operations include the comparison of arithmetic functions. A novel aspect of our framework is that the outcome of the static checking of assertions can express intervals of input data sizes such that a given specification is proved for some intervals but disproved for others. We have implemented these techniques within the Ciao/CiaoPP system in a natural way, resulting in a framework that unifies static verification and static debugging (as well as run-time verification and unit testing).
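
The interval-based outcome of assertion checking can be illustrated with a small numeric sketch (CiaoPP itself compares cost functions symbolically over Ciao assertions; the functions below are hypothetical): given an inferred cost bound and a specified bound, report the input-size intervals on which the specification is proved or disproved.

```python
def check_resource_bound(inferred, spec, n_max=10_000):
    """Compare an inferred upper-bound cost function against a specified bound
    and report the input-size intervals where the specification is proved
    (inferred <= spec) and where it is disproved. Purely numeric sketch of the
    idea; the real system compares the arithmetic functions symbolically."""
    intervals, start, ok = [], 1, None
    for n in range(1, n_max + 1):
        holds = inferred(n) <= spec(n)
        if ok is None:
            ok = holds
        elif holds != ok:
            intervals.append((start, n - 1, "proved" if ok else "disproved"))
            start, ok = n, holds
    intervals.append((start, n_max, "proved" if ok else "disproved"))
    return intervals

if __name__ == "__main__":
    import math
    # Hypothetical example: the analysis infers 3*n*log2(n)+50 steps, while the
    # assertion specifies an upper bound of n**2 steps.
    inferred = lambda n: 3 * n * math.log2(n) + 50
    spec = lambda n: n ** 2
    for lo, hi, verdict in check_resource_bound(inferred, spec, n_max=100):
        print(f"n in [{lo}, {hi}]: {verdict}")
```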

Bio:
Pedro Lopez-Garcia received an MS degree and a Ph.D. in Computer Science from the Technical University of Madrid (UPM), Spain, in 1994 and 2000, respectively. On May 28, 2008, he obtained a Scientific Researcher position at the Spanish Council for Scientific Research (CSIC) and joined the IMDEA Software Institute. Immediately prior to this position, he held associate and assistant professor positions at UPM and was deputy director of the Artificial Intelligence unit at the Computer Science Department. He has published about 30 refereed scientific papers (50% of them at conferences and in journals of high or very high impact). He has also been coordinator of the international project ES_PASS and has participated as a researcher in many other national and international projects. His main areas of interest include automatic analysis and verification of global and complex program properties such as resource usage (user-defined, execution time, memory, etc.), non-failure, and determinism; performance debugging; (automatic) granularity analysis/control for parallel and distributed computing; profiling; unit testing; type systems; and constraint and logic programming.