Scalable Data Services for Petascale Applications

Providing high-performance I/O for data-intensive high-end computing
applications requires working with I/O systems at an unproductively
low level of abstraction. This project seeks to provide higher-level
I/O abstractions that make complex I/O tasks practical. The central
abstraction of this approach is the structured stream. Structured
streams provide a model for embedding application-specific
functionality between application components. This functionality is
applied to data as it moves through I/O graphs, which perform routing
and in-band modification. The metadata necessary to connect these
components is constructed out-of-band by autonomous metabots, moving
the performance impact of metadata maintenance out of the "fast path"
of high-end computing applications.
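As a rough illustration of the idea (not the project's actual API), a
structured stream can be pictured as a chain of small stages that route
or modify records while they are in flight. Every name below
(run_stream, keep_if, scale_field, the record fields) is hypothetical
and chosen only for this sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical record and stage types, assumed only for this sketch.
Record = Dict[str, float]
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def keep_if(predicate: Callable[[Record], bool]) -> Stage:
    """Routing stage: forward only the records the predicate accepts."""
    def stage(records: Iterator[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return stage


def scale_field(name: str, factor: float) -> Stage:
    """In-band modification: rescale one field of each passing record."""
    def stage(records: Iterator[Record]) -> Iterator[Record]:
        for record in records:
            updated = dict(record)  # copy so the source data is untouched
            updated[name] = updated[name] * factor
            yield updated
    return stage


def run_stream(source: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Push records from the source through each stage, in order."""
    flow: Iterator[Record] = iter(source)
    for stage in stages:
        flow = stage(flow)
    return list(flow)


# Made-up producer output standing in for an application component.
simulation_output = [
    {"step": 1.0, "temperature": 250.0},
    {"step": 2.0, "temperature": 310.0},
    {"step": 3.0, "temperature": 420.0},
]

result = run_stream(
    simulation_output,
    [keep_if(lambda r: r["temperature"] > 300.0),
     scale_field("temperature", 0.5)],
)
```

The producer and consumer never see the stages; the filtering and
rescaling happen between them, which is the property the structured
stream abstraction is meant to provide.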
publications
- P. Widener, M. Wolf, H. Abbasi, M. Barrick, J. Lofstead,
J. Pulikottil, G. Eisenhauer, A. Gavrilovska, S. Klasky, R. Oldfield,
et al. Structured Streams: Data Services for Petascale Science
Environments. Technical report, University of New Mexico, 2007. UNM
Technical Report TR-CS-2007-17.
- Patrick M. Widener, Matthew Barrick, Jack Pullikottil, Patrick
G. Bridges and Arthur B. Maccabe. "Metabots: A Framework for
Out-of-Band Processing in Large-Scale Data Flows". Poster in Proc. 2007
International Conference on Grid Computing (Grid 2007), Austin, Texas,
September, 2007.
Scalable Proactive Control Planes

We argue that traditional passive client interfaces to directory
services are not sufficient for the application environments enabled by
grid and pervasive computing, where data are updated at high
frequency. In particular, an exclusively passive interface not only
hinders service scalability but also indirectly restricts the behavior
of potential applications. Consequently, we have proposed a
customizable active mode through which clients can subscribe to be
notified of changes to data of interest. We have designed and
implemented the Proactive Directory Service to test our ideas. PDS
clients can dynamically tune the levels of detail and granularity of
these notifications through filter functions instantiated at the server
or at the object's owner, and by remotely tuning the functionality of
those filters. We are currently building a next generation of these
tools as part of the Petascale Storage project.
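The active mode described above can be sketched as a directory that
runs a client-supplied filter on each update and notifies the client
only when the filter accepts the change. This is a minimal in-process
sketch, not the PDS interface: ToyDirectory, load_changed_by, and the
key names are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# (key, old_value, new_value) -> should the subscriber be notified?
Filter = Callable[[str, Any, Any], bool]
Callback = Callable[[str, Any], None]


class ToyDirectory:
    """Hypothetical stand-in for a directory service with an active mode."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._subscribers: List[Tuple[Filter, Callback]] = []

    def subscribe(self, accept: Filter, notify: Callback) -> None:
        """Register a filter that runs at the directory, plus a callback."""
        self._subscribers.append((accept, notify))

    def update(self, key: str, value: Any) -> None:
        """Store the new value and notify subscribers whose filter accepts."""
        old = self._entries.get(key)
        self._entries[key] = value
        for accept, notify in self._subscribers:
            if accept(key, old, value):
                notify(key, value)


def load_changed_by(threshold: float) -> Filter:
    """Accept load entries whose value moved by at least `threshold`."""
    def accept(key: str, old: Any, new: Any) -> bool:
        if not key.startswith("load/"):
            return False
        return old is None or abs(new - old) >= threshold
    return accept


directory = ToyDirectory()
received: List[Tuple[str, Any]] = []
directory.subscribe(load_changed_by(0.5), lambda k, v: received.append((k, v)))

directory.update("load/node1", 1.0)  # first value for this key: notified
directory.update("load/node1", 1.2)  # change below threshold: filtered out
directory.update("load/node1", 2.0)  # change above threshold: notified
directory.update("mem/node1", 512)   # key not of interest: filtered out
```

Because the filter runs where the data lives, high-frequency updates
that the client does not care about never cross the network, and the
client does not need to poll.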
publications
- Zhongtang Cai, Vibhore Kumar, Karsten Schwan, Brian F. Cooper,
Greg Eisenhauer, Mohamed Mansour, Balasubramanian Seshasayee and
Patrick Widener. "Implementing Diverse Messaging Models with
Self-Managing Properties using IFLOW". In Proc. 3rd IEEE International
Conference on Autonomic Computing, Dublin, Ireland, June, 2006.
- Fabian Bustamante, Patrick Widener and Karsten Schwan. "Scalable
Directory Services Using Proactivity". In Proc. Supercomputing 2002,
Baltimore, Maryland, November, 2002.
Middleware and Control Structures for Sensor Networks

Currently, users of sensor network applications must adopt custom
methods and interfaces to perform common management tasks. This makes
monitoring, tasking, diagnosing and debugging sensor networks and
their applications cumbersome; for example, users often cannot
transfer skills learned for one application to another. Our work
provides standardized end-to-end (from user to sensor nodes)
communication and control over a sensor network. A POSIX-style
filesystem interface enables users to view and update data, organize
groups of sensors, and retask sensor nodes. Sensor nodes appear as
directories containing sensor and data files. Users are then able to
use common command-line utilities to interact with the sensor
network. We are currently testing the fidelity of our approach by
applying management tools such as file system visualizers to our work.
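To make the filesystem view concrete, the sketch below builds a mock
directory tree in a temporary folder and then reads and retasks
"sensors" with ordinary file operations. The layout (one directory per
node, a temperature file, a sample_period file) is an assumption for
illustration only and is not taken from the actual system.

```python
import tempfile
from pathlib import Path
from typing import Dict


def build_mock_mount(root: Path) -> None:
    """Create a stand-in tree: one directory per (hypothetical) sensor node."""
    for node, reading in [("node-01", "21.5"), ("node-02", "19.0")]:
        node_dir = root / node
        node_dir.mkdir()
        (node_dir / "temperature").write_text(reading + "\n")
        (node_dir / "sample_period").write_text("60\n")


def read_all(root: Path, sensor: str) -> Dict[str, float]:
    """Read one sensor file from every node directory under the mount."""
    return {
        node_dir.name: float((node_dir / sensor).read_text())
        for node_dir in sorted(root.iterdir())
        if node_dir.is_dir()
    }


def retask(root: Path, node: str, period_s: int) -> None:
    """Retask a node by writing a new value to its control file."""
    (root / node / "sample_period").write_text(f"{period_s}\n")


with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp)
    build_mock_mount(mount)
    readings = read_all(mount, "temperature")
    retask(mount, "node-02", 10)
    new_period = (mount / "node-02" / "sample_period").read_text().strip()
```

Nothing here is specific to sensor networks. The same reads and writes
could be done with standard command-line utilities, which is the point
of presenting the network through a filesystem interface.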
publications
- James Horey, Jean-Charles Tournier, Patrick Widener and Arthur
Maccabe. "Koseki: A Sensor Network Filesystem". Poster in Proc. 2007 Annual
International Conference on Mobile Systems (MobiSys), San Juan, Puerto
Rico, June, 2007.
Dynamic Differential Data Protection

A key concern among developers of extensible systems is the ability to
provide adaptation approaches without sacrificing the level of security
achievable with more static (and less adaptable) solutions. For
example, where will adaptations execute? With what environment will
they run? What level of access will they have to the existing system?
Dynamic Differential Data Protection (D3P) has addressed these issues
through the creation and evaluation of protection mechanisms for
middleware infrastructures. D3P provides control over the data typing
space for such middleware as well as abstractions provided by the
middleware itself. D3P also provides a general and flexible
extension/customization model for distributed applications based on
publish/subscribe middleware. We are now exploring how D3P concepts
can be applied to advanced software architectures for high-performance
computing.
publications
- Jiantao Kong, Ivan Ganev, Karsten Schwan and Patrick
Widener. "CameraCast: Flexible Access to Remote Video Sensors". In
Proc. Fourteenth Annual Multimedia Computing and Networking Conference
(MMCN'07), San Jose, California, January, 2007.
- Patrick Widener. "Reverb: Middleware Support for Distributed
Application Forensics". In Proc. IEEE Workshop on Challenges for Large
Distributed Environments, Research Triangle Park, North Carolina,
July, 2005.
- Patrick Widener, Karsten Schwan and Fabian
Bustamante. "Differential Data Protection in Dynamic Distributed
Applications". In Proc. Annual Computer Security Applications
Conference, Las Vegas, Nevada, December, 2003.
Lightweight Storage for High-Performance Computing

Today's high-end massively parallel processing machines have thousands
to tens of thousands of processors, with next-generation systems
planned to have in excess of one hundred thousand processors. For
systems of such scale, efficient I/O is a significant challenge that
cannot be solved using traditional approaches. In particular,
general-purpose parallel file systems that limit applications to standard
interfaces and access policies do not scale and are a performance
bottleneck for many scientific applications. This project investigates
the use of a "lightweight" approach to I/O that requires the
application or I/O-library developer to extend a core set of critical
I/O functionality with the minimum set of features and services
required by its target applications. We argue that this approach
allows the development of I/O libraries that are both scalable and
secure.
publications
- Ron A. Oldfield, Arthur B. Maccabe, Sarala Arunagiri, Todd
Kordenbrock, Rolf Riesen, Lee Ward and Patrick Widener. "Lightweight
I/O for Scientific Applications". In Proc. 2006 IEEE Conference on
Cluster Computing, Barcelona, Spain, September, 2006.
High-Performance Structured Data Exchange

The EVPath communication infrastructure, along with FFS, its companion
data representation library, is the foundation for many research
efforts in the systems group at Georgia Tech. I continue to collaborate
with researchers there on topics ranging from wire-formats for
heterogeneous communication, to peer interaction paradigms (pull/push
styles of communication), to required services and driving
applications.
publications
- Patrick Widener, Greg Eisenhauer, Karsten Schwan and Fabian
E. Bustamante. "Open Metadata Formats: Efficient XML-Based
Communication for High-Performance Computing". In Cluster Computing:
The Journal of Networks, Software Tools, and Applications (5), 2002,
pp. 315-324.
- Fabian Bustamante, Greg Eisenhauer, Karsten Schwan and Patrick
Widener. "Efficient Wire Formats for High-Performance Computing". In
Proc. Supercomputing 2000, Dallas, Texas, November, 2000.