Hobbes Exascale Operating and Runtime System

Overview

Hobbes OS Architecture

Hobbes is a new operating and runtime system for next-generation exascale supercomputers, being developed by a team that includes UNM and is led by Sandia National Laboratories. In contrast to previous HPC operating systems, Hobbes explicitly supports application composition, which is emerging as a key approach for applications to address the scalability and power concerns anticipated with coming extreme-scale architectures. Hobbes also uses virtualization technologies to provide the flexibility to support the requirements of application components for different node-level operating systems and runtimes, as well as different mappings of the components onto the hardware.

Cooperative and Coordinated Scheduling

UNM, together with collaborators at Oak Ridge and Georgia Tech, is exploring new scheduling techniques to support efficient execution of multiple application enclaves (application components) running on large-scale supercomputers. In particular, we are examining both coordinated (gang) scheduling and cooperative scheduling techniques based on real-time schedulers. Our goal with this research is to enable multiple application components, for example simulation and analytics components running in different virtualized operating systems, to share system resources effectively and efficiently.

Multi-Enclave Benchmarks

To drive research in Hobbes on support for multi-enclave applications, UNM, Sandia, Oak Ridge, Los Alamos, and Georgia Tech are assembling simple multi-enclave benchmarks. These benchmarks center on streaming data from well-known applications and benchmarks to associated analytics programs. We are using the LAMMPS application from Sandia, the GTC-P proxy application from ORNL, and the SNAP miniapp from LANL as the simulations for this research. In general, the simulation code runs directly on the Hobbes Hardware Abstraction Layer, while the analytics runs in a virtualized Linux operating system. These enclaves communicate over high-performance intra-node communication systems based on ADIOS and xpmem.

Acknowledgements

Research and development of Hobbes is funded by the U.S. Department of Energy Office of Science, Advanced Scientific Computing Research, program manager Sonia Sachs. Research on the Palacios virtualization layer used in Hobbes was funded by the U.S. Department of Energy Office of Science, Advanced Scientific Computing Research, under award number DE-SC0005050, program manager Sonia Sachs, and by grants CNS-0709168 and CNS-0707365 from the National Science Foundation.
