Fault Tolerance and Robustness

Robust-first Computing: Efficiency costs robustness. For the safety of society, and to let us build really big computers, we should put robustness first, ahead even of strict correctness and maximum efficiency. Robust-first computing embodies this principle across the entire computational stack.
UNM Investigators: David Ackley and Lance Williams

Robust Communication and Computation: Secure and robust multiparty computation and communication in networks with adversarial nodes are important to large-scale systems. This work develops resource-efficient and cost-competitive algorithms for these settings.
UNM Investigator: Jared Saia
 Drexel U., U. of Michigan, U. of Victoria

Fault-tolerance for HPC Systems: To address the challenges of running applications on next-generation, large-scale, error-prone systems, we use modeling, simulation, and real frameworks to understand the impact of different resilience mechanisms on application performance.
UNM Investigator: Dorian Arnold
 Sandia National Labs