The SMURFS Project: Simulation and Modeling for Understanding Resilience and Faults at Scale
Speaker: Dorian C Arnold – Orange County, CA, United States
Topic(s): Applied Computing
Current HPC research targets computer systems with exaflop capabilities (10^18, or one quintillion, floating-point operations per second). Such computational power will enable important new discoveries across all basic science domains. Application resilience, however, is a major challenge to realizing extreme-scale computing systems. The SMURFS Project addresses this challenge by developing methods that improve our predictive understanding of the complex interactions among a given application, a given real or hypothetical hardware and software environment, and a given fault-tolerance strategy at extreme scale. Specifically, SMURFS explores: (1) advanced simulation and modeling capabilities for studying application resilience at scale; (2) comprehensive, comparative studies of existing and new fault-tolerance strategies; (3) a detailed understanding of how application features interact with different fault-tolerance strategies and hardware technologies; and (4) effective prescriptions to guide application developers, hardware architects, and system designers in realizing efficient, resilient extreme-scale capabilities. (This project is a collaboration among Emory University, the University of Tennessee, and Sandia National Laboratories. It is funded in part by the National Science Foundation.)
About this Lecture
Number of Slides: 45
Duration: 50 minutes
Languages Available: English
Requests for this lecture, or for a tour with this speaker, are sent to ACM headquarters for review.