Moving towards exascale via optimization for application- and hardware-aware execution
Speaker
Laura Carrington
Director of the Performance, Modeling, and Characterization (PMaC) Lab
University of California San Diego
San Diego Supercomputer Center
La Jolla, CA 92093-0505
Abstract
Over the past decade the focus has been on developing tools to analyze and improve performance. However, the recent explosion in the number of components in HPC systems has brought two more key areas to the forefront -- energy and resiliency. We can no longer afford to aimlessly grow the size of systems by simply adding more and more components, since the cost of powering such systems rapidly becomes impractical. In addition, as applications attempt to run at these larger scales, the reliability of the millions of components involved becomes a dominant issue. As we move towards exascale systems, the sheer number of components, as well as the deep memory hierarchies and complex interconnect subsystems, will mean that traditional performance problems need to be rethought as a multi-objective optimization problem: understanding the trade-offs between performance, energy efficiency, and resiliency. Next-generation tools should address this challenge by allowing us to analyze and model all the interactions between the application and the hardware to fully understand these trade-offs. The GreenQueue framework begins to address these issues by providing the capability to model power and performance, resulting in customized hardware settings to optimize for energy efficiency. Still, the issue of resiliency looms large. Fully understanding and reacting to resiliency-related problems in order to get a handle on the multi-objective optimization problem will require better component diagnostic information from the hardware/software stack.
Laura Carrington
Dr. Carrington is an expert in High Performance Computing. Her work has resulted in over 40 publications in HPC benchmarking, workload analysis, application performance modeling, analysis of accelerators (e.g., FPGAs and GPUs) for scientific workloads, performance analysis tools (e.g., processor and network simulators), and energy-efficient computing. At UCSD, she is the director of the Performance, Modeling, and Characterization (PMaC) Lab. She is also the PI for the DoE SciDAC-3 Institute for Sustained Performance, Energy, and Resilience (SUPER) and lead for the institute's energy efficiency thrust, as well as PI on a number of other awards that support the lab. She has given numerous invited talks, served as a member of various panels and committees, and has been an active member of the DoD HPCMP Performance team, involved in its annual HPC system procurement, for the past 10 years.