Date
Sunday, November 13, 2016
14:00-17:30
Location
Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis
Salt Lake City Convention Center, Room 155-E
Salt Lake City, Utah, USA
Description
The path to exascale computing will challenge HPC application developers in
their quest to achieve the maximum potential that the machines have to
offer. Factors such as limited power budgets, heterogeneity, hierarchical
memories, shrinking I/O bandwidths, and performance variability will make
it increasingly difficult to create productive applications on future
platforms. Tools for debugging, performance measurement and analysis, and
tuning will be needed to overcome the architectural, system, and
programming complexities envisioned in these exascale environments. At the
same time, research and development progress for HPC tools faces equally
difficult challenges from exascale factors. Increased emphasis on
autotuning, dynamic monitoring and adaptation, heterogeneous analysis, and
so on will require new methodologies, techniques, and engagement with
application teams. This workshop will serve as a forum for HPC application
developers, system designers, and tools researchers to discuss the
requirements for exascale-ready/exascale-enabled tools and the roadblocks
that need to be addressed.
The workshop is the fifth in a series of SC conference workshops
organized by the Virtual Institute - High Productivity Supercomputing
(VI-HPS), an international initiative of HPC researchers and developers
focused on parallel programming and performance tools for large-scale
systems. The ESPT-2016 proceedings are in the IEEE Digital Library.
Workshop Program
14:00 – 14:05 | Welcome and introduction by Allen Malony
14:05 – 14:40 | Keynote presentation: "Exascale Application Drivers for Software Technologies" by Doug Kothe
14:40 – 15:05 | "Methodology and Application of HPC I/O Characterization with MPIProf and IOT" by Yan-Tyng Sherry Chang, Henry Jin and John Bauer
Abstract: Combining the strengths of MPIProf and IOT, an efficient and systematic method is devised for I/O characterization at the per-job, per-rank, per-file and per-call levels of programs running on the high-performance computing resources at the NASA Advanced Supercomputing (NAS) facility. This method is applied to four I/O questions in this paper. A total of 13 MPI programs and 15 cases, ranging from 24 to 5968 ranks, are analyzed to establish the I/O landscape from answers to the four questions. Four of the 13 programs use MPI I/O, and the behavior of their collective writes depends on the specific implementation of the MPI library used. The SGI MPT library, the prevailing MPI library for NAS systems, was found to automatically gather small writes from a large number of ranks in order to perform larger writes by a small subset of collective buffering ranks. The number of collective buffering ranks invoked by MPT depends on the Lustre stripe count and the number of nodes used for the run. A demonstration of varying the stripe count to achieve double-digit speedup of one program's I/O is presented. Another program, in which all ranks concurrently open private files and which could potentially place a heavy load on the Lustre servers, was identified. The ability to systematically characterize I/O for a large number of programs running on a supercomputer, seek I/O optimization opportunities, and identify programs that could cause high load and instability on the filesystems is important for pursuing exascale in a real production environment.
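The abstract above turns on the interaction between collective buffering and the Lustre stripe count. Purely as a hedged illustration (not code from the paper), the C sketch below issues the kind of collective MPI-IO write being characterized and requests a stripe count through MPI-IO hints; the hint names ("striping_factor", "romio_cb_write") follow common ROMIO conventions, and whether a given MPI library such as SGI MPT honors them is implementation-dependent.

/* Illustrative sketch (not from the paper): a collective MPI-IO write whose
 * behavior depends on collective buffering and the Lustre stripe count, the
 * two factors examined in the abstract above.  Hint names follow common
 * ROMIO conventions; support varies by MPI library. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank contributes a small contiguous block; collective buffering
     * aggregates these into larger writes issued by a few ranks. */
    const int count = 1024;
    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++)
        buf[i] = rank + i * 1e-6;

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");     /* requested Lustre stripe count (hint) */
    MPI_Info_set(info, "romio_cb_write", "enable"); /* request collective buffering */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Collective write: every rank participates, but only the collective
     * buffering ranks actually touch the file system. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}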
15:05 – 15:30 | Coffee break
15:30 – 15:55 | "Modular HPC I/O Characterization with Darshan" by Shane Snyder, Philip Carns, Kevin Harms, Robert Ross, Glenn Lockwood and Nicholas Wright
Abstract: Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem, providing a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization.
In this work, we consider how the I/O profiling tool Darshan can be improved to allow for more flexible, comprehensive instrumentation of current and future HPC I/O workloads. We evaluate the performance and scalability of our design to ensure that it is lightweight enough for full-time deployment on production HPC systems. We also present two case studies illustrating how a more comprehensive instrumentation of application I/O workloads can enable insights into I/O behavior that were not previously possible. Our results indicate that Darshan’s modular instrumentation methods can provide valuable feedback to both users and system administrators, while imposing negligible overheads on user applications.
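Darshan instruments applications transparently (commonly through link-time wrappers or LD_PRELOAD), so no source changes are required. Purely to illustrate the kind of multi-interface workload that modular, per-interface instrumentation is meant to separate, the hypothetical C sketch below mixes stdio and MPI-IO activity in a single run; it is not an example of the Darshan API.

/* Hypothetical workload sketch (not the Darshan API): one run that exercises
 * two I/O interfaces, the kind of activity a modular I/O characterization
 * tool can attribute to separate instrumentation modules. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* stdio/POSIX path: each rank writes a small private log file. */
    char name[64];
    snprintf(name, sizeof(name), "rank%06d.log", rank);
    FILE *fp = fopen(name, "w");
    if (fp) { fprintf(fp, "rank %d starting\n", rank); fclose(fp); }

    /* MPI-IO path: all ranks write one shared file collectively. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    int value = rank;
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(int),
                          &value, 1, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}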
15:55 – 16:20 | "Floating-Point Shadow Value Analysis" by Michael Lam and Barry Rountree
Abstract: Real-valued arithmetic has a fundamental impact on the performance and accuracy of scientific computation. As scientific application developers prepare their applications for exascale computing, many are investigating the possibility of using either lower precision (for better performance) or higher precision (for more accuracy). However, exploring alternative representations often requires significant code revision. We present a novel program analysis technique that emulates execution with alternative real number implementations at the binary level. We also present a Pin-based implementation of this technique that supports x86_64 programs and a variety of alternative representations.
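As a toy illustration of why the choice of representation matters (and not of the paper's Pin-based shadow value tool itself), the C snippet below accumulates the same sum in single and double precision and reports how far the lower-precision result drifts from the higher-precision one.

/* Toy illustration (not the paper's Pin-based tool): the same reduction
 * carried out in single and double precision drifts apart as terms are
 * accumulated, which is the kind of divergence a shadow value analysis
 * tracks alongside the native execution. */
#include <stdio.h>

int main(void)
{
    float  sum_f = 0.0f;   /* candidate lower-precision representation */
    double sum_d = 0.0;    /* reference (shadow) representation */

    for (int i = 1; i <= 10000000; i++) {
        sum_f += 1.0f / (float)i;
        sum_d += 1.0  / (double)i;
    }

    printf("float : %.8f\n", sum_f);
    printf("double: %.8f\n", sum_d);
    printf("error of float result: %.3e\n", (double)sum_f - sum_d);
    return 0;
}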
16:20 – 16:45 | "Runtime Verification of Scientific Computing: Towards an Extreme Scale" by Minh Ngoc Dinh, Chao Jin, David Abramson and Clinton Jeffery
Abstract: Relative debugging helps trace software errors by comparing two concurrent executions of a program -- one code being a reference version and the other faulty. By locating data divergence between the runs, relative debugging is effective at finding coding errors when a program is scaled up to solve larger problem sizes or migrated from one platform to another. In this work, we envision potential changes to our current relative debugging scheme in order to address exascale factors such as the increased incidence of faults and non-deterministic outputs. First, we propose a statistics-based comparison scheme to support verifying results that are stochastic. Second, we leverage a scalable data reduction network to adapt to the complex network hierarchy of an exascale system, and extend our debugger to support the statistics-based comparison in an environment subject to failures.
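The following C sketch illustrates the general idea of a tolerance-based comparison between a reference run and a suspect run, which is what makes relative debugging workable for stochastic outputs. It is only a minimal stand-in, not the authors' statistical scheme or debugger interface, and the helper results_agree is hypothetical.

/* Minimal stand-in for tolerance-based comparison of two runs (an
 * illustration of the general idea only, not the authors' scheme).
 * Instead of demanding bitwise equality, the two result vectors are
 * compared through a summary statistic and a relative tolerance. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the two result vectors agree within the given relative
 * tolerance on their mean, 0 otherwise. */
static int results_agree(const double *ref, const double *test, size_t n,
                         double rel_tol)
{
    double mean_ref = 0.0, mean_test = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_ref  += ref[i];
        mean_test += test[i];
    }
    mean_ref  /= (double)n;
    mean_test /= (double)n;

    double denom = fmax(fabs(mean_ref), 1e-300);  /* avoid division by zero */
    return fabs(mean_test - mean_ref) / denom <= rel_tol;
}

int main(void)
{
    double ref[4]  = {1.00, 2.00, 3.00, 4.00};
    double test[4] = {1.01, 1.99, 3.02, 3.98};   /* e.g. a stochastic rerun */

    printf("runs %s within 1%% on the mean\n",
           results_agree(ref, test, 4, 0.01) ? "agree" : "diverge");
    return 0;
}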
16:45 – 17:10 | "Automatic Code Generation and Data Management for an Asynchronous Task-based Runtime" by Muthu Baskaran, Benoit Pradelle, Benoit Meister, Athanasios Konstantinidis and Richard Lethin
Abstract: Hardware scaling and low-power considerations associated with the quest for exascale and extreme-scale computing are driving system designers to consider new runtime and execution models, such as the event-driven-task (EDT) models, that enable more concurrency and reduce the amount of synchronization. Further, for performance, productivity, and code sustainability reasons, there is an increasing demand for auto-parallelizing compiler technologies to automatically produce code for EDT-based runtimes. However, achieving scalable performance on extreme-scale systems with auto-generated code is a non-trivial challenge. Some of the key requirements for achieving good scalable performance across many EDT-based systems are: (1) scalable dynamic creation of the task-dependence graph and spawning of tasks, (2) scalable creation and management of data and communications, and (3) dynamic scheduling of tasks and movement of data for scalable asynchronous execution. In this paper, we develop capabilities within R-Stream -- an automatic source-to-source optimization compiler -- for automatic generation and optimization of code and data management targeting the Open Community Runtime (OCR), an exascale-ready asynchronous task-based runtime. We demonstrate the effectiveness of our techniques through performance improvements on various benchmarks and proxy application kernels that are relevant to the extreme-scale computing community.
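R-Stream targets the OCR API, which is not reproduced here. As a stand-in for the event-driven-task style the abstract describes, the C sketch below uses standard OpenMP task depend clauses: tasks are created dynamically and the runtime releases each one as its input dependences are satisfied, so no global barrier is needed.

/* Stand-in illustration of the event-driven-task style (this is OpenMP
 * tasking, not the OCR API that R-Stream targets): tasks are spawned
 * dynamically and each one runs only once the data it depends on has been
 * produced, avoiding global synchronization. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    double a = 0.0, b = 0.0, c = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)   /* produce a */
        a = 1.0;

        #pragma omp task depend(out: b)   /* produce b; may run concurrently with the task above */
        b = 2.0;

        #pragma omp task depend(in: a, b) depend(out: c)  /* released only after a and b exist */
        c = a + b;

        #pragma omp taskwait
        printf("c = %f\n", c);
    }
    return 0;
}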
17:10 – 17:35 | "A Scalable Observation System for Introspection and In Situ Analytics" by Chad Wood, Sudhanshu Sane, Daniel Ellsworth, Alfredo Gimenez, Kevin Huck, Todd Gamblin and Allen Malony
Abstract: SOS is a new model for the online in situ characterization and analysis of complex high-performance computing applications. SOS employs a data framework with distributed information management and structured query and access capabilities. The primary design objectives of SOS are flexibility, scalability, and programmability. SOS provides a complete framework that can be configured with and used directly by an application, allowing for a detailed workflow analysis of scientific applications. This paper describes the model of SOS and the experiments used to validate and explore the performance characteristics of its implementation in SOSflow. Experimental results demonstrate that SOS is capable of observation, introspection, feedback and control of complex high-performance applications, and that it has desirable scaling properties.
Organizing committee
Allen D. Malony, University of Oregon, USA
Martin Schulz, Lawrence Livermore National Laboratory, USA
Felix Wolf, Technische Universität Darmstadt, Germany
William Jalby, Université de Versailles St-Quentin-en-Yvelines, France
Program committee
Luiz DeRose, Cray Inc., USA
Michael Gerndt, Technische Universität München, Germany
Jeffrey K. Hollingsworth, University of Maryland, USA
William Jalby, Université de Versailles St-Quentin-en-Yvelines, France
Andreas Knüpfer, Technische Universität Dresden, Germany
David Lecomber, Allinea Software, UK
Allen D. Malony, University of Oregon, USA
John Mellor-Crummey, Rice University, USA
Martin Schulz, Lawrence Livermore National Laboratory, USA
Sameer Shende, University of Oregon, USA
Felix Wolf, Technische Universität Darmstadt, Germany
Brian Wylie, Jülich Supercomputing Centre, Germany
Previous workshops
- Extreme-Scale Programming Tools (16 November 2015, Austin, TX, USA)
- Extreme-Scale Programming Tools (17 November 2014, New Orleans, LA, USA)
- Extreme-Scale Programming Tools (18 November 2013, Denver, CO, USA)
- Extreme-Scale Performance Tools (16 November 2012, Salt Lake City, UT, USA)
Contact
Allen D. Malony (Email malony@cs.uoregon.edu, phone +1-541-346-4407)