SC23 full-day tutorial: Hands-on Practical Hybrid Parallel Application Performance Engineering (Denver, CO, USA)
Date
Monday 13th November 2023
Presenters
- Markus Geimer, Jülich Supercomputing Centre
- Sameer Shende, University of Oregon
- Bert Wesarg, Technische Universität Dresden
- Brian Wylie, Jülich Supercomputing Centre
Logistics
This page will be updated as information becomes available, so check back before traveling to attend the tutorial. Tutorials are planned to be live-streamed as part of the SC23 Digital Experience; however, remote participants will not receive assistance for the hands-on parts. The currently available software and exercises are being updated in preparation for the tutorial.
The full-day hands-on tutorial takes place as part of the SC23 conference, in room 402 of the Colorado Convention Center, Denver, CO, USA. Registration for the tutorial is possible via the conference website, with or without the conference technical program, exhibition and workshops.
Hands-on exercises will use accounts provided by Jülich Supercomputing Centre (JSC) on the JUWELS-Booster modular supercomputer. Participants will build and run an MPI+CUDA example code on two compute nodes, each with dual AMD EPYC 7402 24-core 'Rome' CPUs and quad Nvidia A100 'Ampere' GPUs, and will measure and analyse intra-node and inter-node performance with VI-HPS tools. Access will be via the Jupyter-JSC service, which provides an Xpra remote graphical desktop environment that runs within common web browsers. Tutorial participants are expected to use their own notebook computers, connecting via the SC conference wireless network; no additional software needs to be installed.
Tutorial participants are strongly encouraged to (pre)register for a JUDOOR account to access the training project and its allocation on JUWELS-Booster.
(Note that the SC23 tutorial on Distributed GPU Programming, which will also use this system, is scheduled to run concurrently and will use a different training project.)
Abstract
This tutorial presents state-of-the-art performance tools for leading-edge HPC systems founded on the community-developed Score-P instrumentation and measurement infrastructure, demonstrating how they can be used for performance engineering of effective scientific applications based on standard MPI, OpenMP, hybrid MPI+OpenMP, and increasingly common usage of accelerators. Parallel performance evaluation tools from the VI-HPS (Virtual Institute - High Productivity Supercomputing) are introduced and featured in hands-on exercises with Scalasca, Vampir and TAU. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how tools are used in combination for identifying performance problems and investigating optimization alternatives. Using their own notebook computers, participants will conduct exercises on quad-A100 GPU nodes.
Programme (tentative)
08:30  Introduction & basic measurement
10:00  (break)
10:30  Profile analyses
12:00  (lunch)
13:30  Trace analyses
15:00  (break)
15:30  Further steps
17:00  (adjourn)