SIGHPC Systems Professionals Workshop

HPCSYSPROS20

Friday November 13, 2020

10 AM to 2 PM EST

Atlanta, GA (virtual)

Held in conjunction with SC20

and in cooperation with SIGHPC

 Quick Information

Supercomputing systems present complex challenges to personnel who design, deploy and maintain these systems. Standing up these systems and keeping them running require novel solutions that are unique to high performance computing. The success of any supercomputing center depends on stable and reliable systems, and HPC Systems Professionals are crucial to that success.

The Fifth Annual HPC Systems Professionals Workshop will bring together systems administrators, systems architects, and systems analysts to share best practices, discuss cutting-edge technologies, and advance the state of the practice for HPC systems. This CFP requests that participants submit papers, slide presentations, or 5-minute Lightning Talk proposals, along with reproducible artifacts (code segments, test suites, configuration management templates) that can be made available to the community for use.

 Keynote Speaker

Atsuya Uno will present "Introduction of Supercomputer Fugaku".

Fugaku is the most powerful supercomputer currently deployed, ranking #1 on the TOP500 list as of June 2020. It is powered by Fujitsu’s 48-core A64FX SoC, making it the first number one system on the list to be powered by ARM processors. In single or further reduced precision, which are often used in machine learning and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops). The new system is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan.

 Schedule

All times in Eastern Standard Time

Start | End | Description
10:00 AM | 10:05 AM | Welcome
10:05 AM | 10:40 AM | Keynote: Introduction of Supercomputer Fugaku, Atsuya Uno, RIKEN Center for Computational Science
10:40 AM | 10:50 AM | Site Report: NVIDIA, Adam DeConinck, NVIDIA
10:50 AM | 11:00 AM | Site Report: MARCC, Jaime Combariza, Johns Hopkins University
11:00 AM | 11:10 AM | Site Report: NREL, Matt Bidwell, National Renewable Energy Laboratory
11:10 AM | 11:20 AM | Site Report: INL, Ben Nickell, Idaho National Laboratory
11:20 AM | 11:25 AM | Lightning Talk: Setup and management of a small national computational facility: What we’ve learned the first 10 years, George Tsouloupas, Thekla Loizou, Panayiotis Vorkas, The Cyprus Institute
11:25 AM | 11:30 AM | Lightning Talk: Case Study of TCP/IP tunings for High Performance Interconnects, Jenett Tillotson, National Center for Atmospheric Research
11:30 AM | 11:35 AM | Lightning Talk: NGC Container Environment Modules, Scott McMillan, NVIDIA
11:35 AM | 11:50 AM | Break
11:50 AM | 12:10 PM | Paper: Application Performance in the Frontera Acceptance Process, Richard Todd Evans, Texas Advanced Computing Center, University of Texas at Austin
12:10 PM | 12:30 PM | Paper: Parallelized Data Replication of Multi-Petabyte Storage Systems, Honwai Leong, Daniel Richards, Andrew Jankee, Stephen Kolmann, Data Direct Networks and The University of Sydney
12:30 PM | 12:50 PM | Paper: Log-Based Identification, Classification, and Behavior Prediction of HPC Applications, Ryan D. Lewis, Zhengchun Liu, Rajkumar Kettimuthu, Michael E. Papka, Northern Illinois University and Argonne National Laboratory
12:50 PM | 1:10 PM | Paper: Modernizing the HPC System Software Stack, Benjamin S. Allen, Matthew A. Ezell, Paul Peltz, Douglas Jacobsen, Cory Lueninghoener, J. Lowell Wofford, Eric Roman, Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory
1:10 PM | 1:50 PM | Panel: Cluster Management, TBA
1:50 PM | 1:55 PM | Traxler Family Award for Community Service
1:55 PM | 2:00 PM | Closing Remarks

 Topics of Interest

Here are some topics of interest for this group. Note that these are here to indicate direction, not to disallow other related topics.

  • Cluster, configuration, or software management
  • Performance tuning/Benchmarking
  • Resource manager and job scheduler configuration
  • Monitoring/Mean-time-to-failure/ROI/Resource utilization
  • Virtualization/Clouds
  • Designing and troubleshooting HPC interconnects
  • Designing and maintaining HPC storage solutions
  • Cybersecurity and data protection
  • Cluster storage

Example paper ideas might be:

  • Best practices for job scheduler configuration
  • Advantages of cluster automation
  • Managing software on HPC clusters

 Calendar

Event | Date
Workshop Submissions Open | April 29, 2020
Workshop Submissions Close | September 9, 2020
Reviews Sent | September 18, 2020
Acceptance Notification | October 2, 2020

 Organizing Committee

Position | Name | Affiliation
Workshop Chair | Matt Bidwell | National Renewable Energy Laboratory
Program Chair | Gary Jackson | Johns Hopkins University Applied Physics Laboratory
Organizing Committee | John Blaas | National Center for Atmospheric Research
Organizing Committee | David Clifton | Ansys
Organizing Committee | Stephen Lien Harrell | Texas Advanced Computing Center
Organizing Committee | John Legato | National Institutes of Health
Organizing Committee | Kurt Maier | Pacific Northwest National Laboratory
Organizing Committee | William Scullin | Laboratory for Laser Energetics
Organizing Committee | Jenett Tillotson | National Center for Atmospheric Research

 Program Committee

Name | Affiliation
Jonathon Anderson | University of Colorado Boulder
Adam DeConinck | NVIDIA
Violeta Holmes | University of Huddersfield
Gary Jackson | Johns Hopkins University, Applied Physics Laboratory
John Legato | National Institutes of Health
Ti Leggett | Argonne National Laboratory
Hon Wai Leong | DDN
Scott McMillan | NVIDIA
Ken Schmidt | Pacific Northwest National Laboratory
William Scullin | University of Rochester, Laboratory for Laser Energetics
Jenett Tillotson | National Center for Atmospheric Research

 Publication Information

All accepted papers and artifacts will be published on GitHub and archived with a DOI in Zenodo. You can view last year's accepted papers in the HPCSYSPROS SC19 Workshop Proceedings.

 Contact Information

If you need to contact us, send email to SIGHPC SYSPROS.

 Links