HPCSYSPROS19

SIGHPC Systems Professionals Workshop

Friday, November 22, 2019

8:30am to noon

Denver, CO

Held in conjunction with

and in cooperation with

Quick Information

Supercomputing systems present complex challenges to personnel who design, deploy and maintain these systems. Standing up these systems and keeping them running require novel solutions that are unique to high performance computing. The success of any supercomputing center depends on stable and reliable systems, and HPC Systems Professionals are crucial to that success.

The Fourth Annual HPC Systems Professionals Workshop will bring together systems administrators, systems architects, and systems analysts to share best practices, discuss cutting-edge technologies, and advance the state of the practice for HPC systems. This CFP invites participants to submit papers, slide presentations, or 5-minute Lightning Talk proposals, along with reproducible artifacts (code segments, test suites, configuration management templates) that can be made available to the community.

Keynote Speaker

Kate Keahey will present "Chameleon: How to Build a Cloud++".

Chameleon is a large-scale, deeply reconfigurable experimental platform built to support Computer Science systems research. Community projects range from systems research on exascale operating systems, virtualization methods, performance variability, and power management to projects in software-defined networking, machine learning, and resource management. What makes Chameleon unique is that it provides these sophisticated capabilities on top of a mainstream infrastructure cloud system (OpenStack). In this talk, I will explain the challenges we faced in building Chameleon, the lessons we learned, and our operational experience, and describe how we packaged the system to integrate both the developed capabilities and that operational experience, making platforms of this kind easier to manage.

Kate Keahey is one of the pioneers of infrastructure cloud computing. She created the Nimbus project, recognized as the first open source Infrastructure-as-a-Service implementation, and continues to work on research aligning cloud computing concepts with the needs of scientific centers and applications. To facilitate such research for the community at large, Kate leads the Chameleon project, providing a deeply reconfigurable, large-scale, and open experimental platform for Computer Science research. Kate also co-founded and serves as co-Editor-in-Chief of the SoftwareX journal, a new format designed to publish software contributions. Kate is a Senior Computer Scientist with the Math and Computer Science division at Argonne National Laboratory and a Senior Fellow at the Computation Institute at the University of Chicago.

Schedule

Start	End	Description
8:30 AM	8:45 AM	Welcome
8:45 AM	9:30 AM	Keynote: Chameleon: How to Build a Cloud++, Kate Keahey, Argonne National Laboratory
9:30 AM	9:45 AM	Paper: Decoupling OpenHPC Critical Services, Jacob Chappell, Bhushan Chitre, Vikram Gazula, Lowell Pike, James Griffioen, University of Kentucky
9:45 AM	10:00 AM	Paper: Implementing a Common HPC Environment in a Multi-User Spack Instance, Carson Woods, Matthew L. Curry, Anthony Skjellum, University of Tennessee, Chattanooga and Sandia National Laboratories
10:00 AM	10:30 AM	Break
10:30 AM	10:37 AM	Lightning Talk: Arbiter: Dynamically Limiting Resource Consumption on Login Nodes, Dylan Gardner, Robben Migacz, Brian Haymore, University of Utah, Center for High Performance Computing
10:37 AM	10:44 AM	Lightning Talk: Using GUFI in Data Management, Christopher Hoffman, Bill Anderson, National Center for Atmospheric Research
10:44 AM	11:00 AM	Slide Presentation: Monitoring HPC Services with CheckMK, Kieran Leach, Philip Cass, Craig Manzi, Edinburgh Parallel Computing Centre
11:00 AM	11:15 AM	Slide Presentation: The Road to Devops HPC Cluster Management, Ken P. Schmidt, Evan J. Felix, Pacific Northwest National Laboratory
11:15 AM	11:30 AM	Paper: What Deploying MFA Taught Us About Changing Infrastructure, Abe Singer, Shane Canon, Rebecca Hartman-Baker, Kelly L. Rowland, David Skinner, Craig Lant, Lawrence Berkeley National Laboratory and National Energy Research Scientific Computing Center
11:30 AM	11:45 AM	Paper: A Better Way of Scheduling Jobs on HPC Systems: Simultaneous Fair-Share, Craig P. Steffen, University of Illinois and National Center for Supercomputing Applications
11:45 AM	12:00 PM	Closing Remarks and Open Discussion

Proceedings

https://github.com/HPCSYSPROS/Workshop19

Topics of Interest

Here are some topics of interest for this group. They are meant to indicate direction, not to exclude other related topics.

Cluster, configuration, or software management
Performance tuning/Benchmarking
Resource manager and job scheduler configuration
Monitoring/Mean-time-to-failure/ROI/Resource utilization
Virtualization/Clouds
Designing and troubleshooting HPC interconnects
Designing and maintaining HPC storage solutions
Cybersecurity and data protection
Cluster storage

Example paper ideas might be:

Best practices for job scheduler configuration
Advantages of cluster automation
Managing software on HPC clusters

Calendar

Event	Date
Workshop Submissions Open	April 17, 2019
Workshop Submissions Close	August 30, 2019
Reviews Sent	September 14, 2019
Resubmission Open	September 16, 2019
Resubmission Close	September 28, 2019
Acceptance Notification	October 11, 2019

Organizing Committee

Position	Name	Affiliation
Chair	David Clifton	Ansys
Program Committee Chair	John Blaas	University of Colorado Boulder
Member	Stephen Fralich	Boeing
Member	Stephen Lien Harrell	Purdue University
Member	Adam Hough	University of Washington
Member	William Scullin	Laboratory for Laser Energetics
Member	Jenett Tillotson	NCAR

Program Committee

Name	Affiliation
Jonathon Anderson	University of Colorado Boulder
Jared Baker	NCAR
Matt Bidwell	NREL
Stephen Fralich	Boeing
Brian Haymore	University of Utah
Randy Herban	Sylabs Inc.
Andrew Howard	Microsoft
Ti Leggett	Argonne National Laboratory
Hon Wai Leong	DDN
Scott McMillian	Nvidia
Paul Peltz Jr.	Oak Ridge National Laboratory
Jeff Raymond	University of Pittsburgh
Jenett Tillotson	NCAR
Alex Younts	Purdue University

Publication Information

All accepted papers and artifacts will be published on GitHub and archived with a DOI in Zenodo. You can view last year's accepted papers in the HPCSYSPROS SC18 Workshop Proceedings.

Contact Information

If you need to contact us, send email to SIGHPC SYSPROS.
