Quick Information
Supercomputing systems present complex challenges to personnel who design, deploy and maintain these systems. Standing up these systems and keeping them running require novel solutions that are unique to high performance computing. The success of any supercomputing center depends on stable and reliable systems, and HPC Systems Professionals are crucial to that success.
The Fourth Annual HPC Systems Professionals Workshop will bring together systems administrators, systems architects, and systems analysts in order to share best practices, discuss cutting-edge technologies, and advance the state-of-the-practice for HPC systems. This CFP requests that participants submit either papers, slide presentations, or 5-minute Lightning Talk proposals along with reproducible artifacts (code segments, test suites, configuration management templates) which can be made available to the community for use.
Keynote Speaker
Kate Keahey will present "Chameleon: How to Build a Cloud++"
Chameleon is a large-scale, deeply reconfigurable experimental platform built to support Computer Science systems research. Community projects range from systems research on exascale operating systems, virtualization methods, performance variability, and power management to projects in software-defined networking, machine learning, and resource management. What makes Chameleon unique is that it provides these sophisticated capabilities on top of a mainstream infrastructure cloud system (OpenStack). In this talk, I will explain the challenges we faced in building Chameleon, share lessons learned and operational experience, and describe how we packaged the system to integrate both the capabilities we developed and our operational experience, making platforms of this kind easier to manage.
Kate Keahey is one of the pioneers of infrastructure cloud computing. She created the Nimbus project, recognized as the first open source Infrastructure-as-a-Service implementation, and continues to work on research aligning cloud computing concepts with the needs of scientific centers and applications. To facilitate such research for the community at large, Kate leads the Chameleon project, providing a deeply reconfigurable, large-scale, and open experimental platform for Computer Science research. Kate also co-founded and serves as co-Editor-in-Chief of the SoftwareX journal, a new format designed to publish software contributions. Kate is a Senior Computer Scientist with the Math and Computer Science division at Argonne National Laboratory and a Senior Fellow at the Computation Institute at the University of Chicago.
Schedule
Start | End | Description |
---|---|---|
8:30 AM | 8:45 AM | Welcome |
8:45 AM | 9:30 AM | Keynote: Chameleon: How to Build a Cloud++, Kate Keahey, Argonne National Laboratory |
9:30 AM | 9:45 AM | Paper: Decoupling OpenHPC Critical Services, Jacob Chappell, Bhushan Chitre, Vikram Gazula, Lowell Pike, James Griffioen, University of Kentucky |
9:45 AM | 10:00 AM | Paper: Implementing a Common HPC Environment in a Multi-User Spack Instance, Carson Woods, Matthew L. Curry, Anthony Skjellum, University of Tennessee, Chattanooga and Sandia National Laboratories |
10:00 AM | 10:30 AM | Break |
10:30 AM | 10:37 AM | Lightning Talk: Arbiter: Dynamically Limiting Resource Consumption on Login Nodes, Dylan Gardner, Robben Migacz, Brian Haymore, University of Utah, Center for High Performance Computing |
10:37 AM | 10:44 AM | Lightning Talk: Using GUFI in Data Management, Christopher Hoffman, Bill Anderson, National Center for Atmospheric Research |
10:44 AM | 11:00 AM | Slide Presentation: Monitoring HPC Services with CheckMK, Kieran Leach, Philip Cass, Craig Manzi, Edinburgh Parallel Computing Centre |
11:00 AM | 11:15 AM | Slide Presentation: The Road to Devops HPC Cluster Management, Ken P. Schmidt, Evan J. Felix, Pacific Northwest National Laboratory |
11:15 AM | 11:30 AM | Paper: What Deploying MFA Taught Us About Changing Infrastructure, Abe Singer, Shane Canon, Rebecca Hartman-Baker, Kelly L. Rowland, David Skinner, Craig Lant, Lawrence Berkeley National Laboratory and National Energy Research Scientific Computing Center |
11:30 AM | 11:45 AM | Paper: A Better Way of Scheduling Jobs on HPC Systems: Simultaneous Fair-Share, Craig P. Steffen, University of Illinois and National Center for Supercomputing Applications |
11:45 AM | 12:00 PM | Closing Remarks and Open Discussion |
Proceedings
https://github.com/HPCSYSPROS/Workshop19
Topics of Interest
The following topics are of interest to this group. They are intended to indicate direction, not to exclude other related topics.
- Cluster, configuration, or software management
- Performance tuning/Benchmarking
- Resource manager and job scheduler configuration
- Monitoring/Mean-time-to-failure/ROI/Resource utilization
- Virtualization/Clouds
- Designing and troubleshooting HPC interconnects
- Designing and maintaining HPC storage solutions
- Cybersecurity and data protection
- Cluster storage
Example paper ideas include:
- Best practices for job scheduler configuration
- Advantages of cluster automation
- Managing software on HPC clusters
Calendar
Event | Date |
---|---|
Workshop Submissions Open | April 17, 2019 |
Workshop Submissions Close | August 30, 2019 |
Reviews Sent | September 14, 2019 |
Resubmission Open | September 16, 2019 |
Resubmission Close | September 28, 2019 |
Acceptance Notification | October 11, 2019 |
Organizing Committee
Position | Name | Affiliation |
---|---|---|
Chair | David Clifton | Ansys |
Program Committee Chair | John Blaas | University of Colorado Boulder |
Organizing Committee | Stephen Fralich | Boeing |
Organizing Committee | Stephen Lien Harrell | Purdue University |
Organizing Committee | Adam Hough | University of Washington |
Organizing Committee | William Scullin | Laboratory for Laser Energetics |
Organizing Committee | Jenett Tillotson | NCAR |
Program Committee
Name | Affiliation |
---|---|
Jonathon Anderson | University of Colorado Boulder |
Jared Baker | NCAR |
Matt Bidwell | NREL |
Stephen Fralich | Boeing |
Brian Haymore | University of Utah |
Randy Herban | Sylabs Inc. |
Andrew Howard | Microsoft |
Ti Leggett | Argonne National Laboratory |
Hon Wai Leong | DDN |
Scott McMillian | Nvidia |
Paul Peltz Jr. | Oak Ridge National Laboratory |
Jeff Raymond | University of Pittsburgh |
Jenett Tillotson | NCAR |
Alex Younts | Purdue University |
Publication Information
All accepted papers and artifacts will be published on GitHub and archived in Zenodo with a DOI. Last year's accepted papers are available in the HPCSYSPROS SC18 Workshop Proceedings.
Contact Information
If you need to contact us, send email to SIGHPC SYSPROS.
Links
- SC HPC Sysadmin Mailing List - you should join!
- Email us to get an invite to the SIGHPC SYSPROS Slack team
- Upcoming activities including our symposium at PEARC19