SIGHPC Systems Professionals Workshop

HPCSYSPROS24

Friday, November 22, 2024

8:30am - 12pm EST

Room B312-B313A

Held in conjunction with SC24 and in cooperation with SIGHPC

 Quick Information

Supercomputing systems present complex challenges to personnel who design, deploy and maintain these systems. Standing up these systems and keeping them running require novel solutions that are unique to high performance computing. The success of any supercomputing center depends on stable and reliable systems, and HPC Systems Professionals are crucial to that success.

The Eighth Annual HPC Systems Professionals Workshop will bring together systems administrators, systems architects, and systems analysts to share best practices, discuss cutting-edge technologies, and advance the state of the practice for HPC systems. This CFP requests that participants submit papers, slide presentations, or 5-minute Lightning Talk proposals. Additionally, reproducible artifacts (code segments, test suites, configuration management templates) that can be made available to the community are welcome, either as standalone submissions or alongside a paper or talk submission.

 Social Event

SIGHPC SYSPROS and the CaRCC Systems Facing group will host a social event from 6-8pm on Monday, November 18th at Stats Brewpub, 300 Marietta St NW, Suite 101, Atlanta, GA. Join us for some beers, banter, and sharing of war stories with other HPC systems professionals.

 Schedule

All times in Eastern Time

Start | End | Description
8:30 AM | 8:45 AM | Opening Remarks, Jared Baker (NCAR), John Blaas (Lambda)
8:45 AM | 9:00 AM | Advancing ODA Standardization Through An Open Source Dashboard, Tim Osborne (ORNL), Rachel Palumbo (ORNL), Leah Huk (ORNL), Ryan Adamson (ORNL), Rob Jones (ORNL), Corwin Lester (ORNL)
9:00 AM | 9:15 AM | Next-Gen HPC Status Viz: Powering Up Node Status with Next.js and the Slurm API, Johnathan Lee (Arizona State University), Jason Yalim (Arizona State University)
9:15 AM | 10:00 AM | Beyond the Hype: Uncovering the Real I/O Needs of LLMs, Kartik Subramanian (VAST Data Inc), Glenn K. Lockwood (Microsoft Corporation)
10:00 AM | 10:30 AM | Morning Coffee Break
10:30 AM | 10:40 AM | Benchmarking and Continuous Performance Monitoring of HPC Resources using the XDMoD Application Kernel Module, Nikolay A. Simakov (State University of New York at Buffalo)
10:40 AM | 10:50 AM | Kubernetes Resource Scaling via Batch Node Conversion on the Anvil Supercomputer, Erik Gough (Purdue University), Dashiell Lumas (Purdue University)
10:50 AM | 11:00 AM | Increasing effective storage capacity with hierarchical storage management (HSM) for NCAR’s Campaign Storage, Aric Werner (NCAR), Joseph Mendoza (NCAR), Ben Kirk (NCAR)
11:00 AM | 11:15 AM | SStack: Software Stacks for easier and cleaner software builds on HPC, Strahinja Trecakov (New Mexico State University), Nicholas Von Wolff (New Mexico State University), Mohammad Al-Tahat (New Mexico State University)
11:15 AM | 11:30 AM | Cluster Resource Management for Sustainable and Efficient Computing, Andrei Bersatti (Georgia Institute of Technology), Aaron Jezghani (Georgia Institute of Technology)
11:30 AM | 11:45 AM | Dynamic Login Node Resource Control and Monitoring with Arbiter 3, Jackson McKay (University of Utah), Kai Forrest (University of Utah), Paul Fischer (University of Utah)
11:45 AM | 12:00 PM | Chapter Updates and Closing Remarks, Jared Baker (NCAR), John Blaas (Lambda)

 Topics of Interest

Here are some topics of interest for this group. Note that these are here to indicate direction, not to disallow other related topics.

  • Cluster, configuration, or software management
  • Cybersecurity and data protection
  • Performance tuning/Benchmarking
  • Resource manager and job scheduler configuration
  • Monitoring/Mean-time-to-failure/ROI/Resource utilization
  • HPC storage solutions
  • High speed/ Low Latency networking
  • Composable infrastructure and containers
  • Elastic workloads or optimizations for workload types
  • Web-based cluster front ends
  • Challenges with AI workloads (GPU management, Interconnect, Data Movement)

Example paper ideas might be:

  • Best practices for job scheduler configuration
  • Advantages of cluster automation
  • Managing software on HPC clusters

 Calendar

Event | Date
Workshop Submissions Open | May 28, 2024
Workshop Submissions Close | August 9, 2024
Reviews Sent and Resubmissions Open | August 23, 2024
Resubmissions Close | August 30, 2024
Final Program Notifications | September 13, 2024

 Organizing Committee

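Position | Name | Affiliation
Workshop Chair | John Blaas | Lambda
Program Chair | Jared Baker | NCAR
Organizing Committee | Blaise Hartman | NASA
Organizing Committee | Betsy Hillery | Purdue University
Organizing Committee | David Clifton | Ansys
Organizing Committee | Hon Wai Leong | DDN
Organizing Committee | John Legato | NIH
Organizing Committee | Michael Hartman | Stanford University
Organizing Committee | Gary Skouson | Penn State University
Organizing Committee | Kurt Maier | PNNL
Organizing Committee | Stephen Fralich | Boeing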

 Program Committee

Name | Affiliation
Emma Shaub | NCAR
Dori Sajdak | SUNY at Buffalo
Ben Nickell | INL
Wyatt Madej | NCSA
Sam Liston | University of Utah
Josh LeVoir | Lambda
Hon Wai Leong | DDN
Jeremy Fischer | Indiana University
Eric Coulter | Georgia Institute of Technology

Publication Information

All accepted papers and artifacts will be published on GitHub and archived with a DOI in Zenodo. You can view the previous year's presentations in the HPCSYSPROS SC23 Workshop Proceedings.

 Contact Information

If you need to contact us, send email to SIGHPC SYSPROS.

 Links