By Steven Woo, Rambus Fellow
Supercomputing 2023 brought together some of the brightest minds in the field of high-performance computing, showcasing the latest in exascale computing and the challenges faced in the pursuit of next-generation advances in computing. Talks by Scott Atchley from Oak Ridge National Laboratory and Stephen Pawlowski from Intel stood out for their valuable perspectives on the current state of supercomputing and future directions for the industry.
Frontier: Exploring Exascale
Scott Atchley, Distinguished R&D Staff Member and Chief Technology Officer, Oak Ridge National Laboratory’s National Center for Computational Science
Scott Atchley’s talk delved into the US supercomputer “Frontier” and its journey to meet the challenges of exascale computing. The goal was ambitious: achieve performance 1000 times higher than that of the petascale systems deployed in 2008, on a budget only 4x-6x that of the previous generation.
Challenges identified by DARPA in 2008 when planning for Frontier included energy and power, memory and storage, concurrency and locality, and resiliency. Frontier successfully addressed these challenges, showcasing advancements in power efficiency, memory capacity and bandwidth, concurrency management, and resiliency. However, technology costs did not decline by 1000x over the same period, so a budget 4x-6x higher than the previous generation's was needed, and the growth of many resources was limited compared to the previous generation of supercomputers. Components such as storage and memory proved more expensive, particularly with the use of High Bandwidth Memory (HBM).
The findings underscore the complexities of achieving exascale computing and the necessity of adapting to evolving technological landscapes, especially in the face of cost dynamics in storage and memory technologies.
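To make the budget arithmetic concrete, here is a minimal sketch in Python. It uses only the two figures quoted above (a 1000x performance goal and a 4x-6x budget) and assumes, for illustration, that the required improvement in cost per unit of performance is simply the ratio of the two. The function name is hypothetical and the calculation is not taken from the talk.

```python
def cost_per_performance_gain(perf_gain: float, budget_multiple: float) -> float:
    """Factor by which cost per unit of performance must fall.

    Illustrative assumption: the required improvement is the performance
    gain divided by the budget multiple.
    """
    if perf_gain <= 0 or budget_multiple <= 0:
        raise ValueError("both factors must be positive")
    return perf_gain / budget_multiple


PERF_GAIN = 1000.0  # exascale vs. petascale, as quoted in the article

# Evaluate both ends of the 4x-6x budget range quoted in the article.
for budget in (4.0, 6.0):
    gain = cost_per_performance_gain(PERF_GAIN, budget)
    print(f"{budget:.0f}x budget: cost per unit of performance must fall about {gain:.0f}x")
```

Under this simple ratio, a 4x-6x budget still requires cost per unit of performance to improve by roughly 167x to 250x, which is one way to see why components whose costs fell more slowly, such as memory and storage, could not grow at the same rate as compute.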
A Perspective on 1000x Energy Efficiency
Stephen Pawlowski, Senior Fellow, Intel
In his keynote, Stephen Pawlowski discussed the challenges of achieving 1000x energy efficiency within the next two decades. With exascale supercomputers now a reality, early discussions are turning to how to reduce energy consumption and improve power efficiency, both critical factors in achieving next-generation performance.
Pawlowski highlighted the significant energy and time consumed by data movement, especially between processors and memory. To address this, he proposed stacking high-performance memory on top of a System-on-Chip (SoC). This approach promises a 5-6x reduction in energy and a 10x boost in bandwidth. The potential benefits make it a compelling path forward for the industry.
However, challenges emerge, such as the need to standardize memory footprints, determine interconnect locations, manage thermals, and address issues like Error-Correcting Codes (ECC) and post-package repair.
SC23 Show Floor Highlights
The show floor featured many exciting developments. The CXL Consortium showcased numerous demonstrations of Compute Express Link (CXL) technology, including the Rambus CXL Platform Development Kit (PDK) announced at the show. The Rambus PDK marks an exciting step in the CXL journey, enabling module and system makers to prototype and test CXL-based memory expansion and pooling solutions for AI.
AI remained a focal point at SC23, with several demos featuring cutting-edge AI platforms at both the chip and system levels. As models become larger and more sophisticated, continued advances in architecture and memory systems will be needed to keep up with these growing demands.
With computing performance continuing its upward trajectory, and with power efficiency improvements becoming more difficult with each new generation, there continues to be a noticeable increase in discussions and demonstrations of liquid cooling, which is expected to become pervasive in future data centers. There were also some intriguing immersion cooling demos, offering the promise of even greater cooling capability than traditional liquid cooling if future systems need it.
Supercomputing 2023, through these talks and on the show floor, provided a glimpse into the relentless pursuit of higher performance, energy efficiency, and innovative solutions shaping the future of high-performance computing.