Written by Steven Woo
Dominated by the United States, Japan and China, the high-performance computing (HPC) space is driven by an insatiable demand for ever-higher performance and greater power efficiency. With each new supercomputer debut, this trio sets the bar progressively higher in the race for the top spot on the Top500 list.
Summit, Sierra and Sunway TaihuLight
In June 2018, the U.S. recaptured the Top500 performance crown with the launch of Oak Ridge National Laboratory’s Summit supercomputer, which achieved 122.3 petaflops on the High Performance Linpack (HPL) benchmark. In November 2018, Summit widened its lead as the number one system, improving its HPL performance from 122.3 to 143.5 petaflops. Summit, an IBM AC922 system, packs more than three petabytes of DRAM and links over 27,000 Nvidia Volta GPUs with more than 9,000 IBM Power9 CPUs.
In addition to Summit clinching the number one spot, November 2018 saw the U.S.-designed Sierra supercomputer move from 71.6 to 94.6 petaflops, bumping it from third place to second. Sierra is equipped with 1.38 petabytes of DRAM, 8,640 CPUs and 17,280 Nvidia Tesla V100 Tensor Core GPUs. In third place (as of November 2018) is China’s Sunway TaihuLight supercomputer, installed at the National Supercomputing Center in Wuxi. TaihuLight had previously held the top position on the Top500 list for two years, with an impressive HPL performance of 93.0 petaflops. Developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC), TaihuLight packs Sunway SW26010 260C 1.45 GHz processors, totaling 10,649,600 cores and 1,310,720 GB of memory.
The Exascale Supercomputer Program
Summit and Sierra are steps toward the U.S. Department of Energy’s (DoE) goal of fielding an exascale supercomputer by 2020: a system capable of one exaFLOPS (10^18 floating-point operations per second) within a power budget of 20-40 megawatts, roughly a 1,000X performance increase over the previous generation of supercomputers. The DoE’s Exascale program, launched several years ago to reach this next level of supercomputing performance, has been exploring new architectures and technologies to meet these goals. With Moore’s Law slowing and Dennard scaling finished, system power efficiency is a primary concern at these performance levels. More specifically, a 5X-10X improvement in power efficiency is required compared to the previous generation of supercomputers, meaning that simply scaling up the previous generation’s architectures isn’t a viable option.
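The power-efficiency target implied by those figures can be checked with simple arithmetic. The sketch below uses only the numbers stated above (one exaFLOPS within 20-40 megawatts); the GFLOPS/W framing is a common industry convention, not a figure from the article.

```python
# Back-of-the-envelope efficiency implied by the Exascale target:
# 10**18 FLOPS delivered within a 20-40 MW power budget.
EXAFLOPS = 1e18

for megawatts in (20, 40):
    watts = megawatts * 1e6
    gflops_per_watt = EXAFLOPS / watts / 1e9  # convert FLOPS/W to GFLOPS/W
    print(f"{megawatts} MW budget -> {gflops_per_watt:.0f} GFLOPS/W")

# Prints:
#   20 MW budget -> 50 GFLOPS/W
#   40 MW budget -> 25 GFLOPS/W
```

Note that 25-50 GFLOPS/W is consistent with the 5X-10X efficiency improvement cited above if the previous generation of systems delivered on the order of 5 GFLOPS/W.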
GPUs and Memory for HPC
Over the last 10 years, U.S., Chinese and Japanese supercomputers have adopted compute accelerators based on graphics processing units (GPUs) to bolster performance, power efficiency and compute density (performance in a given volume of space). But as Moore’s Law has delivered ever more transistors, and with them ever more compute pipelines per accelerator, memory bandwidth must scale in step to keep those pipelines fed. Both HBM and GDDR memory are employed by accelerators in the current generation of supercomputers: Summit’s Tesla V100s use HBM2, while Stampede2’s Xeon Phi accelerators pack GDDR5.
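One way to quantify how well memory bandwidth "feeds" an accelerator's compute pipelines is its machine balance, the ratio of memory bandwidth to peak compute. The sketch below uses the commonly published spec figures for the Tesla V100 SXM2 (about 7.8 TFLOPS double precision, about 900 GB/s of HBM2 bandwidth); these numbers are assumptions of this illustration, not taken from the article.

```python
# Machine balance (bytes of memory traffic available per FLOP of peak
# compute) for a Tesla V100-class accelerator with HBM2 memory.
peak_fp64_flops = 7.8e12    # ~7.8 TFLOPS double precision (spec figure)
hbm2_bandwidth  = 900e9     # ~900 GB/s HBM2 (spec figure)

balance = hbm2_bandwidth / peak_fp64_flops
print(f"Machine balance: {balance:.3f} bytes/FLOP")  # ~0.115 bytes/FLOP

# Any kernel that needs more bytes per FLOP than this is memory-bound.
# Example: a double-precision STREAM triad, a[i] = b[i] + s * c[i],
# moves 24 bytes per 2 FLOPs = 12 bytes/FLOP -- roughly 100x the
# hardware's balance, so its performance is set by bandwidth, not compute.
```

This is why HBM matters for accelerators: stacked memory raises the bandwidth side of this ratio far beyond what commodity DDR can provide.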
HPC Memory Bottlenecks
Designing more efficient memory hierarchies for the HPC sector is critical and only growing in importance, as many HPC applications manipulate very large data sets. After decades in which compute performance improved faster than memory and I/O subsystems, however, the supercomputing market now finds itself severely limited by memory and I/O bottlenecks and by stagnating power efficiency. Minimizing data movement and moving compute closer to where the data resides are two methods for addressing this imbalance.
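The compute-memory imbalance described above is often visualized with the roofline model: attainable performance is capped either by peak compute or by memory bandwidth multiplied by a kernel's arithmetic intensity (FLOPs performed per byte moved). A minimal sketch, with illustrative peak and bandwidth values that are placeholders rather than figures from the article:

```python
# Minimal roofline model: a kernel's attainable performance is bounded by
# min(peak compute, memory bandwidth * arithmetic intensity).
def attainable_gflops(intensity_flops_per_byte: float,
                      peak_gflops: float = 7800.0,
                      bandwidth_gb_s: float = 900.0) -> float:
    """Roofline bound for a given arithmetic intensity (FLOPs/byte)."""
    return min(peak_gflops, bandwidth_gb_s * intensity_flops_per_byte)

# A low-intensity, STREAM-like kernel is bandwidth-bound:
print(attainable_gflops(0.125))  # 112.5 GFLOPS -- far below peak
# A high-intensity kernel (e.g. large dense matrix multiply) can reach
# the compute roof:
print(attainable_gflops(50.0))   # 7800.0 GFLOPS
```

The model makes the bottleneck concrete: for low-intensity kernels, only raising bandwidth or reducing data movement (for instance, by moving compute closer to the data) improves performance.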