Three of the newest National Nuclear Security Administration (NNSA) commodity computing clusters recently deployed at Lawrence Livermore National Laboratory (LLNL) and Sandia National Laboratories (SNL) are among the most powerful supercomputing systems in the world, Top500 organizers announced Monday.
Funded under the second Commodity Technology Systems contract (CTS-2) by NNSA’s Advanced Simulation and Computing (ASC) program, the machines sited at LLNL — named Dane and Bengal — began performing key modeling and simulation functions for the NNSA’s stockpile stewardship mission in mid-2023. Each system is built by Dell Technologies and powered by 4th Gen Intel Xeon Scalable Processors coupled to Cornelis Networks Omni-Path high-speed networking.
The debut of Dane, a 7.041-petaFLOP cluster (slightly more than 7 quadrillion calculations per second) at #108, and the 6.134-petaFLOP Bengal at #129 brings the total of LLNL-sited systems on the Top500 List to 11, the most of any supercomputing center in the world. A third new CTS-2 system, sited at SNL and named Stout, reached 8.987 petaFLOPs on the LINPACK benchmark used to determine the rankings, earning it 87th place on the list. Each system attained 89 percent or greater efficiency on LINPACK. Top500 organizers unveiled the updated twice-yearly list of the world’s most powerful computers at the 2023 International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing) in Denver.
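For readers who want to relate the LINPACK figures to the efficiency claim, here is a minimal sketch. It assumes "efficiency" means the usual Top500 ratio of measured LINPACK performance (Rmax) to theoretical peak (Rpeak); the article does not state the peak figures, so the code only derives the upper bound on peak implied by an efficiency of at least 89 percent. The function and variable names are illustrative.

```python
# LINPACK results (Rmax, in petaFLOPs) as quoted in the article.
LINPACK_PFLOPS = {"Stout": 8.987, "Dane": 7.041, "Bengal": 6.134}

# Assumption: efficiency = Rmax / Rpeak, and each system reached at least 0.89.
EFFICIENCY_FLOOR = 0.89


def peak_upper_bound(rmax_pflops: float, efficiency_floor: float = EFFICIENCY_FLOOR) -> float:
    """Return the largest theoretical peak (petaFLOPs) consistent with the floor.

    If Rmax / Rpeak >= floor, then Rpeak <= Rmax / floor.
    """
    if not 0 < efficiency_floor <= 1:
        raise ValueError("efficiency_floor must be in (0, 1]")
    return rmax_pflops / efficiency_floor


if __name__ == "__main__":
    for name, rmax in LINPACK_PFLOPS.items():
        bound = peak_upper_bound(rmax)
        print(f"{name}: Rmax {rmax:.3f} PF, implied Rpeak <= {bound:.3f} PF")
```

These are bounds derived from the quoted numbers, not published peak specifications for the machines.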
Since their deployment for mission-critical work, Dane, Bengal, Stout and a fourth CTS-2 system at SNL named Amber have displayed improved performance and efficiency over the previous generation of commodity systems (CTS-1), with NNSA researchers reporting initial speedups of 4-5X across a wide range of high-performance computing applications.
“The deployment of these first CTS-2 platforms provides a significant addition to the NNSA labs’ simulation environments in support of our national security mission,” said Matt Leininger, CTS project lead for the NNSA Tri-Labs (LLNL, SNL and Los Alamos National Laboratory). “We are proud to continue the NNSA ASC tradition of excellence in high-performance computing and U.S. technology partnerships at LLNL, SNL and throughout the NNSA complex.”
The commodity-technology-based systems are the “workhorses” of the NNSA, supporting the day-to-day simulation workload. Coupled with the LLNL-led Tri-Laboratory Operating System Stack (TOSS) and Tri-Lab Common Environment, they reduce costs by standardizing hardware and software across the NNSA labs. The CTS-2 platforms allow NNSA’s more powerful Advanced Technology System supercomputers, such as the current Sierra system and future El Capitan exascale system at LLNL, to focus on the most complex problems critical to NNSA’s Stockpile Stewardship Program.
Each CTS-2 system is made up of building blocks called “scalable units” (SUs) representing about 1.5 petaFLOPs of computing power apiece. The SU design allows the NNSA laboratories to fine-tune system performance depending on mission need or programmatic budgets. Each SU is built with Dell PowerEdge C6620 and R760 servers that use 4th Gen Intel Xeon Scalable processors.
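The scalable-unit sizing can be illustrated with simple arithmetic. The sketch below takes the roughly 1.5 petaFLOPs per SU quoted above and scales it by a unit count. The article does not say how many SUs any particular system contains, so the count in the usage example is purely hypothetical, as is the function name.

```python
# Approximate computing power per scalable unit, as quoted in the article.
SU_PETAFLOPS = 1.5


def approx_system_petaflops(num_scalable_units: int) -> float:
    """Rough aggregate computing power for a system built from SUs."""
    if num_scalable_units < 0:
        raise ValueError("num_scalable_units must be non-negative")
    return num_scalable_units * SU_PETAFLOPS


if __name__ == "__main__":
    # Hypothetical configuration: 4 SUs is an illustrative count, not a
    # figure from the article.
    print(approx_system_petaflops(4))
```

Because the per-unit figure is approximate, the result is an order-of-magnitude guide to how adding or removing SUs changes system size, not a performance prediction.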
The 4th Generation Intel Xeon Scalable processors are equipped with purpose-built accelerators that can improve HPC workload performance and power efficiency by taking over tasks that would otherwise run on the general-purpose cores. Intel Advanced Matrix Extensions (Intel AMX), one of the processors’ built-in accelerator engines, can transform the large matrix math calculations that are at the heart of deep-learning workloads into a single operation, making it critical for delivering performance across workloads where HPC and AI converge.
“These processors are ideal for building and deploying general-purpose AI workloads with the most popular AI frameworks and libraries. These capabilities will make it possible for the engineers, researchers and scientists at Lawrence Livermore National Laboratory and Sandia National Laboratory to simulate complex problems critical to the nation’s national security,” said Deepak Patil, corporate vice president and general manager, Accelerated Computing Systems & Graphics, Intel. “We’re proud to support the labs with solutions that help accelerate the time and effort to analyze future and existing data for their focus areas.”
Each system is also outfitted with 200Gbps Omni-Path Express high-speed networking from Cornelis Networks, providing CTS-2 with a highly scalable fabric with open-source software fully integrated with TOSS.
“With CTS-2, the team at Cornelis is excited to once again support delivery of the workhorse commodity systems of the NNSA, having previously delivered the high-performance networks for the second generation of Tri-Lab Capacity Clusters (TLCC2) and CTS-1,” said Gunnar K. Gunnarsson, vice president of solutions delivery and support at Cornelis. “Working closely with our ecosystem partners, Dell Technologies and Intel, Cornelis is pleased to enable leading cluster performance and efficiency at scale for the Tri-Labs’ mission-critical workloads with our Omni-Path technology. We look forward to continued CTS-2 deployments, including introducing our new 400Gbps CN5000 Omni-Path product family as part of a future architecture refresh.”
Several additional CTS-2 systems will be deployed at LLNL and SNL in the first half of 2024, in support of various NNSA programs leveraging the CTS architecture and procurement developed by the NNSA ASC program.