Intel today announced that the Aurora exascale-class supercomputer at Argonne National Laboratory is now fully equipped with 10,624 compute blades. Putting a stake in the ground, Intel said in its announcement that “later this year, Aurora is expected to be the world’s first supercomputer to achieve a theoretical peak performance of more than 2 exaflops … when it enters the TOP500 list.”
Aurora is a collaboration of Intel, Hewlett Packard Enterprise (HPE) and the Department of Energy (DOE). The system incorporates more than 1,024 storage nodes running DAOS (Intel’s Distributed Asynchronous Object Storage), which provide 220 petabytes (PB) of capacity at 31 terabytes per second (TB/s) of total bandwidth, and it uses the HPE Slingshot high-performance fabric. In total, the blades contain 63,744 Intel Data Center GPU Max Series GPUs and 21,248 Intel Xeon CPU Max Series processors.
“Aurora is the first deployment of Intel’s Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world,” said Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group. “We’re proud to be part of this historic system and excited for the ground-breaking AI, science and engineering Aurora will enable.”
The company said the installation has been “a delicate operation, with each 70-pound blade requiring specialized machinery to be vertically integrated into Aurora’s refrigerator-sized racks.” Aurora’s 166 racks hold 64 blades each and span eight rows, occupying roughly the area of two basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
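The rack, blade and processor counts reported here are internally consistent. As a quick sanity check (a sketch of our own arithmetic, not anything published by Intel or Argonne), the Python below multiplies racks by blades per rack and divides the processor totals by the blade count. The per-blade figures it derives, six GPUs and two CPUs, are inferred from the totals rather than stated in the announcement.

```python
# Totals as reported in the announcement.
RACKS = 166
BLADES_PER_RACK = 64
TOTAL_BLADES = 10_624
TOTAL_GPUS = 63_744
TOTAL_CPUS = 21_248


def per_blade(total: int, blades: int) -> int:
    """Return total / blades, raising if the total does not divide evenly."""
    count, remainder = divmod(total, blades)
    if remainder:
        raise ValueError(f"{total} does not divide evenly across {blades} blades")
    return count


# 166 racks x 64 blades per rack should match the reported blade total.
blades = RACKS * BLADES_PER_RACK
assert blades == TOTAL_BLADES

# Inferred per-blade counts (not stated in the announcement).
gpus_per_blade = per_blade(TOTAL_GPUS, blades)
cpus_per_blade = per_blade(TOTAL_CPUS, blades)

print(f"{blades} blades, {gpus_per_blade} GPUs and {cpus_per_blade} CPUs per blade")
```

Both totals divide evenly by 10,624, which suggests a uniform blade configuration across all 166 racks.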
Blade installation began more than seven months ago, when insideHPC took the photo below during a visit to ALCF. “We have been living and breathing the Aurora installation since the first pieces were delivered in November of 2021,” said Susan Coghlan, ALCF project director for Aurora. “While we still have a lot of work to do before we can roll the system out to scientists worldwide, it is incredibly exciting to have the final hardware in place.”
Researchers from the ALCF’s Aurora Early Science Program (ESP) and DOE’s Exascale Computing Project will migrate their work from Sunspot, Aurora’s test and development system, to Aurora itself, allowing them to scale their applications on the full system. Early users will stress-test the supercomputer and identify bugs that must be resolved before the system is deployed. According to the company, this work includes efforts, announced recently at the ISC 2023 conference, to develop generative AI models for science.
“While we work toward acceptance testing, we’re going to be using Aurora to train some large-scale open source generative AI models for science,” said Rick Stevens, Argonne National Laboratory associate director. “Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all solid-state mass storage system, is the perfect environment to train these models.”