Frontera, the supercomputer deployed at the Texas Advanced Computing Center (TACC) and the ninth fastest HPC system in the world, will receive an expansion to support urgent computing and basic science, according to TACC.
The expansion is funded by an award from the National Science Foundation (NSF) and a contribution from Dell Giving, the philanthropic arm of Dell Technologies. The expansion adds 396 Dell R640 server nodes housed in 11 racks, as well as Nvidia/Mellanox InfiniBand HDR cards and switches to connect into Frontera’s high-speed fabric. Each node contains 56 Intel Xeon Platinum 8280 (“Cascade Lake”) processor cores and 192GB of DDR4 memory, the same configuration as Frontera’s existing 8,008 compute nodes.
The new hardware will be available to researchers in January. The expansion will add nearly 3.5 million node hours of compute time annually to the 70 million available on Frontera. Access to the system is determined by a peer-review committee based on a project’s need for very large-scale computing and its ability to use a supercomputer of Frontera’s scale efficiently.
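The node-hour figures follow from simple arithmetic. As a rough check, the sketch below assumes every node is available for all 8,760 hours of a non-leap year with no maintenance downtime (an assumption for illustration, not a figure from TACC); the function name is hypothetical.

```python
# Rough check of the annual node-hour figures quoted in the article.
# Assumption: each node runs 24 hours a day, 365 days a year, with no downtime.
HOURS_PER_YEAR = 24 * 365


def annual_node_hours(nodes: int, hours_per_year: int = HOURS_PER_YEAR) -> int:
    """Return the node hours delivered in one year by `nodes` nodes."""
    return nodes * hours_per_year


expansion_hours = annual_node_hours(396)   # nodes added by the expansion
existing_hours = annual_node_hours(8008)   # Frontera's existing compute nodes

print(f"Expansion: {expansion_hours:,} node hours per year")
print(f"Existing:  {existing_hours:,} node hours per year")
```

Under that assumption, the 396 new nodes yield about 3.47 million node hours and the existing 8,008 nodes about 70.2 million, in line with the "nearly 3.5 million" and "70 million" quoted by TACC.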
TACC said the NSF award will allow the academic community to double its usage of Longhorn, a subsystem of Frontera built in partnership with IBM and Nvidia to support deep learning workloads. Longhorn is currently used by university researchers, government agencies and TACC industrial partners.
TACC’s urgent computing capabilities have been used this year for the COVID-19 crisis and for the record hurricane season. In coming years, they will also support responses to emergencies such as earthquakes, tornadoes and other large-scale disasters, according to TACC.
“This supplemental award from NSF and generous gift from Dell allows us to continue to support researchers responding to national and global emergencies, without sacrificing the fundamental science that the Frontera system was built for,” said Dan Stanzione, TACC executive director.
“NSF is pleased that the foundation’s cyber-infrastructure investments, including those at TACC, have been quickly and successfully mobilized to address urgent national needs without sacrificing our commitment to supporting long-term basic research on behalf of the Nation’s future,” said Manish Parashar, director of the NSF Office of Advanced Cyberinfrastructure.
Dedicated time on advanced computing systems for emergency response and operations is rare outside of supercomputers operated by mission-driven agencies like NASA, the National Oceanic and Atmospheric Administration (NOAA), or the U.S. Geological Survey.
In the early phases of the pandemic, TACC devoted more than 30 percent of its computing resources to supporting COVID-19 research, enabling more than 50 teams to explore the virus in ways that otherwise would not have been possible. This work is expected to continue for years to come.
TACC said that during this past hurricane season, it interrupted work on its systems 10 times to produce emergency storm surge simulations for hurricanes making landfall along the Gulf of Mexico. The results were shared with emergency managers and first responders in the region.
TACC has supported urgent computing efforts since 2003, when TACC resources were used to guide investigators to debris from the Space Shuttle Columbia in East Texas. During Hurricanes Ike and Harvey, the Deepwater Horizon oil spill, and earthquake recovery efforts in Haiti and Japan, TACC answered the call to provide emergency compute resources. The expansion of Frontera will make more of these humanitarian efforts possible without displacing other important scientific work.
In the 14 months since TACC deployed Frontera, the system has enabled the first all-atom simulations of the spike protein of SARS-CoV-2, the virus that causes COVID-19; the discovery of a cluster of 250 previously unknown stars in our galaxy that were born elsewhere; the first-ever predictions of gravitational waves from merging black holes with large mass ratios; and the largest, most realistic tornado simulations ever attempted.
source: Aaron Dubrow, TACC