At its GTC conference this week, Nvidia, together with Dell EMC, released details of HPC installations that will accelerate computing resources at two UK academic supercomputing centers.
At Durham University, the COSMA 8 supercomputer, which will be used by cosmologists researching the origins of the universe, will be accelerated by Nvidia HDR InfiniBand networking. Based on Dell EMC PowerEdge C6525 servers with AMD EPYC processors, the COSMA 8 system will use HDR 200Gb/s InfiniBand networking with In-Network Computing engines for memory-intensive scientific applications, and it employs a non-blocking fat-tree topology to optimize performance across nodes.
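For a sense of how a non-blocking fat-tree fabric scales, the sketch below works through the standard two-level fat-tree arithmetic in Python. The 40-port switch radix is an assumption for illustration, not a stated detail of the COSMA 8 fabric; the 200 Gb/s figure is the HDR InfiniBand per-port rate mentioned above.

```python
# Back-of-the-envelope sizing for a two-level non-blocking fat tree.
# Assumptions (illustrative only): 40-port switches, 200 Gb/s HDR per port.

SWITCH_RADIX = 40          # ports per switch (assumed)
LINK_GBPS = 200            # HDR InfiniBand per-port rate

def two_level_fat_tree(radix: int) -> dict:
    """Non-blocking two-level fat tree: each leaf switch splits its ports
    evenly between hosts (downlinks) and spine switches (uplinks)."""
    hosts_per_leaf = radix // 2
    leaves = radix                      # one leaf per spine port
    spines = radix // 2
    hosts = hosts_per_leaf * leaves     # radix**2 / 2
    bisection_gbps = hosts * LINK_GBPS / 2
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "max_hosts": hosts,
        "bisection_bandwidth_Tbps": bisection_gbps / 1000,
    }

print(two_level_fat_tree(SWITCH_RADIX))
# -> {'leaf_switches': 40, 'spine_switches': 20, 'max_hosts': 800,
#     'bisection_bandwidth_Tbps': 80.0}
```

The point of the non-blocking design is that the uplink capacity at every level matches the downlink capacity, so any half of the nodes can exchange data with the other half at the full per-port rate.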
Hosted by the Institute for Computational Cosmology at Durham University on behalf of the DiRAC HPC facility, COSMA 8 will become part of the DiRAC Memory Intensive service, supporting research in astronomy and particle physics using large-scale simulations. Scientific exploration will include dark matter, dark energy, black holes and how galaxies and other structures in the universe have formed. DiRAC is a distributed computing facility comprising four UK deployments that provide compute resources matching machine architectures to the differing algorithm designs and requirements of research problems.
“COSMA 8 is aiming to model the entire universe, over time, from the big bang to today. It will allow humankind to continue advancing our understanding of where we came from and our place in the cosmos, using larger-scale simulations than ever before,” said Alastair Basden, technical manager for the DiRAC Memory Intensive Service at Durham University. “The massive scale of these simulations relies on the bandwidth only InfiniBand can deliver to make this research possible. It’s one example of how DiRAC and Durham University continue to advance the field of supercomputing through their ongoing collaboration with Nvidia.”
Nvidia said utilization of InfiniBand in DiRAC’s COSMA 8 complements other collaborations the company has undertaken with Durham University, including research into using Nvidia BlueField data processing units.
And at Cambridge University, Nvidia is accelerating what it said is the first Top500 academic cloud-native supercomputer, using Nvidia BlueField-2 DPUs and HDR InfiniBand. The system, hosted by the university at the Cambridge Service for Data Driven Discovery (CSD3), is a UK National Research Cloud.
The site is being enhanced by a 4-petaflops Dell EMC system with Nvidia A100 GPUs, BlueField DPUs and InfiniBand for multi-tenant, bare-metal high-performance computing, AI and data analytics. The CSD3’s cloud-native supercomputing platform is enabled by a cloud HPC software stack, called Scientific OpenStack, developed by the University of Cambridge and StackHPC.
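As a rough illustration of the cloud-native model from a tenant's point of view, the sketch below uses the generic openstacksdk Python client to request a bare-metal instance from an OpenStack cloud. The cloud, flavor, image and network names are hypothetical placeholders; this is not a description of the actual Scientific OpenStack or CSD3 configuration.

```python
# Illustrative only: provisioning a bare-metal HPC node through a generic
# OpenStack cloud with the openstacksdk client. Flavor/image/network names
# below are hypothetical placeholders, not CSD3 specifics.
import openstack

# Credentials are read from clouds.yaml / environment by openstack.connect().
conn = openstack.connect(cloud="hpc-cloud")   # cloud name is an assumption

server = conn.create_server(
    name="hpc-node-demo",
    image="rocky-8-hpc",          # hypothetical image
    flavor="baremetal.a100",      # hypothetical bare-metal flavor
    network="tenant-hpc-net",     # hypothetical tenant network
    wait=True,                    # block until the node is ACTIVE
)
print(server.name, server.status)
```

The appeal of this model is that researchers get on-demand, self-service access while the underlying hardware still runs workloads on bare metal rather than in virtual machines.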
The new system is projected to deliver 4 PFLOPS of performance at deployment. Overall, CSD3 uses Nvidia GPUs and x86 CPUs to provide more than 10 PFLOPS of performance, and it includes a solid-state storage array based on the Dell/Cambridge data accelerator.
In a blog post issued today, Nvidia discussed the characteristics of cloud-native supercomputers.
“First, they let multiple users share a supercomputer while ensuring that each user’s workload stays secure and private. It’s a capability known as ‘multi-tenant isolation’ that’s available in today’s commercial cloud computing services. But it’s typically not found in HPC systems used for technical and scientific workloads where raw performance is the top priority and security services once slowed operations.
“Second, cloud-native supercomputers use DPUs to handle tasks such as storage, security for tenant isolation and systems management. This offloads the CPU to focus on processing tasks, maximizing overall system performance. The result is a supercomputer that enables native cloud services without a loss in performance. Looking forward, DPUs can handle additional offload tasks, so systems maintain peak efficiency running HPC and AI workloads.”
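One common way InfiniBand fabrics express per-tenant isolation is through partition keys (P_Keys), which control which endpoints may communicate with one another. The sketch below is a generic illustration of assigning one partition key per tenant; the tenant names are invented, and on a real fabric enforcement is handled by the subnet manager and, in a DPU-based design, by the BlueField hardware. It is not a description of how CSD3 implements isolation.

```python
# Illustrative only: modeling per-tenant isolation with InfiniBand
# partition keys (P_Keys). Real enforcement happens in the subnet manager
# and, in a DPU-based design, in the BlueField card; this is just the
# bookkeeping side, with made-up tenant names.

DEFAULT_PKEY = 0x7FFF          # the well-known default partition
FULL_MEMBER_BIT = 0x8000       # high bit marks full (vs. limited) membership

def assign_pkeys(tenants):
    """Give each tenant a distinct partition key with full membership."""
    pkeys = {}
    next_key = 0x0001
    for tenant in tenants:
        if next_key >= DEFAULT_PKEY:
            raise ValueError("ran out of partition keys")
        pkeys[tenant] = next_key | FULL_MEMBER_BIT
        next_key += 1
    return pkeys

for tenant, key in assign_pkeys(["astro-sim", "fusion", "clinical-ai"]).items():
    print(f"{tenant}: 0x{key:04x}")
# astro-sim: 0x8001
# fusion: 0x8002
# clinical-ai: 0x8003
```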
The CSD3 provides resources for researchers investigating astrophysics, nuclear fusion power generation and lifesaving clinical medicine applications, among other pursuits, using converged simulation, AI and data analytics workflows.
The CSD3 system uses BlueField-2 DPUs to offload infrastructure management, such as security policies and storage frameworks, from the host, while providing acceleration and isolation for workloads to maximize input/output performance.
“Providing an easy and secure way to access the immense computing power of CSD3 is crucial to ushering in a new generation of scientific exploration that serves both the scientific community and industry in the UK,” said Paul Calleja, director of Research Computing Services at Cambridge University. “The extreme performance of Nvidia InfiniBand, together with the offloading, isolation and acceleration of workloads provided by BlueField DPUs, combined with our ‘Scientific OpenStack,’ has enabled Cambridge University to provide a world-class cloud-native supercomputer for driving research that will benefit all of humankind.”
Nvidia said networking performance is accelerated by HDR InfiniBand’s In-Network Computing engines, providing bare-metal performance, while natively supporting multi-node tenant isolation.
CSD3 is expected to be operational later this year.