Sponsored Post
Exascale now!
High-performance computing (HPC) simulations provide unparalleled insights into new scientific discoveries and are essential tools for industrial product design. HPC technology development over the last decades has been fueled by the scientific and engineering communities’ unquenchable thirst for ever more compute power. Exascale has been on everyone’s mind ever since the first petaflops system was deployed in 2008. The target was clear: 1 exaflops within a 20-megawatt (MW) power envelope by 2020. Today, the first Exascale systems are being installed around the globe while post-Exascale supercomputers are already planned. The focus is clearly on meaningful application performance (Exascale = exaflops delivered to HPC applications), and this still entails multiple challenges [1] well beyond raw hardware performance (exaflops).
HPC applications have evolved to deliver more performance through unprecedented levels of parallelism, but also through new techniques. Notably, (Big) Data Analysis was introduced to refine computational models through the mining of real-life physical observations. More recently, Artificial Intelligence (AI) frameworks have made surrogate models possible, drastically accelerating a wide range of HPC applications and considerably improving the quality of the simulations.
The diversity challenge
In parallel with the evolution of HPC application software, HPC hardware architecture has also changed significantly. Ten years ago, the HPC ecosystem looked quite uniform, with most supercomputers based on x86 CPUs. By contrast, today’s supercomputer architectures are quite diverse. HPC systems are now commonly composed of several partitions, each featuring a different type of computing/processing node. On the CPU side, different instruction set architectures (ISAs) are used besides traditional x86, particularly ARM and, possibly in the future, RISC-V. GPUs have so far been the HPC accelerators of choice, soon with multiple providers. In addition to GPUs, other accelerators are on offer, such as FPGAs or specialized AI processing units (IPUs, TPUs…). This unprecedented wave of innovation in processor technology [2] presents developers with the opportunity to boost HPC application performance, while at the same time tackling the challenge of such heterogeneous environments.
Energy efficiency at Exascale
Even though each new generation of computing elements delivers more performance per watt thanks to new architectures and advances in semiconductor manufacturing, the overall consumption of Exascale systems is nevertheless reaching costly levels. Exascale HPC datacenters are now commonly configured to provide 20 MW of electrical power or more. At average electricity prices, each MW translates into roughly $1 million per year; over a period of 5 years, the electricity bill for a 20 MW system will thus add up to $100 million. With these considerations in mind, the supercomputer and datacenter utilities, most importantly the cooling system, must be carefully optimized. Additionally, GPU and CPU power consumption has been growing steadily, soon exceeding 500 W or even reaching 1,000 W per device. With such power density, the heat dissipation requirements far exceed the capacity of classical air-cooled servers; liquid cooling has proven to be the practical solution for such requirements at Exascale.
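The electricity-cost arithmetic above can be sketched as a quick back-of-the-envelope calculation; the $1 million per MW per year rate is the article’s average figure, and the function name is purely illustrative:

```python
# Back-of-the-envelope electricity cost for an Exascale datacenter.
# The ~$1M per MW per year rate is the average cited in the article.
COST_PER_MW_YEAR = 1_000_000  # USD per MW per year (approximate average)

def electricity_bill(power_mw: float, years: float) -> float:
    """Total electricity cost in USD for a facility drawing power_mw over years."""
    return power_mw * COST_PER_MW_YEAR * years

# A 20 MW Exascale system over 5 years:
print(f"${electricity_bill(20, 5):,.0f}")  # → $100,000,000
```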
BullSequana XH3000, the HPC open platform for Exascale and beyond
Atos recently announced the BullSequana XH3000 [3], its fourth-generation Direct Liquid Cooling (DLC) HPC system. Its unique cooling technology is based on a portfolio of 72 patents and relies on a decade of experience with large DLC system deployments, such as Jülich’s JUWELS Booster (#1 in Europe) or the upcoming EuroHPC Leonardo system at CINECA [4]. Improving on previous generations, the newly introduced BullSequana XH3000 platform greatly expands the power supply and cooling capacity of each rack. As a result, a higher inlet temperature is admissible, and the datacenter free-cooling range is further extended. Chilled water, a strong requirement for classical air-cooled servers, is not necessary with DLC, allowing a Power Usage Effectiveness (PUE) as low as 1.05 for most datacenters, all year round. On average, DLC reduces the overall electricity bill of an HPC datacenter by 40%.
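A rough sketch of how PUE drives the bill: PUE is total facility power divided by IT equipment power, so a lower PUE means less overhead (cooling, power distribution) per useful watt. The 1.05 value comes from the article; the air-cooled baseline PUE of 1.40 and the function name are illustrative assumptions, so the resulting saving is a sketch, not the article’s 40% figure (which also reflects other efficiency gains):

```python
# PUE = total facility power / IT equipment power.
COST_PER_MW_YEAR = 1_000_000  # USD per MW per year (article's average)

def facility_cost(it_power_mw: float, pue: float, years: float) -> float:
    """Total electricity cost in USD: IT load scaled by the PUE overhead factor."""
    return it_power_mw * pue * COST_PER_MW_YEAR * years

it_load_mw = 20
air_cooled = facility_cost(it_load_mw, 1.40, 5)  # assumed air-cooled baseline PUE
dlc = facility_cost(it_load_mw, 1.05, 5)         # DLC PUE cited in the article
print(f"5-year saving: ${air_cooled - dlc:,.0f}")  # ≈ $35 million with these assumptions
```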
In addition to energy efficiency, the BullSequana XH3000 is meant to be the most open HPC platform. The system supports the multiple technologies that will power Exascale systems. The first blades will feature the upcoming x86 CPUs from Intel (Sapphire Rapids) and AMD (Genoa). As they become available, the next-generation GPUs from NVIDIA, AMD, and Intel will be readily supported. The first HPC prototype equipped with SiPearl’s ARM CPUs is also on the BullSequana XH3000 roadmap.
HPC is synonymous with the highest degrees of parallelism. To meet this requirement, different high-speed interconnection networks will be available on the BullSequana XH3000 platform from day one: InfiniBand, high-speed Ethernet, and the Atos BXI.
The OpenSequana program was designed to further enlarge the range of technology available on the BullSequana XH3000. On the one hand, it enables Atos to easily integrate open technology, such as Open Compute network adapter cards. On the other hand, the blade and system interface specifications are available to Atos partners willing to take advantage of the platform infrastructure (management, power, cooling, interconnection networks…) and expand the reach of their technology.
The BullSequana XH3000 also targets the future coupling of HPC and quantum computing. Within the framework of the EuroHPC HPCQS project [5], a first prototype will allow researchers to explore these possibilities. The Atos QLM (Quantum Learning Machine) software environment will ensure a smooth integration of quantum computing with the HPC platform.
The BullSequana XH3000 Exascale Software environment
The Atos Smart Management Center (SMC) software suite completes the BullSequana XH3000 solution. Based on the Red Hat Enterprise Linux 8.x distribution, its orchestrated micro-services architecture provides all the necessary utilities for provisioning, monitoring, logging, and more. The SMC xScale version of the suite is engineered to scale easily to the tens of thousands of nodes that constitute an Exascale system.
Security is a key concern for HPC systems, which are common targets of cyberattacks. Atos drew on its experience as the #1 European cybersecurity provider to address security early in the design phase. The SMC suite on the BullSequana XH3000 provides a complete hardware/software solution with the highest level of security enabled, while minimizing the impact on HPC application performance.
AI, which is now used to accelerate HPC applications, also plays an important role in the SMC runtime optimization modules for energy optimization, resource scheduling, data management, performance optimization, and preventive system maintenance. These management software tools complement the DLC hardware technology to improve the overall energy efficiency of the system.
Finally, it must be recognized that Cloud solutions [6] are increasingly becoming part of the HPC ecosystem. To facilitate Cloud deployments, SMC will provide seamless integration of Nimbix’s JARVICE™ container technology [7].
In summary, the BullSequana XH3000, which supports a broad spectrum of hardware technologies and features a rich software environment, will be the platform of choice to meet the Exascale challenges. Its open architecture will lead Atos, its technology partners, and their users well into the post-Exascale era.
About the Author
Jean-Pierre Panziera – High Performance Computing CTO, Atos
[1] https://insidehpc.com/2021/12/the-atos-perspective-for-exascale-a-race-beyond-1018/
[2] “An unprecedented wave of innovations in processors” – Philippe Duluc
[4] https://www.cineca.it/en/hot-topics/Leonardo-announce
[5] https://www.genci.fr/en/node/1156
[6] https://insidehpc.com/2021/09/delivering-on-the-promise-of-hybrid-multi-cloud-for-hpc/
[7] https://atos.net/en/solutions/high-performance-computing-hpc/hpc-as-a-service