[SPONSORED CONTENT] HPC systems customers (and vendors) are in permanent pursuit of more compute power with equal or greater node density. But with that comes more power consumption, greater heat generation and rising cooling costs. Because of this, the IT business – with a boost from the HPC and hyperscale segments – is spiraling up the list of industries ranked by power consumption. According to ITProPortal, data center power use is expected to jump 50 percent by 2030.
The combination of higher electricity consumption, higher costs and higher carbon emissions is viewed with increasing alarm, and it has become a limiting factor for HPC. Consider this: with the annual electric bill for an exascale system expected to approach $20 million, it’s been argued that the next great supercomputing throughput milestone, zettascale (1,000 exaFLOPS), is a practical impossibility using current technologies and power sources.
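For a rough sense of where a figure like that $20 million comes from, here is a back-of-envelope sketch. The assumed 20 MW average system draw and roughly $0.11 per kWh electricity price are illustrative assumptions, not figures from the article.

```python
# Back-of-envelope sketch of an exascale-class annual power bill.
# The 20 MW draw and $0.11/kWh price are illustrative assumptions only.
SYSTEM_POWER_MW = 20          # assumed average draw of an exascale system
PRICE_PER_KWH = 0.11          # assumed blended electricity price, USD
HOURS_PER_YEAR = 24 * 365

annual_kwh = SYSTEM_POWER_MW * 1_000 * HOURS_PER_YEAR
annual_bill = annual_kwh * PRICE_PER_KWH
print(f"~${annual_bill / 1e6:.1f} M per year")   # ≈ $19.3 M, close to the cited $20 M
```

Scale the same arithmetic up toward a zettascale machine without large efficiency gains and the power-bill objection becomes obvious.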
In the face of this bleak, high-consumption, high-carbon future, the HPC server market has increasingly turned to energy-efficient liquid cooling to hold down energy costs. The transition away from air cooling was initially regarded as a risky proposition, but that outlook has changed significantly as water cooling technologies have matured through multi-generational deployments at supercomputing centers housing some of the world’s most powerful and expensive HPC systems.
An early liquid cooling innovator in HPC, systems maker Lenovo dates its first major water-cooled installation to 2012 at one of Europe’s biggest supercomputing centers (more on this below). The company’s line of Neptune™ liquid cooling technologies provides a three-pronged approach whose elements can be used together or independently: direct warm-water cooling (DWC), liquid-assisted air cooling and a rear-door heat exchanger (RDHX), along with other technologies such as software designed to run systems more efficiently.
Lenovo leads the HPC server industry in the use of warm-water cooling – the warmer the water, the less energy is expended chilling it either before or after it flows through the servers. You might not think 122-degree (Fahrenheit) water could cool a server, but Lenovo’s doing it. The company is also developing water-recycling capabilities that could move HPC centers toward carbon-neutral status, possibly even carbon-negative in the future.
Another point of distinction is that Neptune™ DWC technologies use leak-resistant copper tubing to circulate water through more system components than any competing design. This comprehensive approach to liquid cooling removes more than 90 percent of the heat generated by the server.
Let’s look at Lenovo’s highest-performance, most densely packaged server, the fan-free ThinkSystem SD650-N V2 GPU server with Neptune™ direct warm-water cooling technology, an HPC-AI/hyperscale system. It uses water at up to 50°C/122°F to remove heat from two 3rd Gen Intel Xeon Scalable CPUs, four NVIDIA HGX A100 GPUs and NVIDIA HDR InfiniBand networking, along with memory, network interface controllers, local storage and voltage regulators.
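It can seem counterintuitive that 50°C water cools anything, but heat flows into the loop as long as the chip surfaces run hotter than the coolant; what matters is the flow rate and the temperature rise allowed across the loop. Here is a minimal heat-balance sketch, with an assumed per-node heat load and temperature rise (illustrative assumptions, not SD650-N V2 specifications):

```python
# Minimal heat-balance sketch: Q = m_dot * c_p * dT.
# The 2 kW node heat load and 10 K coolant temperature rise are
# illustrative assumptions, not SD650-N V2 specifications.
C_P_WATER = 4186          # J/(kg*K), specific heat of water
NODE_HEAT_LOAD_W = 2000   # assumed heat to be removed per node, watts
DELTA_T = 10              # assumed coolant temperature rise, inlet to outlet, K

mass_flow = NODE_HEAT_LOAD_W / (C_P_WATER * DELTA_T)   # kg/s
litres_per_min = mass_flow * 60                        # 1 kg of water ≈ 1 litre
print(f"{litres_per_min:.1f} L/min per node")          # ≈ 2.9 L/min
```

Because the return water comes out even warmer, it can often be cooled back down with dry coolers or reused, rather than chilled mechanically.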
Another benefit: compared with air cooling – with high-rev fans and air conditioners roaring away – water cooling is much quieter. So along with reducing greenhouse gas emissions, the ThinkSystem SD650-N V2 produces less nerve-wracking noise pollution.
The server cuts data center cooling costs by 30 to 40 percent, Lenovo reports, and supports PUE ratings below 1.1, depending on the data center design. It also enables data center growth without adding more Computer Room Air Conditioning (CRAC) units. And because liquid cooling keeps servers operating at lower temperatures, Neptune™ extends the lifespans of parts and servers, according to Lenovo.
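PUE (Power Usage Effectiveness) is total facility power divided by the power consumed by the IT equipment itself, so a value below 1.1 means cooling and other overhead add less than 10 percent on top of what the servers draw. A minimal sketch with hypothetical numbers:

```python
# PUE = total facility power / IT equipment power.
# A PUE below 1.1 means cooling and other overhead add less than 10 percent.
# The figures below are hypothetical, for illustration only.
def pue(it_power_kw: float, overhead_kw: float) -> float:
    return (it_power_kw + overhead_kw) / it_power_kw

# Air-cooled room: chillers and CRAC fans add heavy overhead.
print(pue(it_power_kw=1000, overhead_kw=500))   # 1.5

# Warm-water cooled room: dry coolers instead of chillers, far less overhead.
print(pue(it_power_kw=1000, overhead_kw=80))    # 1.08
```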
A single, standard 42U rack holds 36 of these servers and delivers up to 2 PFLOPS of compute performance, enough to earn a spot on the current TOP500 list of the world’s most powerful supercomputers.
Looking ahead, the industry faces steepening cooling challenges as the power drawn by CPUs, GPUs and even memory DIMMs and NICs steadily climbs. In 2006, 20 kW were required to power a 56-node, 224-core rack for a Lenovo HPC system installed at Eli Lilly; by 2018, the 72-node/3,456-core racks within the Lenovo SuperMUC-NG supercomputer at the Leibniz Supercomputing Centre (LRZ) in Munich consumed 46 kW per rack. The ThinkSystem SD650-N V2 comes in at 80 kW per rack, and Lenovo anticipates that by 2024 its high-end systems will consume 180 kW.
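Taking just the figures quoted above, a quick calculation shows how sharply per-node power has climbed (the 2024 projection has no node count attached, so it is left out):

```python
# Per-node power computed from the rack figures quoted in the article.
racks = [
    ("2006 Eli Lilly rack",    20_000, 56),   # 20 kW, 56 nodes
    ("2018 SuperMUC-NG rack",  46_000, 72),   # 46 kW, 72 nodes
    ("SD650-N V2 rack",        80_000, 36),   # 80 kW, 36 servers per 42U rack
]
for name, watts, nodes in racks:
    print(f"{name}: {watts / 1000:.0f} kW total, ~{watts / nodes:.0f} W per node")
# 2006: ~357 W/node; 2018: ~639 W/node; SD650-N V2: ~2,222 W/node
```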
LRZ has been a pioneer in energy-efficient supercomputing for over a decade. From 2012 through the end of this year, four generations of IBM/Lenovo water-cooled supercomputers have been stood up at LRZ. It’s an envelope-pushing site that uses Lenovo’s most advanced liquid cooling technologies, which the company has since extended to the broader HPC industry.
A key to Lenovo’s cooling leadership is its long experience advancing liquid-related technologies, according to Lenovo’s Martin Hiegl, Director, HPC Customer Solutions. Take, for example, the heatsinks used in Lenovo HPC servers.
“The shiny copper water loop and big manifolds are the most visible part of Lenovo Neptune,” Hiegl said. “The secret sauce lies, however, within the layout across a system for stable cooling capability with low pressure across the different heat sources and even the tiny details like the microfins within the heat sink itself. The more than a decade of experience our Lenovo engineers bring to the table makes them industry leading in their designs.”
In addition, Hiegl said Lenovo engineers focus on achieving consistent operational temperatures across and among processors.
“For example, between the different CPUs you want to maintain temperature balance,” he said. “That’s why our water loops on the node are very carefully designed to bring optimal cooling to the different heat sources so that you don’t have one CPU running at 80 degrees Celsius and another CPU running at 90 degrees Celsius, which can create thermal jitter with different performance between the two CPUs on the same node. We design our systems specifically to avoid that. Our decade of experience doing this is something no one else brings to the table.”
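A minimal sketch of why that balance matters: in a tightly coupled job, each step finishes only when the slowest processor does, so a hotter, throttling CPU paces the whole node. The clock speeds below are hypothetical, for illustration only.

```python
# Why thermal imbalance hurts: in a tightly coupled (bulk-synchronous) job,
# each step completes at the pace of the slowest processor, so one CPU that
# throttles because it runs hotter gates the whole node.
# The clock speeds below are hypothetical.
balanced   = [3.0, 3.0]   # GHz, both CPUs held at the same temperature
imbalanced = [3.0, 2.6]   # GHz, one CPU throttling because it runs hotter

def effective_step_rate(clocks_ghz):
    # The step finishes only when the slowest CPU does.
    return min(clocks_ghz)

print(effective_step_rate(balanced))     # 3.0 -> full speed
print(effective_step_rate(imbalanced))   # 2.6 -> ~13% slower, despite one fast CPU
```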
Looking back to 2012 – when HPC-class servers had only CPUs and generated much less heat – liquid cooling was a new approach that made some people nervous. Scott Tease, Lenovo’s Vice President and General Manager, HPC & AI, was part of the team that installed the company’s first supercomputer at LRZ.
“We were freaking out a little bit, it was 9700 nodes, liquid cooled for the first time ever, and it got us nervous,” he told StorageReview in a podcast interview. “But it’s been an incredible story, and ever since, the customer has been happy. Some of those nodes are just now coming out of production…, that’s how long it’s been in production. But what we’re seeing with Neptune and with liquid cooling in general is that the reasons to go towards liquids are even (stronger) than a decade ago.”
He said LRZ had compelling cost motives for making the jump to liquid, since power costs roughly twice as much in Germany as in the U.S. “So every time they could drive power consumption out it had a pretty big benefit for them on their energy bill,” Tease said, with savings amounting to hundreds of thousands of euros per month.
Bottom line: LRZ estimates that liquid cooling, and all the optimizations made with Lenovo around it, has reduced its energy costs by 30 percent.
Longer term, Lenovo wants to work with customers like LRZ that already recycle water heated by HPC systems, using it for purposes such as heating buildings and producing chilled water through adsorption technology for an even wider cooling impact, and to combine that reuse with other renewable energy sources to eliminate carbon emissions altogether.
Tease said such aspirations support the growing sustainability ethic taking hold in the HPC community, with liquid cooling playing a key role. “That’s been surprising to me, how broad it is globally,” he said. “People see liquid cooling and its advantages from an energy efficiency, carbon reduction standpoint. It’s resonating universally, globally.”