The company that built the world’s No. 2 and No. 3 most powerful supercomputers, according to the latest Top500 ranking of such systems, is to all appearances backing away from the supercomputer systems business. IBM, whose Summit (at Oak Ridge National Laboratory) and Sierra (at Lawrence Livermore National Laboratory) CORAL-1 systems set the global standard for pre-exascale supercomputing, failed to win any of the three exascale contracts and has since seemingly withdrawn from the HPC systems field.
This has been widely discussed within the HPC community for at least the last 18 months. In fact, one industry analyst recalled being shocked when, at the annual ISC Conference in Frankfurt four years ago, IBM told him the company was no longer interested in the HPC business per se; rather, it was active in HPC as an entrée to the nascent AI market.
Only reinforcing this impression was IBM’s announcement last week of its split into two entities, with its legacy IT infrastructure – the Managed Infrastructure Services unit of its Global Technology Services division – becoming a new public company (“NewCo”), allowing IBM to concentrate on its hybrid cloud and AI growth strategy.
“Strategically, there’s a question of how IBM is looking at the high performance computing market in general,” said Addison Snell, CEO of industry watcher Intersect360 Research. “What we’ve really seen is a shift of priorities to focus on artificial intelligence for certain and hybrid clouds. But there’s been little to no mention of high performance computing or even supercomputing at an intentional level since before even Summit and Sierra were installed. By the time those were up and running, if they use the word supercomputer at all it would be in conjunction with AI and AI supercomputer. It’s like they’ve been backing away from HPC as if it’s a dirty word that they didn’t want to apply to their business. Now, they’re certainly still clearly supplying what we would consider to be high performance computing systems, particularly in some commercial markets (financial services, manufacturing, healthcare). But as a direct targeted market for their POWER systems, they just haven’t been approaching it.”
In last week’s spinoff announcement, the terms HPC and supercomputing appear nowhere. This is of a piece with IBM’s decades-long shift away from systems – including divestitures of its PC and x86 server business units – in favor of software and cloud, a shift underscored by the company’s $34 billion purchase of Red Hat in 2019, an acquisition spearheaded by then-SVP of IBM Hybrid Cloud and now CEO Arvind Krishna.
“IBM is laser-focused on the $1 trillion hybrid cloud opportunity,” he said. “Client buying needs for application and infrastructure services are diverging, while adoption of our hybrid cloud platform is accelerating.”
Where does that leave IBM and HPC? And how, for that matter, does the IBM split fit into other recent HPC M&A activity, including Nvidia’s acquisition of high performance network vendor Mellanox last year and its planned acquisition of CPU design company Arm, along with the more recently announced purchase of FPGA maker Xilinx by AMD – AMD, whose CPUs and GPUs will drive two of the exascale systems, Frontier and El Capitan, to be delivered by HPE and its 2019 acquiree Cray?
In search of a larger pattern, if a pattern exists, we put these and other questions to several HPC observers. A theme that emerged is that at the high end of HPC, systems have become so compute-intensive, so specialized, complex and costly that they are increasingly less relevant and transferable to the higher-volume mid-sized enterprise HPC market, never mind the mainstream data center server market. Building a system that can deliver, say, a quintillion (10¹⁸) calculations per second requires technologies and capabilities not needed for a relatively smaller cluster. Scaled-up supercomputing clusters don’t scale down, the argument goes, making extreme-scale HPC systems a bad ROI proposition.
In stark contrast, the argument continues, stands the HPC components business: a high-margin technology market for CPUs, GPUs, high performance networks, high-bandwidth memory (HBM), high speed interconnects and other gear used in supercomputing-class systems.
Karl Freund, Senior Analyst, Machine Learning & HPC at Moor Insights & Strategy, said he’s inclined to agree with this view, adding that IBM is not alone in its regard for the supercomputing market.
“Pretty much everyone’s abandoned (HPC) above enterprise, right?” Freund said. “I mean, Lenovo and Dell have put up the good fight, and they’ve built a profitable business on the backs of the enterprise. But once you get above that, the investments required are just out of reach, the hardware margins are squeezed.”
Snell said Intel and Nvidia are both happy to compete in the HPC gear market while mostly staying away from complete systems (notwithstanding Intel as prime on the Aurora exascale system and Nvidia’s DGX AI supercomputer line).
“There has been that trend with both Intel and Nvidia trying to capture more of the (components) value stream of the systems in general,” Snell said. “And remember, we’re coming from a cluster era; the early Beowulf clusters treated x86 processors as commodity components where Intel didn’t really capture a great deal of the value of the system overall. Intel has been expanding its value envelope, as has Nvidia, so more of the value of a complete system is going to the processor vendors. By that I mean CPUs, accelerators, all the processing vendors, as well as the interconnect vendors. And now also you’d have to include NVMe storage vendors, of which Intel is a leader again. There’s less margin available for the system provider, your HPE, Dell or Lenovo.”
And IBM.
Freund, a former IBMer, said IBM has wrestled with issues around hardware vs. software for decades, and now the intelligent, hybrid cloud has entered the strategic fray.
“The primary design of the (supercomputer) is for running large, synchronized workloads,” he said. “And it’s an expensive business. I mean, I remember when I was at IBM doing POWER5, we had serious, lengthy discussions at the highest level of the company about whether we should just abandon HPC then.”
In the view of industry analyst Steve Conway, Senior Adviser, HPC Market Dynamics at Hyperion Research, IBM’s move makes good sense and reflects the changing and growing role of HPC and AI within public clouds.
“For IBM, it’s not an abandonment of HPC, it’s HPC in a different context,” Conway said. “Twenty percent of all HPC jobs are now run in third party cloud environments, and the cloud services providers have been rushing to get better at HPC for two reasons. One, the HPC market is no longer a rounding error in the server market; it’s now big enough to go after even for large companies. It was about a $28 billion market in 2019…, so it’s a big enough market.
“But the other thing that’s very important,” Conway continued, “and it’s been especially important in IBM’s view, is that already HPC is indispensable, it’s at the forefront of AI for big emerging use cases, like healthcare and automated driving and smart cities and Internet of Things. If you want to know what’s happening at the most advanced stages that’s probably going to affect the mainstream AI market — which is still an early market — then look to HPC. So it’s no accident that the cloud services providers are making a transition from hiring people who know a whole lot about cloud computing and are trying to learn about HPC. Now, the CSPs are hiring HPC experts and teaching them about cloud.”
Citing former Cray CTO Steve Scott’s move to Microsoft Azure as VP of Hardware Architecture and former Cray VP Barry Bolding’s move to Amazon Web Services as director of HPC, Conway said “that’s very deliberate, because in order to get to the next stage of growth in HPC cloud computing they really have to understand the HPC problems.”
And HPC in the cloud increasingly means AI, forming the impetus behind IBM’s split.
IBM’s perceived pulling away from supercomputer systems has implications for Nvidia, which has a growing presence on the Top500 list and used its high-speed NVLink interconnect to tie its GPUs to Big Blue’s CPUs in Summit and Sierra, among other IBM HPC systems. Its GPUs are also integrated with CPUs in servers from Dell, Lenovo, HPE, Supermicro and other systems vendors competing in the high-end data center, enterprise HPC and AI server markets.
“Most of the heavy lifting is done by GPUs,” Freund said. “If you look at Summit and Sierra, it’s estimated that 95 percent of the FLOPS of those systems are delivered by the GPUs. That’s where all the margin’s gone. I mean, Nvidia doesn’t drop their prices for anybody. They’ve kept their prices relatively high, their margins very healthy.”
So for Nvidia, the less-than-hoped-for market traction of the IBM POWER GPU-CPU integration poses a problem, particularly, Freund said, now that IBM “seems to be less and less serious about HPC.”
“Nvidia needs a high-speed link to a CPU,” Freund said. “They don’t have that except for with IBM. So if IBM is seen as not competitive, or not serious, or perhaps is not willing to invest in HPC, then that forces Nvidia to do something like go buy Arm. They have to have a CPU in the future.”
In fact, it’s the combination of AMD’s EPYC CPUs and Radeon GPUs that will power the HPE Cray Frontier and El Capitan exascale systems.
“You know, back in the days of Opteron (the AMD data center server CPU launched in 2003 and discontinued in 2017), everybody said they wanted to integrate everything onto a CPU cache coherent interconnect, that was going to be the future,” Freund said. “Well, it never really happened. I think this time it will happen. I think that’s what drove AMD’s success in next-generation exascales. So what does Nvidia do? They’ve decided to build a data center-class, Arm-based CPU. They didn’t have to buy Arm to do it but that’s the way they decided to do it.”
As IBM embraces AI and hybrid cloud while spinning off its legacy IT business, Snell said the company has taken another step in shedding what he called an outdated view of IBM as a mainstream technology synthesizer – while also noting that Big Blue may have been too innovative for its own good.
“I give IBM a lot of credit for being visionary,” Snell said, “and not only in high performance computing but in enterprise computing in general. Remember that IBM was on analytics and ‘Let’s Build a Smarter Planet’ five years before big data really took off in this space. And they had Watson competing on ‘Jeopardy’ five years before AI really took off in this space. They were ahead on hybrid clouds. They were ahead on flash, they’re ahead on quantum today. They’ve been out in front of every major enterprise trend, but they’ve struggled by the time any of it comes to market. It’s as if they’ve been better at selling what they will have 10 years from now than what they actually have today.”
No doubt IBM and Krishna view the company’s strategic shift to hybrid cloud and AI as a course correction to growing, high-margin markets that will make good use of Big Blue’s R&D prowess.
“I think IBM (has been) so focused on things that are truly transformative, the home run, on the really revolutionary play in enterprise, that it draws them away from what the majority of buyers are ready to implement today,” Snell said. “If you’re not a dreamer who’s ready to really shake things up and do things the way of the future, it’s like IBM hasn’t been ready to talk to you about something incremental… I think they’ve been too far in front. I mean, they’ve been trying to sell quantum harder than they’re trying to sell HPC right now, and quantum just isn’t a ready market.”
There is a long-standing notion in tech that you can identify the pioneers by the arrows in their backs. First movers are often not winners. (NetApp won the NAS market, not Auspex.) IBM hasn’t been a marketing and sales machine for, what, four decades? It continues to hurt them.