Staying at the Cutting Edge of Automotive Design Technology with Azure Cloud Platform and AMD


[SPONSORED CONTENT]   In automotive, first movers have a massive advantage. Success in the industry centers on vehicle R&D, at the core of which is design, performance and crash-safety multi-physics software. Generating remarkably true-to-life simulations, these applications shorten vehicle time-to-market, allowing engineers to sample more design options and conduct more virtual tests in less time.

But those remarkable simulations are highly compute- and data-intensive, drawing on high performance computing and advanced AI technology resources. In an industry that rewards first movers, a major challenge is avoiding becoming a slow mover where technology is concerned. On-premises hardware life cycles typically last three to five years, and that’s a problem in automotive, where technology ages in dog years. A strategy for keeping the latest and greatest technology available to design engineers, says Microsoft, is moving workloads to the Azure high performance cloud, which perpetually refreshes its platform with advanced, HPC-class hardware and software.

We recently sat down with AMD’s Rick Knoechel, senior solution architect, public cloud, and Sean Kerr, senior manager, cloud solutions marketing. In their roles, they work closely with managers at Azure, which in recent years has repeatedly been the first public cloud to adopt AMD chips, including the latest AMD EPYC CPUs and AMD Radeon Instinct GPUs.

According to Knoechel, automotive engineers work under a chronic source of pressure referred to as an “environment of constraint.” Too often, there aren’t enough compute resources available to run engineering workloads. And it’s no wonder, considering that automotive design software is evolving to encompass multiple disciplines within a single application package, such as multi-physics simulations combining mechanical, aerodynamic and thermal-cooling analyses.

The constraint environment is having a significant impact on the end user community. “These are the men and women whose job it is to validate the ‘crashworthiness’ of a new vehicle,” Knoechel said. “Let’s say you have a new EV pickup truck laden with 2,000 pounds of batteries. You’ve added 2,000 pounds of mass to the truck and now you want to make sure it passes FMVSS (Federal Motor Vehicle Safety Standards) regulations. This kind of thing means demand for compute is at an all-time high in the industry.”
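The physics behind that demand is straightforward: crash energy scales linearly with mass, so every pound of battery adds to the energy the structure must absorb – and to the simulation work needed to prove it. A rough Python illustration, taking the 2,000-pound battery load from Knoechel’s example; the base truck weight and test speed are assumptions for the sketch:

```python
# Back-of-envelope: crash energy E = 1/2 * m * v^2 scales linearly with mass,
# so battery weight directly raises the energy a structure must absorb.
def crash_energy_joules(mass_kg: float, speed_mph: float) -> float:
    speed_ms = speed_mph * 0.44704        # mph -> m/s
    return 0.5 * mass_kg * speed_ms ** 2

base = crash_energy_joules(mass_kg=2700, speed_mph=35)  # ~6,000 lb truck (assumed)
ev = crash_energy_joules(mass_kg=3600, speed_mph=35)    # +2,000 lb of batteries
print(f"Added crash energy to absorb: {ev / base - 1:.0%}")  # ~33% more
```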

Demand for HPC is not only high, Knoechel said, it’s erratic.

AMD’s Rick Knoechel

“Historically, demand has been very ‘peaky,’” Knoechel said, “it’s not predictable… And therein lies this challenge. How many resources do you make available? If you’re ‘over-capacitized,’ you’re under-utilized, and you’ve wasted time, energy and money. And if you’re ‘under-capacitized,’ you can’t meet end user demand. And so right now, pretty much every on-prem HPC environment in the world is under-capacitized because there’s increased demand for HPC.”
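A minimal sketch of that trade-off, using entirely made-up demand numbers: a cluster sized for average demand queues up the peaks, while one sized for the peaks sits mostly idle.

```python
import random

# Made-up "peaky" demand profile: mostly modest days, occasional huge spikes.
random.seed(42)
demand = [random.choice([200, 400, 600, 3000]) for _ in range(365)]  # cores/day

def evaluate(capacity: int) -> tuple[float, float]:
    used = sum(min(d, capacity) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return used / (capacity * len(demand)), unmet / sum(demand)

for cap in (600, 1200, 3000):  # sized for average ... sized for peak
    utilization, queued = evaluate(cap)
    print(f"{cap:>5} cores: {utilization:5.1%} utilized, {queued:5.1%} of demand queued")
```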

He shared a customer anecdote involving an automotive component supplier with a 1,200-core on-prem infrastructure whose productivity gradually declined due to testing bottlenecks that forced design engineers to wait in queues for compute time.

“They had established a best practice that said before you could perform a final validation test … they would build a physical prototype … for FMVSS certification. But the rules were that you weren’t allowed to build the prototype until you had reliable results from the simulation. So in essence their whole product design and development pipeline had to run through 200 simulation engineers globally, and it created a bottleneck on the order of $6 million to $7 million a year in terms of staff time waiting in the queue.”
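The arithmetic behind a figure like that is simple. A hypothetical reconstruction in Python – the loaded cost and waiting share are assumptions, not figures from the interview:

```python
# Hypothetical back-of-envelope consistent with the $6M-$7M figure quoted above.
engineers = 200
loaded_cost = 160_000  # USD per engineer-year (assumed)
waiting_share = 0.20   # fraction of time spent waiting in the queue (assumed)

annual_queue_cost = engineers * loaded_cost * waiting_share
print(f"${annual_queue_cost / 1e6:.1f}M per year")  # -> $6.4M, in the cited range
```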

Fast forward: the company’s engineers now do their work on Azure and, according to Knoechel, they’re running four times more crash simulations on four times as many software licenses, with annual cloud expenditures of only about $1 million. And with Azure’s high availability and scalability – its “elasticity” – compute resources are delivered no matter how daily demand for HPC fluctuates. The result, Knoechel said: “productivity has been unleashed.”

Knoechel pointed out other problems for automotive companies with on-prem resources. One is keeping data center managers on staff with expertise in running HPC-class infrastructures – they’re not easy to hire and retain. While that expertise is commonly found at the big automotive OEMs, it’s a growing problem among automotive component makers.


“More and more design work is being pushed down to the suppliers,” Knoechel said. “They have to be more innovative, they’re under pressure to deliver a better, lighter product, and time-to-market is a big challenge.” The result is that more automotive suppliers – some with as many as 5,000 cores – have flipped their infrastructures to the cloud.

Another problem for on-prem organizations: tech industry supply chain breakdowns delay delivery of advanced servers with the latest components – the newest chips, power supplies, storage and memory. But because Azure is among the server OEMs’ biggest customers, it’s at the head of the line for new server deliveries.

“Azure is very quick to adopt our latest technologies,” Kerr said. “For example, they were the first in the cloud to adopt our 3rd Gen ‘Milan’ EPYC with AMD 3D V-Cache; they’ve been very good at picking up our products when they come out. On-prem hardware procurement cycles can be long – they may do a refresh every three to five years – requiring big bets by planning teams. But with Azure the refresh is constant, and customers don’t have to make that big bet.”

In addition, Azure’s HPC team has created an on-prem-like offering that reflects the subtleties of users’ high performance computing needs. Beyond incorporating new, advanced technologies and the storage and deployment tools used in HPC, Azure also has a lightweight hypervisor for bare metal-like performance and was the first major cloud provider to offer an InfiniBand fabric – all backed by Azure’s HPC technical staff.
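That InfiniBand fabric matters because crash and CFD codes are tightly coupled MPI jobs: every solver iteration synchronizes data across nodes, so inter-node latency and bandwidth gate scaling. A minimal mpi4py sketch of that communication pattern – illustrative only, not Azure-specific code:

```python
# Each MPI rank owns part of the model; every iteration ends with a global
# reduction across nodes - the traffic InfiniBand's low latency accelerates.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_residual = np.array([1.0 / (rank + 1)], dtype="d")  # stand-in local result
global_residual = np.zeros(1, dtype="d")
comm.Allreduce(local_residual, global_residual, op=MPI.SUM)

if rank == 0:
    print(f"Global residual across {comm.Get_size()} ranks: {global_residual[0]:.4f}")
```

Because every rank blocks on that Allreduce each iteration, the interconnect – not just the CPU – sets the pace of a multi-node run.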

Last March, Microsoft announced the upgrade of Azure HBv3 virtual machines with EPYC 3rd Gen chips. Microsoft’s internal tests show the new HBv3-series delivers up to 80 percent higher performance for CFD, 60 percent higher for EDA RTL (register-transfer level) simulation, 50 percent higher for explicit finite element analysis and 19 percent higher for weather simulation. Much of that improvement is attributed to the 3D V-Cache in the new EPYC processors. In practical terms, an almost 80 percent performance gain on a 140 million-cell Ansys Fluent CFD model means you can run nearly two jobs in the time it previously took to run one.
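Converting “X percent higher performance” into runtime and throughput is simple arithmetic; a quick check using the figures quoted above:

```python
# "X% higher performance" means 1/(1+X) of the old runtime, i.e. (1+X) jobs
# completed in the time one job used to take.
for name, gain in [("CFD", 0.80), ("EDA RTL", 0.60),
                   ("explicit FEA", 0.50), ("weather", 0.19)]:
    runtime = 1.0 / (1.0 + gain)
    print(f"{name:>13}: runtime {runtime:.0%} of old, {1 + gain:.2f}x throughput")
```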

Kerr and Knoechel added that an automotive OEM customer is seeing similar performance boosts running full-vehicle Ansys LS-DYNA crash simulations with 20 to 30 million-plus elements. Runs that used to take 30 hours now take 10.

“If you’re in charge of safety for any major global automaker,” Knoechel said, “you’d better be paying attention to that.”