Microsoft Introduces Generative AI VM on Azure with Scaling up to Thousands of GPUs

Print Friendly, PDF & Email

Microsoft today introduced the ND H100 v5 VM on the Azure cloud, a virtual machine for development generative AI applications. The VM can scale from eight to thousands of NVIDIA H100 GPUs with Quantum-2 InfiniBand networking, Microsoft said, and the adoption of H100’s, NVIDIA’s latest data center GPUs, will accelerate performance for AI models over the previous VM, which used NVIDIA A100s.

Matt Vegas. principal product manager, Azure HPC+AI, said Microsoft is applying its “experience in supercomputing and supporting the largest AI training workloads to create AI infrastructure capable of massive performance at scale. The Microsoft Azure cloud, and specifically our graphics processing unit (GPU) accelerated VMs, provide the foundation for many generative AI advancements from both Microsoft and our customers.

In a blog posted today, Vegas added that “For Microsoft and organizations like Inflection, NVIDIA, and OpenAI (creator of the ChatGPT generative AI application) that have committed to large-scale deployments, this offering will enable a new class of large-scale AI models.”

Starting in 2019, Microsoft has completed three rounds of investments, totaling $13 billion, in OpenAI, the latest an investment of $10 billion announced in January.

“Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs,” said Greg Brockman, president and co-founder of OpenAI, “making our research and alignment work on systems like ChatGPT possible.”

Vegas said the ND H100 v5 is available for preview and will become a standard offering in the Azure portfolios.

The ND H100 v5 includes:

  • 8x NVIDIA H100 Tensor Core GPUs interconnected via next gen NVSwitch and NVLink 4.0
  • 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU with 3.2Tb/s per VM in a non-blocking fat-tree network
  • NVSwitch and NVLink 4.0 with 3.6TB/s bisectional bandwidth between 8 local GPUs within each VM
  • 4th Gen Intel Xeon Scalable processors
  • PCIE Gen5 host to GPU interconnect with 64GB/s bandwidth per GPU
  • 16 Channels of 4800MHz DDR5 DIMMs

Vegas said Azure services such as Azure Machine Learning are designed to make Microsoft’s “AI supercomputer accessible to customers for model training and Azure OpenAI Service enables customers to tap into the power of large-scale generative AI models.”

“Our focus on conversational AI requires us to develop and train some of the most complex large language models,” said Mustafa Suleyman, CEO of machine learning startup Inflection, led by LinkedIn co-founder Reid Hoffman and founding DeepMind member Suleyman. “Azure’s AI infrastructure provides us with the necessary performance to efficiently process these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts.”