If a computer’s intelligence can be anthropomorphized, then an AI supercomputer that scales to 26,000 GPUs (26 exaFLOPS of AI throughput) is at the head of the class. That’s the case with Google’s new A3 GPU supercomputers for Google Cloud, introduced at the Google I/O 2023 conference.
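The headline number is easy to sanity-check. Here is a minimal back-of-the-envelope sketch, assuming roughly 1 petaFLOPS of low-precision (e.g., FP8) throughput per H100, a per-GPU figure that is not stated in the announcement:

```python
# Back-of-the-envelope check of the 26-exaFLOPS claim.
# Assumption (not from the announcement): ~1 petaFLOPS of
# low-precision throughput per H100 GPU.
PER_GPU_FLOPS = 1e15   # ~1 petaFLOPS per H100
NUM_GPUS = 26_000      # maximum A3 supercomputer scale

aggregate = PER_GPU_FLOPS * NUM_GPUS
print(f"{aggregate / 1e18:.0f} exaFLOPS")  # -> 26 exaFLOPS
```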
Google said A3 GPU VMs are designed to speed up the training and inference of highly complex ML models for organizations building large language models, generative AI, and diffusion models to optimize their operations.
A3 VMs combine NVIDIA H100 Tensor Core GPUs with Google’s advanced networking technology:
- A3 is the first GPU instance to use Google’s 200 Gbps IPUs, with GPU-to-GPU data transfers bypassing the CPU host and flowing over interfaces separate from other VM networks and data traffic. Google said this enables up to 10x more network bandwidth than its A2 VMs, with low tail latencies and high bandwidth stability (a communication sketch follows this list).
- Google’s Jupiter data center networking fabric scales to tens of thousands of interconnected GPUs and provides full-bandwidth, reconfigurable optical links that can adjust the topology on demand. Google said that for most workload structures, Jupiter delivers workload bandwidth indistinguishable from that of more expensive off-the-shelf non-blocking network fabrics, resulting in a lower total cost of ownership (TCO).
- The A3 supercomputer’s scale provides up to 26 exaFLOPS of AI performance, reducing the time and cost of training large ML models.
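The networking points above come together in the collective-communication layer. Below is a minimal sketch of a multi-GPU all-reduce using PyTorch’s NCCL backend, the kind of GPU-to-GPU traffic A3’s IPU-based fabric is built to carry off the host CPU; the tensor size and launch command are illustrative choices, not values from Google’s announcement.

```python
# Minimal multi-GPU all-reduce sketch using PyTorch + NCCL.
# NCCL routes GPU-to-GPU traffic directly (NVLink within a node,
# the network fabric across nodes), bypassing host-CPU staging
# where the hardware supports it. Launch with torchrun, e.g.:
#   torchrun --nproc_per_node=8 allreduce_sketch.py
# (script name and launch line are illustrative)
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for us.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a 1 GiB tensor; all-reduce sums them in place.
    x = torch.ones(256 * 1024 * 1024, device="cuda")  # 1 GiB of float32
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"all-reduce done across {dist.get_world_size()} GPUs")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same script runs unchanged whether the ranks sit on one node or many; NCCL picks NVLink for intra-node hops and the network fabric for inter-node hops.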
Google said A3 VMs also support inference workloads, with up to a 30x inference performance boost compared to its A2 VMs powered by the NVIDIA A100 Tensor Core GPU.
The A3 follows Google’s recently announced G2 VMs, a cloud offering using NVIDIA L4 Tensor Core GPUs for serving generative AI workloads.
A3 features include:
- 8 H100 GPUs built on NVIDIA’s Hopper architecture, delivering 3x the compute throughput of the previous-generation A100
- 3.6 TB/s of bisection bandwidth between A3’s 8 GPUs via NVIDIA NVSwitch and NVLink 4.0
- 4th Gen Intel Xeon Scalable CPUs
- 2 TB of host memory via DDR5-4800 DIMMs
- 10x greater networking bandwidth, powered by hardware-enabled IPUs, a specialized inter-server GPU communication stack, and NCCL optimizations (see the bandwidth sketch after this list)
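To put figures like the 3.6 TB/s bisection number in context, here is a minimal sketch that probes point-to-point GPU-to-GPU copy bandwidth with PyTorch on any machine with at least two GPUs; the buffer size and iteration count are arbitrary illustrative choices. A single peer copy exercises one link at a time, so measured numbers will sit well below the aggregate bisection figure, which counts all GPUs communicating at once.

```python
# Rough point-to-point GPU-to-GPU bandwidth probe (PyTorch).
# Needs at least 2 GPUs; on NVLink/NVSwitch systems the copy
# travels GPU-to-GPU, on others it may be staged through the host.
import time
import torch

GIB = 1024 ** 3
src = torch.empty(GIB, dtype=torch.uint8, device="cuda:0")  # 1 GiB buffer
dst = torch.empty(GIB, dtype=torch.uint8, device="cuda:1")

dst.copy_(src, non_blocking=True)  # warm-up transfer
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")

ITERS = 20
t0 = time.perf_counter()
for _ in range(ITERS):
    dst.copy_(src, non_blocking=True)
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
elapsed = time.perf_counter() - t0

# Each iteration moves 1 GiB, so GiB/s = ITERS / elapsed.
print(f"~{ITERS / elapsed:.1f} GiB/s over a single GPU-to-GPU path")
```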
Google also said that customers looking to develop complex ML models without the maintenance burden can deploy A3 VMs on Vertex AI, an end-to-end platform for building ML models on managed infrastructure that’s built for low-latency serving and high-performance training.
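For a sense of what that looks like in practice, below is a minimal sketch that submits a custom training job on A3-class hardware with the google-cloud-aiplatform Python SDK. The project ID, region, staging bucket, container image, and the H100 accelerator enum are assumptions for illustration; consult the current Vertex AI docs for the exact values available in your region.

```python
# Minimal sketch: submit a Vertex AI custom training job on
# A3-class hardware with the google-cloud-aiplatform SDK.
# Project, region, bucket, image URI, and the accelerator enum
# below are illustrative assumptions; verify against current docs.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",                      # hypothetical project ID
    location="us-central1",                    # hypothetical region
    staging_bucket="gs://my-staging-bucket",   # hypothetical bucket
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="a3-llm-training",
    container_uri="us-docker.pkg.dev/my-project/train/llm:latest",  # hypothetical image
)

job.run(
    replica_count=1,
    machine_type="a3-highgpu-8g",          # A3 VM with 8 H100 GPUs
    accelerator_type="NVIDIA_H100_80GB",   # assumed enum name; check docs
    accelerator_count=8,
)
```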