insideHPC Guide to NVIDIA EGX Platform: One Architecture for Every Workload – Part 2


Sponsored Post

Introduction

The massive growth of data requires organizations to re-evaluate their data center infrastructure to meet changing workload needs. Artificial Intelligence (AI), High Performance Computing (HPC), big data analytics, and professional visualization are the growth drivers changing those workloads. AI is foundational and is transforming the way businesses work across many industries, and it will become critical as new tools and technologies such as 5G wireless hyperconnectivity, edge computing, and Radio Access Networks (RANs) become a reality. According to Carl Flygare, NVIDIA Pro GPU Product Marketing Manager, PNY Technologies, “Organizations need to reimagine data centers to meet AI and big data analytics intensive workload needs, sophisticated visualization and rendering workflows, and prepare for a GPU accelerated VDI enabled mobile workforce.”

This technology guide, “insideHPC Guide to NVIDIA EGX Platform: One Architecture for Every Workload,” sponsored by PNY, describes how the NVIDIA® EGX™ Platform provides a single architecture to meet every data center workload need. PNY aids in GPU selection and testing, allowing partners to equip customers with the best NVIDIA EGX Platform solution for their needs.

NVIDIA EGX Platform Hardware Meets Multiple Workload Needs

The NVIDIA EGX Platform delivers IT infrastructure that’s compatible with every vendor and with the major DevOps tools. Traditional and modern, data-intensive applications can run side by side, accelerated and secure, on the same infrastructure. Data centers running the NVIDIA EGX Platform with NVIDIA Tensor Core data center GPUs or NVIDIA Ampere architecture-based cards make AI available everywhere. With VDI-compatible vGPU software, remote workers get the full benefits of NVIDIA GPU technologies regardless of the system they are using.

NVIDIA EGX Platform Architecture

The NVIDIA EGX Platform consists of multiple tiers, as shown in Figure 2. The hardware tier sits at the lowest level and includes data center GPUs, NVIDIA RTX™ GPUs (Ampere-based architecture RTX A6000 or RTX A5000), NVIDIA ConnectX® Smart Network Interface Cards (NICs), and BlueField® Data Processing Units (DPUs) inside NVIDIA-Certified servers from various PNY and NVIDIA OEM partners. These servers have been tested and validated to provide excellent performance for a wide range of GPU-accelerated workloads.

The next level includes third-party hypervisor products and the NVIDIA Virtual GPU (vGPU) Manager. The vGPU Manager works with the hypervisor to provide management, monitoring, and optimization for GPU utilization; the hypervisor does the same for the CPU. The NVIDIA Virtual GPU level supports NVIDIA virtual applications including Virtual Apps (vApps), Virtual PC (vPC), and RTX Virtual Workstation (vWS). The top level of the architecture includes app virtualization and delivery applications.

NVIDIA EGX Platform Hardware Customized for Data Center or Visualization

NVIDIA RTX GPUs are essential to provide the NVIDIA EGX Platform features described in this paper. NVIDIA’s Smart Network Interface Cards (NICs) and BlueField series Data Processing Units (DPUs) connect the NVIDIA EGX Platform into any preexisting environment. Advanced NVIDIA networking technology enables 5G or Wi-Fi 6 deployment over emerging RANs or long-haul between sites using optical transducer technology. In addition, the NVIDIA EGX Platform can be implemented as bare metal to create the ultimate workstation, using two to ten GPUs for the most data-, compute-, and graphics-intensive tasks. The actual hardware is delivered by industry-certified partners working in conjunction with PNY and NVIDIA.

The NVIDIA EGX Platform hardware includes NVIDIA Professional GPUs (data center or professional visualization), NVIDIA’s EGX software stack, NVIDIA networking ConnectX SmartNICs, and BlueField DPUs, which are housed inside third-party NVIDIA-Certified Systems. The platform can meet different needs depending on the GPU board installed in the server. NVIDIA GPU cards that are suited for data center and visualization workloads include the NVIDIA A100 80GB, A30, A40, and A2, along with the pro-viz NVIDIA RTX A6000 and RTX A5000 boards. Figure 3 shows the NVIDIA hardware and software solutions and industries where the solutions might be used.

NVIDIA Data Center GPUs

The NVIDIA A100 80GB delivers unprecedented acceleration for AI and HPC workloads. The A100 can efficiently scale up or be partitioned into seven isolated GPU instances with Multi-Instance GPU (MIG), providing a unified platform that enables elastic data centers to dynamically adjust to shifting workload demands while delivering guaranteed Quality of Service (QoS).
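The partitioning described above can be pictured as a simple slice budget. The following Python sketch is a simplified model, not NVIDIA's actual placement logic: it assumes the A100 exposes seven compute slices (matching the seven instances mentioned above), and the profile names and slice counts in `PROFILE_SLICES` are illustrative assumptions. Real MIG configuration is done with NVIDIA's own tools and is subject to additional placement rules that this sketch ignores.

```python
# Simplified model of MIG capacity on one A100: seven compute slices in total.
# Profile names and slice counts below are illustrative assumptions.
MAX_COMPUTE_SLICES = 7

PROFILE_SLICES = {"1g": 1, "2g": 2, "3g": 3, "4g": 4, "7g": 7}


def slices_used(requested_profiles):
    """Sum the compute slices consumed by the requested instance profiles."""
    return sum(PROFILE_SLICES[profile] for profile in requested_profiles)


def fits_on_gpu(requested_profiles):
    """Return True if the requested instances fit within the slice budget."""
    return slices_used(requested_profiles) <= MAX_COMPUTE_SLICES


def remaining_slices(requested_profiles):
    """Return how many compute slices are left after the requested instances."""
    used = slices_used(requested_profiles)
    if used > MAX_COMPUTE_SLICES:
        raise ValueError(
            f"requested {used} slices, only {MAX_COMPUTE_SLICES} available"
        )
    return MAX_COMPUTE_SLICES - used
```

Under these assumptions, seven `"1g"` instances fill the GPU exactly, while two `"4g"` instances do not fit because they would need eight slices.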

The NVIDIA A30 Tensor Core GPU is a versatile platform for mainstream enterprise workloads used to support HPC, AI inference, training, and data analytics. With TF32 and FP64 Tensor Core support, as well as an end-to-end software and hardware solution stack, A30 ensures that mainstream AI training and HPC applications can be rapidly addressed. In many ways it is essentially a cost-reduced alternative to the NVIDIA A100 80GB.

The NVIDIA A40 GPU is an ideal data center GPU for visual computing and can handle the most demanding visual computing data center workloads. Built on the NVIDIA Ampere architecture, the A40 combines the latest generation RT Cores, Tensor Cores, and CUDA Cores with 48GB of graphics memory for unprecedented graphics, rendering, compute, and AI performance.

The NVIDIA A16 GPU is the best choice for implementing Virtual Desktop Infrastructure (VDI) within an organization and taking remote work to the next level. Based on the latest NVIDIA Ampere architecture, A16 is purpose-built to achieve the highest user density, with up to 64 concurrent users per board in a dual-slot form factor. Combined with NVIDIA Virtual PC (vPC) software, it provides the power and performance to tackle any project from anywhere.
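As a rough sizing illustration, the user density quoted above translates into a board count with simple ceiling arithmetic. The sketch below is an assumption-laden estimate rather than a sizing tool: the function name `boards_needed` is hypothetical, and it treats 64 users per board as achievable for every user, whereas real density depends on the vGPU profile and workload assigned to each user.

```python
import math

# Maximum concurrent users per A16 board, as quoted in the guide.
USERS_PER_A16_BOARD = 64


def boards_needed(total_users, users_per_board=USERS_PER_A16_BOARD):
    """Return the minimum number of boards needed to host total_users."""
    if total_users < 0:
        raise ValueError("total_users must be non-negative")
    if users_per_board <= 0:
        raise ValueError("users_per_board must be positive")
    # Round up: a partially filled board still has to be installed.
    return math.ceil(total_users / users_per_board)
```

For example, `boards_needed(500)` returns 8, and passing a lower `users_per_board` models heavier per-user profiles.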

The NVIDIA A2 GPU is ideal for Intelligent Video Analytics (IVA) use cases and edge deployment, and it is compatible with a wide array of installed-base servers thanks to its low-profile form factor, PCIe Gen 4 x8 system interface, and configurable power requirements (40 to 60 W).

NVIDIA Professional Visualization GPUs

The NVIDIA RTX A6000 GPU built on NVIDIA Ampere architecture is the most powerful compute graphics board built by NVIDIA. The RTX A6000 combines 84 second-generation RT Cores, 336 third-generation Tensor Cores, and 10,752 CUDA cores with 48 GB of graphics memory for unprecedented rendering, AI, graphics, and compute performance.

The NVIDIA RTX A5000 GPU is a powerful solution built on the NVIDIA Ampere architecture. It combines 64 second-generation RT Cores, 256 third-generation Tensor Cores, and 8,192 CUDA cores with 24 GB of graphics memory to supercharge rendering, AI, graphics, and compute tasks.
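To put the two boards side by side, the core counts and memory sizes quoted above can be compared directly. The Python sketch below uses only the figures from the two preceding paragraphs; the `GpuSpec` class and `ratio` helper are hypothetical names introduced for illustration, and the ratios describe raw resource counts, not measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Headline resources for one board, as quoted in the guide."""
    name: str
    rt_cores: int
    tensor_cores: int
    cuda_cores: int
    memory_gb: int


RTX_A6000 = GpuSpec("RTX A6000", rt_cores=84, tensor_cores=336,
                    cuda_cores=10752, memory_gb=48)
RTX_A5000 = GpuSpec("RTX A5000", rt_cores=64, tensor_cores=256,
                    cuda_cores=8192, memory_gb=24)


def ratio(a, b, field):
    """Return how many times more of `field` board a has than board b."""
    return getattr(a, field) / getattr(b, field)


if __name__ == "__main__":
    for field in ("rt_cores", "tensor_cores", "cuda_cores", "memory_gb"):
        print(f"{field}: {ratio(RTX_A6000, RTX_A5000, field):.4f}x")
```

By these figures the RTX A6000 has 1.3125 times the RT, Tensor, and CUDA cores of the RTX A5000 and twice the graphics memory.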

Both the RTX A5000 and RTX A6000 are actively cooled with an onboard fan and are designed for installation in workstations, professional PCs, or server enclosures that accept actively cooled GPU cards.

NVIDIA EGX Platform Enables Virtualization

In virtualization use cases, hardware and software performance on the NVIDIA EGX Platform is indistinguishable from that of a physical workstation. Supporting remote employees this way also improves infrastructure utilization and increases data security. NVIDIA RTX Virtual Workstation (vWS) software is the appropriate choice for professional designers and for AEC, DCC, Manufacturing, M&E, or scientific and technical professionals using NVIDIA A40, RTX A6000, or RTX A5000-based servers.

Over the next few weeks we’ll explore these topics:

Download the complete insideHPC Guide to NVIDIA EGX Platform: One Architecture for Every Workload courtesy of PNY