Department of Defense Contracts Worth $52 million Provide Powerful Validation for the Performance Optimization Delivered by Composable Disaggregated Infrastructure
John Spiers, Chief Strategy Officer, Liqid
Modern high-performance computing (HPC) is undergoing a transformation. Exponential data growth and new demands on infrastructure have IT departments looking for new ways to deliver accelerated time-to-value for data scientists, while improving their own agility and efficiency. The artificial intelligence and machine learning (AI+ML) applications required to analyze these data are themselves data intensive and have uneven hardware requirements depending on what phase of the AI workflow is underway. Data ingest, for example, is NVMe storage-centric, while inference relies heavily on GPU accelerators capable of denser parallel processing.
As workloads and their resource requirements rapidly expand, organizations are struggling to manage demand for data center resources while minimizing the power usage associated with that demand, and they are looking for ways to make better use of their hardware. Traditional three-to-five-year sales cycles don’t help, because, again, AI is changing the HPC game. As these applications have evolved, workloads have diversified well beyond the applications in heavy use just a few years ago. This means data center resource requirements that were once fairly predictable can now change overnight, taxing even the most well-planned hyperconverged environment while making it harder to scale to meet performance requirements.
Tightly bundled hardware stacks associated with hyperconvergence make it difficult to share accelerator technologies more widely across the data center, meaning that some resources are overtaxed while others are under-utilized. The architectural limitations also mean that disaggregated resources such as GPUs, FPGAs, NVMe storage, storage-class memory, and intelligent networking are difficult to integrate into these environments on an ad hoc basis when performance needs change.
Composable disaggregated infrastructure (CDI) is emerging as a way around these roadblocks to advancing the mission of high-performance computing. CDI orchestration software dynamically composes GPUs, NVMe SSDs, FPGAs, networking, and storage-class memory to create software-defined bare metal servers on demand. This raises resource utilization, because devices that would otherwise sit idle in one server can be assigned to whichever AI-driven data analytics workload needs them.
With composable infrastructure, data scientists can compose the exact amount of required GPU performance directly into the platform, in tandem with other accelerators, and fully support AI workload requirements as needed. Users gain a number of other previously unavailable advantages, such as the ability to dynamically orchestrate any CPU-to-GPU ratio and to incorporate other accelerators across PCI Express Gen 4, InfiniBand, and Ethernet fabrics for a well-balanced system.
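The pooling idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a depiction of Liqid's software or of any vendor API: the names ResourcePool, ComposedServer, compose, and release are hypothetical, and devices are reduced to simple counts per type rather than real hardware on a fabric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ComposedServer:
    """A logical server: a name plus the device counts assigned to it (hypothetical model)."""
    name: str
    devices: dict[str, int]


@dataclass
class ResourcePool:
    """Free devices on the fabric, tracked as counts per device type (hypothetical model)."""
    free: dict[str, int]

    def compose(self, name: str, **request: int) -> ComposedServer:
        # Validate the whole request first so a failure leaves the pool unchanged.
        for kind, count in request.items():
            available = self.free.get(kind, 0)
            if available < count:
                raise ValueError(f"not enough free {kind}: want {count}, have {available}")
        for kind, count in request.items():
            self.free[kind] -= count
        return ComposedServer(name, dict(request))

    def release(self, server: ComposedServer) -> None:
        # Return every device to the pool so another workload can claim it.
        for kind, count in server.devices.items():
            self.free[kind] = self.free.get(kind, 0) + count
        server.devices.clear()


# Illustrative inventory; the numbers are made up for the example.
pool = ResourcePool({"cpu": 4, "gpu": 16, "nvme": 32})

# A storage-heavy server for data ingest and a GPU-heavy one for inference.
ingest = pool.compose("ingest", cpu=1, nvme=24)
infer = pool.compose("inference", cpu=1, gpu=8)

# When ingest finishes, its devices go back to the pool and are recomposed
# into a different CPU-to-GPU ratio for another phase of the workflow.
pool.release(ingest)
train = pool.compose("training", cpu=2, gpu=8, nvme=16)

print(pool.free)
```

The point of the sketch is only that the ratio of CPUs to GPUs to NVMe devices is a property of the request, not of a fixed chassis, and that releasing a server makes its devices available to the next workload.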
Additionally, the data agility that composable disaggregated infrastructure makes available does not come at the cost of performance. Ultra-fast GPUs and composable NVMe storage can be aggregated and deployed via software without regard to physical limitations, and shared across intelligent fabrics in the exact ratios required for a given workload. This means that data operations such as NVMe-over-Fabric, and even GPU-over-Fabric (NVMe-oF/GPU-oF), can be done with the same efficiency as those that take place up and down the hardware stack, delivering industry-leading performance with the tightest possible physical footprint.
Offering the most powerful validation yet for the value of composable infrastructure, my company Liqid recently secured contracts to provide the Department of Defense with three of the world’s most powerful supercomputers at two US Army Corps of Engineers R&D sites. The contracts are collectively worth more than $52 million. Liqid beat out powerful tech industry incumbents to provide the three supercomputing systems, and the adaptive performance that composable infrastructure delivers was key to the DoD’s decision to go with Liqid.
The systems collectively represent 32 petaflops of performance, and GPU and other accelerator resources can be quickly added to or removed from compute systems for unprecedented flexibility and agility. At this time, the performance capabilities of the Liqid deployment at the US Army Corps of Engineers Engineer Research and Development Center in Vicksburg, Mississippi, would rank the system at number 15 on the TOP500 ranking of the world’s most powerful high-performance computing (HPC) platforms.
As supercomputing continues to evolve, composable infrastructure solutions are increasingly the architecture of choice for data scientists seeking to accelerate research and development activities and better optimize their hardware footprint. By providing unprecedented flexibility for resource deployment and management, composable infrastructure is poised to go mainstream in HPC facilities as the performance needs of AI exceed the limitations of traditional physical infrastructure. With AI+ML now table stakes for supercomputing facilities, composable infrastructure offers the industry’s most compelling solution to meet the exponentially increasing need for top data performance and hardware efficiency in solving some of the world’s most urgent problems.