Sponsored Post
This technology guide, “insideHPC Guide to Composable Disaggregated Infrastructure (CDI) Clusters,” shows how the Silicon Mechanics Miranda CDI Cluster™ reference architecture can serve as a CDI solution blueprint, ideal for tailoring to specific enterprise or other organizational needs and technical requirements.
The guide is for technical readers, especially system administrators in financial services, life sciences, or a similarly compute-intensive field. These individuals may be tasked with making CDI work with a realizable ROI, or simply with finding a way to extend the value of their IT investment while still meeting their computing needs.
Technology Use Case Examples
There are several use case areas for which CDI is appropriate. As composable architecture continues to evolve, a variety of technology areas across a vast range of industries stand to benefit, e.g., AI, HPC, and accelerated data analytics. These workloads benefit greatly from CDI deployment.
It is best practice to keep such systems on-premises: on-premises compute is more cost-effective than cloud-based compute when highly utilized. It’s also important to keep primary storage close to on-premises compute resources to maximize network bandwidth while limiting latency. A range of networking options can be leveraged; however, the typical recommendation is a high-speed fabric such as 100 Gigabit Ethernet or HDR 200Gb/s InfiniBand.
Another important consideration is that the size of the data set is just as important as the quality of the model, so a modern AI-focused storage architecture should be a priority. Traditional storage such as NAS often can’t keep pace: bandwidth is limited to around 10 gigabits per second, and it isn’t scalable enough for AI workloads. Similarly, the workaround of fast local storage doesn’t work for modern parallel problems, because it requires constantly copying data in and out of nodes, which congests the network.
AI optimized storage should be parallel and support a single namespace data lake. This enables the storage to deliver large data sets to compute nodes for model training. AI optimized storage must also support high bandwidth fabrics like 100 gigabit Ethernet or HDR 200Gb/s InfiniBand. A good storage solution should also enable object storage tiering to remain cost effective and to serve as an affordable long-term, scalable storage option for regulatory retention requirements.
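To illustrate why fabric bandwidth matters for delivering large data sets, here is a rough back-of-the-envelope sketch. The data set size and the assumption of an ideal, overhead-free link are hypothetical; the point is only the relative scale between a ~10 Gb/s NAS link and the higher-speed fabrics mentioned above.

```python
# Rough illustration of data-set delivery time vs. fabric bandwidth.
# The 10 TB data set size is a hypothetical assumption; times are ideal
# (no protocol overhead, no contention), so treat them as lower bounds.

def transfer_time_seconds(dataset_terabytes: float, link_gbps: float) -> float:
    """Ideal time to move a data set over a link of the given speed."""
    dataset_bits = dataset_terabytes * 1e12 * 8   # TB -> bits
    return dataset_bits / (link_gbps * 1e9)       # bits / (bits per second)

dataset_tb = 10.0  # hypothetical training data set

for label, gbps in [("10 Gb/s NAS", 10),
                    ("100 Gigabit Ethernet", 100),
                    ("HDR 200Gb/s InfiniBand", 200)]:
    t = transfer_time_seconds(dataset_tb, gbps)
    print(f"{label}: about {t/3600:.2f} hours")
```

Even in this idealized model, the NAS link takes over two hours to feed the compute nodes a 10 TB data set, versus minutes on a 100GbE or HDR InfiniBand fabric, which is why parallel, high-bandwidth storage is recommended for AI workloads.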
Two common challenges seen today involve networking and data, as compute power has increased. Many organizations are generating data faster than ever before, and ensuring both throughput and uptime over the network is key. Networks are now smarter than ever: they can intelligently route around issues and optimize data flow between endpoints. Along with higher-speed networking come higher-performing storage solutions that deliver high throughput by using NVMe SSDs as the primary storage tier, coupled with spinning disks for long-term data retention.
Optimal utilization of infrastructure, especially GPUs, is more achievable now than ever before. For many use cases, such as AI and HPC workloads, performance is still the top priority, and on-premises hardware provides peak performance with the ability to burst to the cloud as needed. With a powerful CDI infrastructure, it’s also possible to give employees at home the same level of compute that was previously available only in the office or data center.
CDI and the Enterprise Infrastructure of the Future
CDI is critical to the enterprise infrastructure of the future for many reasons. In this section, we’ll drill down into each along with supporting details for why they matter.
Scalability
Given increasing business requirements, accelerating collection of data, and the dynamic nature of today’s applications—IT and database administrators are facing difficulty in scoping their future infrastructure needs. This is especially true as enterprises prepare their infrastructure to manage massive, and potentially uneven, AI and HPC workloads.
Some solutions can be limited to specific compute, storage, and network configurations. This can create obstacles when additional resources are required for a specific application but can’t be provisioned on demand. Such obstacles are largely eliminated with a composable infrastructure built with the future in mind.
Administrators are able to dynamically and quickly configure and provision everything from bare metal servers and network resources, to FPGAs and GPUs, to entire racks of equipment to adapt to the need for scalability.
Cloud computing as a concept is reaching maturity, and enterprises are discovering the dividing line between what it is good for and what it is not. There will always be certain processes that don’t work well in a cloud environment.
Most enterprises are not going to build a top-performing supercomputer. Rather, their infrastructure demands simply require incrementally more than what the cloud can provide, at least cost effectively. CDI is the best solution for the use cases that are too complex or too high-performance for the cloud.
Often, outside of proof-of-concept deployments and bursting, the cloud is not cost effective for HPC or AI. In fact, ROI for the cloud is non-existent for many enterprises.
Utilization rates drive ROI. Enterprises often face performance limitations in the cloud compared to on-premises solutions, where higher utilization generates higher ROI. Lower utilization, on the other hand, favors the cloud: the higher the utilization rate, the less cost effective the cloud becomes. For example, a cloud deployment that needs 24-hour operation of a system is going to be very expensive under a pay-per-use model, whereas 4-hour operation is much more cost effective.
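The utilization argument can be made concrete with a toy cost model: pay-per-use cloud cost scales linearly with hours consumed, while amortized on-premises cost is roughly fixed regardless of utilization. All prices below are hypothetical assumptions chosen for illustration, not quotes from any provider.

```python
# Toy model comparing pay-per-use cloud vs. amortized on-premises compute.
# Both rates are hypothetical illustrative assumptions.

CLOUD_RATE_PER_GPU_HOUR = 3.00        # assumed $/GPU-hour, pay-per-use
ONPREM_MONTHLY_COST_PER_GPU = 900.00  # assumed amortized hardware + power + admin

def monthly_cost_cloud(hours_per_day: float, days: int = 30) -> float:
    """Cloud cost grows linearly with utilization."""
    return CLOUD_RATE_PER_GPU_HOUR * hours_per_day * days

def monthly_cost_onprem() -> float:
    """On-prem cost is fixed once the hardware is amortized."""
    return ONPREM_MONTHLY_COST_PER_GPU

for hours in (4, 24):
    print(f"{hours} h/day: cloud ${monthly_cost_cloud(hours):.0f}/mo "
          f"vs. on-prem ${monthly_cost_onprem():.0f}/mo")
```

Under these assumed rates, 4-hour daily operation favors the cloud ($360 vs. $900 per month), while 24-hour operation flips the comparison ($2,160 vs. $900), matching the crossover behavior described above.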
CDI has the flexibility to imitate a cloud-like infrastructure while being more cost effective over time. Additionally, when other issues such as data locality, vendor lock-in, and the cost of data ingress and egress are considered, the cloud becomes expensive. With on-premises solutions, these issues are not prevalent, and it’s possible to remain more vendor-agnostic and change direction more easily than with a cloud deployment. This is one of the most important arguments in support of CDI versus the cloud.
Flexibility
Composable infrastructure is all about flexibility, but there’s nothing flexible about locking in an enterprise to a single server vendor’s technology offering, and then limiting the capabilities of the infrastructure from both a provisioning and fabric perspective. CDI provides enterprise IT with the ability to use the equipment vendors of their choice.
Composable cluster solutions are vendor and fabric agnostic in that they do not necessitate any drivers, agents, or software modules on the compute nodes themselves. They manipulate resources across bare metal compute nodes through CDI software so an enterprise can run any higher-order applications on any hardware, using any fabric. Further, CDI provides the enterprise with the flexibility to leverage the equipment they choose, and then orchestrate that equipment to best fit business needs.
With the rise of AI, there is a need to leverage many identical compute nodes backed by a parallel file system like Lustre or General Parallel File System (GPFS), especially in the research space where multiple researchers may combine budgets to purchase a cluster. This presents a design challenge as workloads are becoming both more complex and more diverse. For those who have experienced this, it has likely led to purchases of heterogeneous node types or homogeneous nodes packed with components that try to achieve a line of best fit.
The problem with this approach is that ROI is reduced as money is spent on equipment that isn’t needed for all jobs. For example, not every workload benefits from GPU acceleration. This leads to a cluster that is not configured to run all jobs optimally. It can also be very difficult to manage a heterogeneous cluster and support diverse workloads. CDI technology is a proven solution for solving these issues.
Dynamic Provisioning
Another major advantage that CDI affords is rapid dynamic provisioning. When an enterprise requires a software application that is not integrated into its current infrastructure, IT generally must allocate staff resources to add additional storage, reconfigure servers, or change the networking. A composable solution, on the other hand, adapts the provisioning of those physical computing, storage and network resources through management software to fit the needs of relevant applications—quickly making the goal of software defined infrastructure a reality.
With CDI, enterprises can right-size configurations as workload requirements evolve, bringing together resources and then reassigning resources in response to changing application and business requirements. In this way, resources grow into elastic building blocks for delivering optimal environments that are provisioned and configured to support a specific workload without having to wait for lengthy IT allocation processes.
Composable infrastructure offers benefits for streamlining and accelerating IT deployment in virtually every category of business. Instead of over provisioning infrastructure to meet IT needs, pools of resources like compute, storage and network are automatically composed in near real-time.
Effective resource utilization is critical for an enterprise to grow sustainably. Composable platforms are key to advancing this benefit. As an enterprise migrates from a static infrastructure to a dynamic infrastructure based on CDI, the resource utilization benefits become apparent. In a typical enterprise, resource utilization can experience a 2-3x gain with CDI. This improvement immediately translates into business and bottom-line benefits.
Resource Pools
Due to its ability to disaggregate hardware components into resource pools, composable infrastructure can deploy a heterogeneous cluster and assign resources on-demand for specific jobs. This provides the flexibility to dynamically provision bare metal instances to run jobs on best-fit hardware by abstracting the physical compute, storage and network hardware to make them available as services that can be accessed as needed. Furthermore, a CDI solution goes above and beyond these classes of resources to include the ability to compose services from pools of CPU, GPU, FPGA, NVMe, and NICs regardless of the type of underlying fabric. When an application no longer requires the resources, they’re returned to the resource pool and become ready for use by other applications.
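The compose-and-release cycle described above can be sketched as a simple pool allocator. Everything here, including the class name, resource labels, and method names, is a hypothetical illustration of the pattern, not any vendor’s actual CDI API:

```python
# Minimal sketch of composing bare-metal instances from disaggregated
# resource pools. Hypothetical illustration of the CDI pattern only;
# not a real vendor API.
from collections import Counter

class ResourcePool:
    def __init__(self, inventory: dict):
        # e.g. {"gpu": 8, "nvme": 16, "nic": 8} disaggregated from the hardware
        self.free = Counter(inventory)

    def compose(self, request: dict) -> dict:
        """Reserve best-fit resources for a job, or raise if unavailable."""
        need = Counter(request)
        if any(self.free[r] < n for r, n in need.items()):
            raise RuntimeError("insufficient free resources")
        self.free -= need
        return dict(need)          # handle for the composed instance

    def release(self, node: dict) -> None:
        """Return a composed instance's resources to the pool for reuse."""
        self.free += Counter(node)

pool = ResourcePool({"gpu": 8, "nvme": 16, "nic": 8})
node = pool.compose({"gpu": 4, "nvme": 8, "nic": 2})  # provision for one job
pool.release(node)                                    # job done: back to the pool
```

The key property mirrored here is the last sentence of the paragraph above: once an application releases its resources, they immediately become available for composition by other applications.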
Over the next few weeks we will explore these topics:
- Introduction, Benefits of CDI Clusters to the Enterprise
- Technology Use Case Examples, CDI and the Enterprise Infrastructure of the Future
- Key Ingredients of a High-performance CDI Cluster, Conclusion
Download the complete insideHPC Guide to Composable Disaggregated Infrastructure (CDI) Clusters courtesy of Silicon Mechanics.