Today Cray announced a new, open and extensible software platform to address the growing need for supercomputing across government and private industries. As advanced simulation, artificial intelligence (AI) and digital transformation create new, data intensive workloads, the need for performance at scale is growing rapidly. Recognizing the challenges presented by the exascale era, Cray’s software fuses supercomputing performance and capability with the modularity, composability and ease-of-use of cloud computing.
In related news today, Cray has been awarded a third U.S. exascale system contract. The system, dubbed “El Capitan,” will be sited at Lawrence Livermore National Laboratory (LLNL). Cray now has $1.5 billion in business for Shasta supercomputing systems and the new software platform.
“The Shasta software used with the El Capitan system expands traditional supercomputing to support the complex workflows and numerous 3D studies necessary to unlock the full potential of exascale computing,” said Bill Goldstein, lab director at LLNL. “The flexibility and extensibility of El Capitan’s software and hardware environment will enable the NNSA laboratories to explore and develop capabilities that leverage the combination of AI and machine learning with modeling and simulation to accelerate time-to-solution for our national security codes. These technologies could apply equally well to multi-physics codes employed outside of the national security domain.”
Cray has a rich history of developing the most performant, scalable and reliable software in supercomputing. This is validated by the vast majority of global weather centers that rely on Cray to deliver time critical numerical weather forecasts. These weather centers are at the forefront of the convergence of HPC, AI and IoT workloads that operate at immense scale.
Cray’s new software platform improves performance and reliability by including new key capabilities:
- Extends traditional HPC batch workflow scheduling for modeling and simulation with new Kubernetes container orchestration to enable converged HPC and AI workflows
- Adds support for multi-tenancy between HPC and AI partitions and sub-partitioning within AI jobs to enable workflow isolation
- Provides highly resilient containerized services with separate compute and management planes to minimize planned and unplanned downtime
- Creates an open supercomputing platform by including standardized and supported APIs for integration, data access and software ecosystem extensibility and interoperability
- Delivers new, fully integrated telemetry for the system, as well as user application-level monitoring, to quickly correlate and remediate issues
“As we enter the exascale era, modern applications are creating the need for applying supercomputing capability to a new class of digital transformation problems. What is the domain of a few national laboratories today is fast becoming a necessity for every enterprise,” said Peter Ungaro, president and CEO, Cray, Inc. “With our new software platform, Cray is delivering a fully featured, extensible software and tools environment that performs like a supercomputer and runs like a cloud. The same Cray technology that powers exascale systems can be delivered in a single, low-cost rack and ready to integrate into any data center environment.”
With this next-generation supercomputing software platform, Cray has addressed key requirements:
- Application development and portability: Developers can easily compose converged modeling, simulation, analytics and AI workflows using modern microservices and a robust suite of tools and compilers that support a broad spectrum of processors and accelerators. Applications can easily move from laptop to supercomputer for maximum scalability of code with minimal refactoring.
- Management and monitoring: IT administrators employ a unified systems management and telemetry framework to ensure production reliability for job execution and support high system availability.
- Interoperability: IT managers and administrators can deploy Cray technology with the assurance that both current and future requirements are covered through broad data center systems and software interoperability, documented APIs for automation and data access, and support for industry-standard protocols.
- Investment protection: CXOs and business owners benefit from a multi-user and multipurpose platform with support for heterogeneous processor architectures (x86, AMD, ARM, NVIDIA, FPGAs and other accelerators) and the scalability to meet their rapidly growing analytics and AI initiatives.
“Shasta is designed to support extremely heterogeneous workloads not just from science and engineering, but also from the growing contingent of enterprises that acquire supercomputers to outcompete their rivals in the new era of digital transformation and AI,” said Steve Conway, COO and senior vice president of research at Hyperion Research. “Cray Shasta supercomputers aim to move leading enterprises beyond proof-of-concept to production, and to operate on premises or in the cloud. By integrating extremely heterogeneous requirements into the new Shasta system hardware and software, Cray has substantially expanded its addressable market to include enterprise analytics, AI and cloud computing.”
Cray’s new software platform will be available starting Q4 of 2019.