In this podcast, the Radio Free HPC team looks at the Department of Energy’s new RFP for exascale computers.
Called CORAL-2, this Request for Proposal is for up to $1.8 billion and is completely separate from the $320 million allocated for the Exascale Computing Project in the FY 2018 budget. Those funds are mostly focused on application development and software technology for an exascale software stack.
“These new systems represent the next generation in supercomputing and will be critical tools both for our nation’s scientists and for U.S. industry,” Secretary Perry said. “They will help ensure America’s continued leadership in the vital area of high performance computing, which is an essential element of our national security, prosperity, and competitiveness as a nation.”
The RFP is issued under the CORAL umbrella (Collaboration of Oak Ridge, Argonne, and Livermore). CORAL-1 has already procured the following systems:
- Aurora at Argonne National Lab (target completion date in 2021)
- Summit at ORNL (2018 to 2023 timeframe)
- Sierra at LLNL (2018 to 2023 timeframe)
This RFP (CORAL-2) is designed to get bids from vendors to build two or (potentially) three new exascale supercomputers. Each system is expected to cost between $400 million and $600 million.
The new RFP calls for systems to be housed at:
- ORNL
- LLNL
- Argonne (a possible third system)
Specifications:
- According to the RFP, baseline performance for each system should be at least 1,300 petaflops (1.3 exaflops).
- The power budget can go up to 60 megawatts. Preferred power consumption for the system is 20-40 megawatts.
- The requested MTBF is somewhere around 6 days.
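To put the performance and power figures side by side, here is a rough back-of-the-envelope sketch of the energy efficiency they imply. It assumes the baseline performance is delivered at the stated power draw, which the summary above does not actually say, and the function name `gflops_per_watt` is our own invention for illustration.

```python
# Baseline performance from the specifications above, in petaflops.
BASELINE_PFLOPS = 1300


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Return efficiency in gigaflops per watt.

    Assumption: the full performance figure is achieved at the given power draw.
    """
    flops = pflops * 1e15      # petaflops -> flops
    watts = megawatts * 1e6    # megawatts -> watts
    return flops / watts / 1e9  # flops per watt -> gigaflops per watt


if __name__ == "__main__":
    # 20 and 40 MW bracket the preferred range; 60 MW is the stated ceiling.
    for mw in (20, 40, 60):
        print(f"{mw} MW -> {gflops_per_watt(BASELINE_PFLOPS, mw):.1f} GF/W")
```

Under those assumptions, the baseline works out to 65.0 GF/W at 20 megawatts, 32.5 GF/W at 40 megawatts, and about 21.7 GF/W at the 60 megawatt ceiling.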
As far as predictions go, Dan thinks one machine will go to IBM and the other will go to Intel. Rich thinks HPE will win one of the bids with an ARM-based system designed around The Machine memory-centric architecture. They have a wager, so listen in to find out where the smart money is.
Nvidia and HPE are unlikely primes, given the interconnect and OS scaling issues, the application scaling knowledge required, and even the financial investment involved. But an HPE ARM system built on The Machine memory-centric architecture is an interesting wildcard. The problem is that ARM is not an American invention.