In this special guest feature, Dan Olds from OrionX continues his Epic HPC Road Trip series with a stop at LANL in New Mexico.
Gary Grider, Deputy Division Leader of the Los Alamos National Laboratory HPC Division, is one of the biggest voices in HPC. He’s been involved with supercomputing since 1981 and has attended an astounding 30 of 31 SC national conferences – quite a record.
I offered Gary a chance to ride in the car with me to Dallas for SC18, but he turned me down without much thought or any sugar coating. Our conversation started with the open question: where do you see HPC going? He’s not wild about the fact that today’s machines are still being designed for dense matrix type problems along the lines of LINPACK rather than the sparse matrix problems that are much more prevalent today. HPCG, as a benchmark for sparse matrix code, is a step in the right direction, but isn’t as irregular as real-world workloads.
At LANL, they’re trying to go the other way – trying to build machines differently so that they mate up with the existing applications. The Los Alamos workloads are unique in that the vast majority of their workloads are big simulations. And by big, he’s talking about a simulation that might take up half of a machine, with a petabyte of memory, and then run for six months.
“We’re building machines how we know how to build machines and we’re trying to figure out how to make software run on them.”
They are one of the very few sites in the world that run a system like this. The main takeaway from the experience of running this problem? “We’re 10x too small to run this type of problem.”
The big problem today for LANL is memory bandwidth. Today, they’re getting one tenth of a byte of memory bandwidth per flop, while past machines, like the Cray 1, got 24 bytes per flop – a huge difference. The technical term for this is “really crappy” according to Gary.
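To put those two figures side by side, here is a quick back-of-the-envelope sketch. The 24 and 0.1 bytes-per-flop numbers are the ones Gary quoted; the function names and the 1 byte-per-flop kernel are hypothetical, picked only to illustrate the arithmetic, and the bandwidth-bound estimate is a simple roofline-style approximation rather than anything LANL measured.

```python
# Bytes of memory bandwidth per flop, as quoted in the conversation.
CRAY_1_BYTES_PER_FLOP = 24.0
TODAY_BYTES_PER_FLOP = 0.1


def bandwidth_gap(old_bytes_per_flop, new_bytes_per_flop):
    """How many times more bandwidth per flop the older machine offered."""
    return old_bytes_per_flop / new_bytes_per_flop


def fraction_of_peak(machine_bytes_per_flop, kernel_bytes_per_flop):
    """Rough upper bound on the share of peak flops a bandwidth-bound kernel can reach."""
    return min(1.0, machine_bytes_per_flop / kernel_bytes_per_flop)


gap = bandwidth_gap(CRAY_1_BYTES_PER_FLOP, TODAY_BYTES_PER_FLOP)
print(f"Cray 1 vs. today: about {gap:.0f}x more bytes per flop")

# Hypothetical kernel that needs 1 byte of memory traffic per flop.
HYPOTHETICAL_KERNEL_BYTES_PER_FLOP = 1.0
share = fraction_of_peak(TODAY_BYTES_PER_FLOP, HYPOTHETICAL_KERNEL_BYTES_PER_FLOP)
print(f"Such a kernel tops out near {share:.0%} of peak flops")
```

Under these assumptions the gap works out to roughly 240x, and a kernel with that hypothetical appetite for data would leave most of the machine's flops idle while it waits on memory.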
Latency is the next problem to attack. When you get a miss on an indirect reference, you take a latency hit and waste thousands of cycles on modern physics codes.
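For readers who haven't run into an indirect reference before, a sparse matrix-vector multiply is a common place to see one. The sketch below uses the standard compressed sparse row (CSR) layout purely as an illustration; it is an assumption on my part, not a description of how any LANL physics code is written, and the names and the tiny matrix are made up for the example.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x[col_idx[k]] is the indirect reference: the address of the
            # element of x depends on a value that must be loaded first, so
            # the hardware cannot easily predict or prefetch it.
            y[row] += values[k] * x[col_idx[k]]
    return y


# Illustrative 3x3 matrix:
# [[2, 0, 1],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]   # nonzero entries, row by row
col_idx = [0, 2, 1, 0, 2]            # column of each nonzero
row_ptr = [0, 2, 3, 5]               # where each row starts in values

x = [1.0, 2.0, 3.0]
y = csr_matvec(values, col_idx, row_ptr, x)
print(y)
```

With a matrix this small everything sits in cache, but when `x` is far larger than cache and `col_idx` jumps around, each of those lookups can turn into a trip to main memory, which is the kind of latency hit described above.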
“We live in a world where we need memory bandwidth more than we need more flops. We also live in a world where we need better latency more than we need flops.”
There are tricks to improving the realized memory bandwidth and latency and, according to Gary, they’re old tricks. He talks about a few of them in the video. We also talk about how AI and Machine Learning might play, or not play, a role in their workloads.
It’s a fascinating conversation and Gary doesn’t pull any punches in his judgement about today’s machines and what needs to happen with future machines.
Many thanks go out to Cray for sponsoring this journey. We’ve hit 1,375 miles in our road trip. Next stop is Sandia National Lab, located in Albuquerque, New Mexico.
Dan Olds is an Industry Analyst at OrionX.net. An authority on technology trends and customer sentiment, Dan Olds is a frequently quoted expert in industry and business publications such as The Wall Street Journal, Bloomberg News, Computerworld, eWeek, CIO, and PCWorld. In addition to server, storage, and network technologies, Dan closely follows the Big Data, Cloud, and HPC markets. He writes the HPC Blog on The Register, co-hosts the popular Radio Free HPC podcast, and is the go-to person for the coverage and analysis of the supercomputing industry’s Student Cluster Challenge.