In this special guest feature, Dr. Rosemary Francis gives her perspective on what to look for at the SC19 conference next week in Denver.
Will the cloud vendors turn up again and what toys have they been building this year to accommodate the massive niche that is HPC? Has Amazon FSx for Lustre been well received? What will they announce this year?
There are always many questions circling the HPC market in the run-up to Supercomputing. In 2019, the focus is even more on the cloud than in previous years. Here are a few of the topics that could occupy your coffee-queue conversations in Denver this year.
Arm and AWS
Arm has yet to conquer the HPC market, but its slow burn this year comes with the real possibility of running workloads on AWS’s custom Arm offering: A1. While I can’t see many supercomputing workloads flocking to run on Arm, cloud-based workloads demand more attention to efficiency, which means that before long AArch64 has to start looking like an attractive platform for some parallel workloads. I’m looking forward to finding out if anyone is using them yet.
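For anyone who wants to kick the tyres before Denver, the sketch below shows one way to request an A1 instance with Python and boto3. The region, the a1.4xlarge size and the AMI ID are placeholders of my own choosing; point it at whichever 64-bit Arm image your workload already builds against.

    # Minimal sketch: launch a single Arm-based A1 instance with boto3.
    # Assumes AWS credentials are already configured; the AMI ID below is a
    # placeholder and must be replaced with a real aarch64 image in your region.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-EXAMPLE",      # placeholder: any 64-bit Arm (aarch64) AMI
        InstanceType="a1.4xlarge",  # A1 family: AWS's Graviton-based Arm instances
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])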
Do data scientists care?
The HPC community is very keen to incorporate all the exciting new machine learning workloads into its offerings, but do the data scientists care? While artificial intelligence workloads share a lot of the same technologies and challenges as traditional HPC, they don’t necessarily have the same legacy complexity or need for super-fast storage.
Instead, intelligent machine learning-optimized database solutions and a ton of object store seem to be working out well for most workloads. It will be interesting to see how data scientists approach vendors’ new offerings.
Hardware turns to software
With the move to hybrid cloud, plenty of the hardware vendors are turning their hand to software to grow their market share in the changing landscape. System monitoring is becoming more important as the complexity of the HPC platforms multiplies.
Last year we saw a few new offerings and I’ve no doubt that this year will see a few more. How many are truly sticky though – which tools have you used this year that you won’t want to go without?
On the hardware front, on-prem will obviously be the most efficient option for a long time, and with that comes a deepening of the storage rivalry. The hardware and software configuration is sufficiently complex to make every HPC cluster unique, even before you have taken the compute or scheduler setup into account. This makes HPC a very difficult industry to enter for large and small players alike: a new solution can easily be scuppered by an unexpected scheduler option, an off-label use of the storage or a kernel setting.
Simplification?
On that note, do we need fewer options in the industry before we see really good adoption of anything new? I can’t see the cloud simplifying anything any time soon. A lot of the customization of each HPC cluster is down to the differences in the workloads that are being run: it is not surprising that Singularity workloads orchestrated with Slurm against a Lustre file system will need different settings for MPI geo codes, massive genome pipelines and sub-second EDA workloads.
Have you run a million workloads this year or this morning? Does the average job span a thousand machines or do you have 96 jobs on a single host? Do you run third party tools that have been customized for your needs or in-house codes that you can’t afford to change?
While the HPC industry remains this diverse there will always be a need for a really collaborative relationship between customers and vendors, and between research and industry, and that is what Supercomputing is all about. We look forward to seeing you there.
Dr. Rosemary Francis is CEO of Ellexus.