Podcast: Slingshotting to Exascale

In this podcast, the Radio Free HPC team looks at the Cray Slingshot interconnect that will power all three of the first Exascale supercomputers in the US.

We quickly get to the main topic of the day, an examination of HPE/Cray’s Slingshot interconnect. It’s Ethernet on HPC steroids and will be the interconnect of choice for the company’s upcoming slate of systems. Slingshot adds a bunch of HPC enhancements while maintaining compatibility with existing Ethernet devices and protocols. Cray has designed a superset of Ethernet that includes smaller headers, support for smaller message sizes, and other features aimed at cutting latency and improving performance on HPC-oriented interconnect tasks. At the heart of the interconnect is Cray’s innovative 64-port switch, which provides up to 200 Gb/s per port and supports both Cray’s enhanced Ethernet and standard Ethernet traffic. It also has advanced congestion control and quality-of-service modes that ensure each job gets its fair share of bandwidth.

The architecture can scale to an astounding 279,040 endpoints, which is, as we note, “a lot of endpoints.” We also kick around the possibility that HPE/Cray might sell the interconnect as a standalone product for use with competitors’ gear.
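
Where does a number like 279,040 come from? A back-of-the-envelope sketch reproduces it, assuming the dragonfly layout commonly described for Slingshot’s 64-port switch: 16 ports facing endpoints, 31 linking to the other switches in the same group, and 17 linking to other groups. That port split is an assumption on our part, not something stated in the podcast. The same sketch also computes the switch’s aggregate bandwidth from the 64 ports at 200 Gb/s each.

```python
# Back-of-the-envelope sketch. ASSUMPTION: a dragonfly layout with the
# per-switch port split below; these figures are not from the podcast.
PORTS_PER_SWITCH = 64
PORT_SPEED_GBPS = 200
ENDPOINT_PORTS = 16   # assumed: ports facing compute-node NICs
LOCAL_PORTS = 31      # assumed: all-to-all links within a group
GLOBAL_PORTS = 17     # assumed: links to switches in other groups

# Sanity check: the assumed split must use exactly all 64 ports.
assert ENDPOINT_PORTS + LOCAL_PORTS + GLOBAL_PORTS == PORTS_PER_SWITCH


def max_endpoints() -> int:
    """Largest endpoint count for a dragonfly built from these switches."""
    # Each switch links directly to every other switch in its group.
    switches_per_group = LOCAL_PORTS + 1                         # 32
    endpoints_per_group = switches_per_group * ENDPOINT_PORTS    # 512
    # Each group needs one global link to every other group.
    global_links_per_group = switches_per_group * GLOBAL_PORTS   # 544
    max_groups = global_links_per_group + 1                      # 545
    return max_groups * endpoints_per_group


def switch_bandwidth_tbps() -> float:
    """Aggregate one-direction switch bandwidth in Tb/s."""
    return PORTS_PER_SWITCH * PORT_SPEED_GBPS / 1000


print(max_endpoints())
print(switch_bandwidth_tbps())
```

Under these assumptions, each group holds 32 switches and 512 endpoints. The 544 global links per group allow up to 545 fully connected groups, and 545 × 512 = 279,040. Each switch moves 12.8 Tb/s in aggregate.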

Cray Slingshot Interconnect

As mentioned on the call, the chips on this switch run so hot that they need liquid cooling, a first for interconnect processors. We also discuss the rising heat load coming from new CPUs, and particularly from ASICs, and how network design can greatly affect costs. Listen to the show to learn more; it’s a good, meaty discussion.

Other highlights:

Henry’s latest reason why we need to abandon the internet cracks us all up. What’s so funny? Philips smart lightbulbs need a firmware upgrade to keep miscreants from pwning your entire network. No kidding, it’s true. And hilarious. Here’s the link. This has Henry thinking about how to protect his new home from war-flying drones. He’s looking into a drone-killing home air defense system or perhaps a whole-home Faraday cage.

Henry:  Another security-related story, this time about low-level exploits in the Cisco Discovery Protocol (CDP) that could expose tens of millions of devices to internet troublemakers. This is highly disturbing because there is so much Cisco gear out there, and the fix relies on users updating their firmware to plug the holes. Ouch.

Jessi:  Brings athletics into the podcast, prompting some banter about how totally un-athletic the rest of us are (with the exception of Jessi, of course). Nike is using big-time computation to 3D print its new shoe uppers, aiming to give athletes the ultimate advantage in shoe performance.

Shahin:  Alerts us to a comprehensive review of AMD’s Ryzen Threadripper 3990X, the first desktop CPU to sport 64 cores. This CPU currently sits at the top of AMD’s line and is another signpost of AMD’s resurgence. Welcome back, AMD.

Dan:  As we covered in a prior episode, Microsoft had the fantastic idea of using an update to force corporate Office 365 users to have Bing installed as their default search engine. Well, the users have spoken, and their voice was heard loud and clear in Redmond. The company is retreating from its forced ‘upgrade’ to Bing and backpedaling with all due speed.

Comments

  1. David Barkai says

    To Shahin’s comment on AMD’s first ever 64-core CPU... maybe first for exactly 64 cores, but see Ampere’s announcement of its 80-core ARM CPU: https://arstechnica.com/information-technology/2020/03/amperes-altra-is-80-arm-cores-of-cloud-native-power-efficient-cpu/