Fully configured Aurora supercomputer goes online
The Aurora supercomputer at Argonne National Laboratory is now fully equipped with all 10,624 compute blades, boasting 63,744 Intel Data Center GPU Max Series and 21,248 Intel Xeon CPU Max Series processors.
Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group, announced:
“Aurora is the first deployment of Intel’s Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world. We’re proud to be part of this historical system and excited for the groundbreaking AI, science, and engineering Aurora will enable.”
What is the Aurora supercomputer?
The Aurora supercomputer is a collaboration between Intel, Hewlett Packard Enterprise (HPE), and the Department of Energy (DOE). Its goal is to enable simulations, data analytics, and artificial intelligence (AI) at an extremely large scale. Aurora has more than 1,024 storage nodes running DAOS (Intel's Distributed Asynchronous Object Storage), which together provide 220 petabytes (PB) of capacity at 31 TB/s of total bandwidth, and it uses the HPE Slingshot high-performance fabric. Later this year, when it enters the TOP500 list, Aurora is expected to become the world's first supercomputer with a theoretical peak performance of more than 2 exaflops (an exaflop is 10¹⁸, or a billion billion, floating-point operations per second).
What will the Aurora supercomputer be used for?
Aurora is built on Intel's Max Series family of GPUs and CPUs, which Intel designed to meet the demands of dynamic and emerging HPC and AI workloads. According to Intel, early results show the Max Series GPUs delivering leading performance on real-world science and engineering workloads, including up to 2x the performance of AMD MI250X GPUs on OpenMC and near-linear scaling up to hundreds of nodes. Intel also reports that the Xeon Max Series CPU provides a 40% performance advantage over the competition in many real-world HPC workloads, such as earth systems modeling, energy, and manufacturing. Researchers can use Aurora to tackle monumental challenges such as climate change and the search for cures for deadly diseases, pushing the boundaries of scientific exploration.
Rick Stevens, Argonne National Laboratory associate laboratory director, said:
“While we work toward acceptance testing, we’re going to be using Aurora to train some large-scale open source generative AI models for science. Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models.”
The complexity of the Aurora supercomputer
Aurora's rectangular blades contain processors, memory, networking, and cooling technologies. Each blade has two Intel Xeon Max Series CPUs and six Intel Max Series GPUs. The Max Series product family is already showing strong early performance on Sunspot, a test bed with the same architecture as Aurora. Developers are using oneAPI and AI tools to speed up HPC and AI workloads and improve code portability across multiple architectures.
Installing these blades has been a delicate process. Each 70-pound blade requires specialized machinery to be integrated vertically into Aurora’s refrigerator-sized racks. The system’s 166 racks can hold 64 blades each and span eight rows, taking up space equivalent to two professional basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
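As a quick consistency check, the rack and per-blade figures quoted above can be multiplied out to reproduce the totals given at the top of the article. The sketch below assumes every one of the 166 racks is fully populated with 64 blades; the function name `tally` and the constant names are hypothetical, chosen only for illustration.

```python
# Back-of-envelope tally of Aurora's hardware, using only figures quoted in the article.
# Assumption: all racks are fully populated (166 x 64 blades).
RACKS = 166
BLADES_PER_RACK = 64  # rack capacity quoted in the article
CPUS_PER_BLADE = 2    # Intel Xeon Max Series CPUs per blade
GPUS_PER_BLADE = 6    # Intel Max Series GPUs per blade


def tally(racks: int, blades_per_rack: int) -> dict[str, int]:
    """Return total blades, CPUs, and GPUs for a fully populated system."""
    blades = racks * blades_per_rack
    return {
        "blades": blades,
        "cpus": blades * CPUS_PER_BLADE,
        "gpus": blades * GPUS_PER_BLADE,
    }


if __name__ == "__main__":
    totals = tally(RACKS, BLADES_PER_RACK)
    for name, count in totals.items():
        # Prints blades: 10,624 / cpus: 21,248 / gpus: 63,744
        print(f"{name}: {count:,}")
```

The results match the article's counts of 10,624 blades, 21,248 CPUs, and 63,744 GPUs, which suggests the racks are indeed filled to capacity.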
What's next for the Aurora supercomputer?
Researchers from the ALCF's Aurora Early Science Program (ESP) and DOE's Exascale Computing Project will transfer their work from the Sunspot test bed to the fully installed Aurora, which will allow them to scale their applications on the full system. These early users will test the supercomputer and identify potential bugs that need to be fixed before deployment. Their work includes the effort to develop generative AI models for science that was recently announced at the ISC'23 conference.
Image Credits
In-Article Image Credits
Aurora supercomputer at Argonne National Laboratory via Intel with usage type - News Release Media
Featured Image Credit
Aurora supercomputer at Argonne National Laboratory via Intel with usage type - News Release Media