Intel-powered Aurora Supercomputer Ranks Fastest for AI


At ISC High Performance 2024, Intel, in collaboration with Argonne National Laboratory and Hewlett Packard Enterprise (HPE), announced that the Aurora supercomputer has broken the exascale barrier at 1.012 exaflops and is the world's fastest AI system dedicated to open science, achieving 10.6 AI exaflops. Intel will also detail the critical role of open ecosystems in driving AI-accelerated high-performance computing (HPC). “The Aurora supercomputer surpassing exascale will allow it to pave the road to tomorrow's discoveries,” said Ogi Brkic, Intel vice president and general manager of Data Center AI Solutions. “From understanding climate patterns to unraveling the mysteries of the universe, supercomputers serve as a compass guiding us toward solving truly difficult scientific challenges that may improve humanity.”

Designed as an AI-centric system from its inception, Aurora will allow researchers to harness generative AI models to accelerate scientific discovery. Argonne's early AI-driven research has already made significant progress. Success stories include mapping the human brain's 80 billion neurons, high-energy particle physics enhanced by deep learning, and drug design and discovery accelerated by machine learning. Aurora is a massive system with 166 racks, 10,624 compute blades, 21,248 Intel Xeon CPU Max Series processors and 63,744 Intel Data Center GPU Max Series units, making it one of the largest GPU clusters in the world.

Aurora also features the largest open, Ethernet-based supercomputing interconnect on a single system, with 84,992 HPE Slingshot fabric endpoints. Aurora placed second on the High-Performance LINPACK (HPL) benchmark, but broke the exascale barrier at 1.012 exaflops using 9,234 nodes, only 87% of the system. It also placed third on the High Performance Conjugate Gradient (HPCG) benchmark at 5,612 teraflops per second (TF/s) using 39% of the machine. HPCG is designed to assess more realistic scenarios, offering insight into communication and memory access patterns, which are important factors in real-world HPC applications. It complements benchmarks such as HPL by providing a more comprehensive view of system capabilities.
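
The configuration and benchmark figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes each compute blade counts as one node for the HPL run (the article does not state this explicitly), and the constant and function names are hypothetical.

```python
"""Back-of-the-envelope ratios from Aurora's published figures.

Assumption (not stated in the article): one compute blade counts as one node.
"""

BLADES = 10_624          # compute blades
XEON_MAX_CPUS = 21_248   # Intel Xeon CPU Max Series processors
GPU_MAX_UNITS = 63_744   # Intel Data Center GPU Max Series units
HPL_NODES = 9_234        # nodes used for the record HPL run
HPL_EXAFLOPS = 1.012     # measured HPL result (FP64)


def aurora_ratios() -> dict[str, float]:
    """Derive per-blade component counts and per-node HPL throughput."""
    return {
        "cpus_per_blade": XEON_MAX_CPUS / BLADES,
        "gpus_per_blade": GPU_MAX_UNITS / BLADES,
        # Share of the machine used for HPL (assumes blade == node).
        "hpl_node_fraction": HPL_NODES / BLADES,
        # 1 exaflop/s = 1,000,000 teraflop/s.
        "hpl_tflops_per_node": HPL_EXAFLOPS * 1e6 / HPL_NODES,
    }


if __name__ == "__main__":
    for name, value in aurora_ratios().items():
        print(f"{name}: {value:.3f}")
```

Under that assumption, the ratios come out to two CPUs and six GPUs per blade, about 87% of the blades used for HPL (consistent with the quoted figure), and roughly 110 TF/s of HPL throughput per node.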

At the heart of Aurora is the Intel Data Center GPU Max Series. The Intel Xe GPU architecture is the foundation of the Max Series, featuring specialized hardware such as matrix and vector compute blocks optimized for both AI and HPC tasks. The Xe architecture's design, which delivers unparalleled compute performance, is why Aurora secured the top spot on the High-Performance LINPACK Mixed-Precision (HPL-MxP) benchmark, the benchmark that best highlights the importance of AI workloads in HPC.
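
HPL-MxP rewards systems that do the bulk of a dense linear solve in low precision yet still recover a double-precision answer. A standard way to do this is mixed-precision iterative refinement: factor the matrix once in low precision, then repeatedly compute the residual in FP64 and correct the solution using the cheap low-precision factors. The pure-Python sketch below illustrates the idea under simplifying assumptions: FP32 (emulated with `struct`) stands in for the lower-precision formats accelerators use, and plain refinement replaces the GMRES-based refinement used in the benchmark's reference implementation. It is a toy, not Intel's or the benchmark's code, and the function names are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round an FP64 Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a: list[list[float]]) -> tuple[list[list[float]], list[int]]:
    """LU factorization with partial pivoting, every operation rounded to FP32.

    Returns the L and U factors packed in one matrix (unit diagonal of L
    implied) and a permutation: row k of P*A is row perm[k] of A.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: move the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu: list[list[float]], perm: list[int], b: list[float]) -> list[float]:
    """Solve A x = b with the packed FP32 factors (all arithmetic in FP32)."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution with unit lower-triangular L
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def mixed_precision_solve(a: list[list[float]], b: list[float],
                          iterations: int = 5) -> list[float]:
    """Low-precision LU plus FP64 iterative refinement."""
    n = len(a)
    lu, perm = lu_factor_f32(a)    # O(n^3) work, done once in low precision
    x = lu_solve_f32(lu, perm, b)  # low-precision initial solution
    for _ in range(iterations):
        # Residual r = b - A x computed in full FP64.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Correction solved with the cheap FP32 factors (O(n^2) per step).
        d = lu_solve_f32(lu, perm, r)
        x = [x[i] + d[i] for i in range(n)]  # update accumulated in FP64
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.5, 0.2],
         [1.0, 5.0, 1.0, 0.3],
         [0.5, 1.0, 6.0, 1.0],
         [0.2, 0.3, 1.0, 3.0]]
    x_true = [1.0, 2.0, 3.0, 4.0]
    b = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(4)]
    lu, perm = lu_factor_f32(A)
    x_low = lu_solve_f32(lu, perm, b)
    x_ref = mixed_precision_solve(A, b)
    print("FP32-only max error:", max(abs(u - v) for u, v in zip(x_low, x_true)))
    print("refined max error:  ", max(abs(u - v) for u, v in zip(x_ref, x_true)))
```

The design point is that the expensive O(n^3) factorization runs in the fast low-precision format, while each refinement step costs only O(n^2); for a well-conditioned matrix a few steps bring the error down to FP64 levels.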

The parallel processing capabilities of the Xe architecture excel at the complex matrix-vector operations inherent in neural network AI computations. These compute cores are key to accelerating the matrix operations that dominate deep learning models. Complemented by Intel's suite of software tools, including the Intel oneAPI DPC++/C++ Compiler, a rich set of performance libraries, and optimized AI frameworks and tools, the Xe architecture fosters an open ecosystem for developers, characterized by flexibility and extensibility across different devices and form factors.

In a special session at ISC 2024 on Tuesday, May 14, at 6:45 p.m. (GMT+2) in Hall 4 of Congress Center Hamburg, Germany, Andrew Richards, CEO of Codeplay, an Intel company, will address the growing demand for accelerated computing and software in HPC and AI. He will highlight the importance of oneAPI in offering a unified programming model across diverse architectures. Built on open standards, oneAPI empowers developers to write code that runs seamlessly across different hardware platforms without extensive modification or vendor lock-in. This is also the goal of the Linux Foundation's Unified Acceleration Foundation (UXL), in which Arm, Google, Intel, Qualcomm and others are building an open ecosystem for all accelerators and unified heterogeneous compute on open standards to break proprietary lock-in. The UXL Foundation continues to add members to its growing coalition.

Meanwhile, Intel Tiber Developer Cloud is expanding its compute capacity with new state-of-the-art hardware platforms and new service capabilities, allowing enterprises and developers to evaluate the latest Intel architecture, innovate on and optimize AI models and workloads quickly, and then deploy AI models at scale. New hardware includes Intel Xeon 6 E-core and P-core systems for select customers, plus previews of large-scale clusters based on Intel Gaudi 2 and the Intel Data Center GPU Max Series. New capabilities include Intel Kubernetes Service for cloud-native AI training and inference workloads, and multiuser accounts.

New supercomputers being deployed with Intel Xeon CPU Max Series and Intel Data Center GPU Max Series technologies underline Intel's goal of advancing HPC and AI. These systems include the Euro-Mediterranean Center on Climate Change (CMCC) Cassandra, to accelerate climate change modelling; the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA) CRESCO 8, to enable breakthroughs in fusion energy; the Texas Advanced Computing Center (TACC) system, now in full production, to enable data analysis ranging from biology to supersonic turbulence flows and atomistic simulations of a wide range of materials; and the United Kingdom Atomic Energy Authority (UKAEA) system, to solve memory-bound problems that underpin the design of future fusion power plants.

The results of the mixed-precision AI benchmark will be foundational for Intel's next-generation GPU for AI and HPC, code-named Falcon Shores. Falcon Shores will combine the next-generation Intel Xe architecture with the best of Intel Gaudi, an integration that enables a unified programming interface.

Preliminary performance results on Intel Xeon 6 with P-cores and Multiplexer Combined Ranks (MCR) memory at 8800 megatransfers per second (MT/s) show a 2.3x performance improvement over the previous generation for real-world HPC applications such as the Nucleus for European Modelling of the Ocean (NEMO), establishing a strong foundation for Xeon 6 as the preferred host CPU for HPC solutions.
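
Megatransfers per second count how many data transfers a memory channel completes each second, so theoretical peak bandwidth is the transfer rate multiplied by the bytes moved per transfer. The sketch below assumes a standard 64-bit (8-byte) DDR5 data path per channel, excluding ECC bits, and treats the channel count as a hypothetical parameter, since the article does not give one. It computes theoretical peak only; it does not model how MCR DIMMs reach 8800 MT/s, nor does it explain the quoted 2.3x gain.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float,
                       bytes_per_transfer: int = 8,
                       channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal, 1 GB = 1e9 bytes).

    Assumes a 64-bit (8-byte) data path per DDR5 channel, ignoring ECC bits.
    `channels` is a hypothetical parameter; the article gives no channel count.
    """
    # MT/s x bytes per transfer = MB/s; divide by 1000 to get GB/s.
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    print(f"8800 MT/s, one channel: {peak_bandwidth_gbs(8800):.1f} GB/s")
    print(f"8800 MT/s, 12 channels (hypothetical): "
          f"{peak_bandwidth_gbs(8800, channels=12):.1f} GB/s")
```

At 8800 MT/s this gives 70.4 GB/s per channel; sustained application bandwidth is always lower than this theoretical figure.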
