AMD Instinct MI300A APU With CDNA 3 GPU, Zen 4 CPU & Unified Memory Offers Up To 4x Speedup Versus Discrete GPUs In HPC

A new research paper shows that the AMD Instinct MI300A APU provides significant performance improvements for HPC workloads compared to traditional discrete GPUs.

Discrete GPUs aside, AMD's Instinct MI300A APUs are going to supercharge HPC workloads with up to 4x the performance.

The AMD Instinct MI300A is the realization of the “Exascale APU” vision that AMD laid out years ago: pack a high-performance GPU and a high-performance CPU on the same package, sharing a unified memory pool. For HPC, such accelerator/co-processor designs deliver strong performance-per-watt gains, but porting, tuning, and maintaining applications with millions of lines of code can be complex. To fully exploit AMD's next-generation APU, the researchers rely on two popular directive-based programming models, OpenMP and OpenACC.

The research paper, titled “Porting HPC applications to the AMD Instinct MI300A using Unified Memory and OpenMP”, uses OpenFOAM, an open-source C++ library. The authors state:

  • We provide a blueprint of the APU programming model and demonstrate the ease and flexibility of porting codes to the MI300A with OpenMP.
  • We describe our method for accelerating a production-grade, widely used industrial code: OpenFOAM.

Because the AMD Instinct MI300A uses a unified HBM memory pool shared by CPU and GPU, it eliminates data duplication and removes the programming distinction between host and device memory locations. Additionally, AMD's ROCm software suite provides optimizations that integrate all parts of the APU into a cohesive, heterogeneous package. As a quick recap of AMD's Instinct MI300A APU:

  • The first integrated CPU+GPU package
  • Targeting the Exascale Supercomputer Market
  • AMD MI300A (Integrated CPU + GPU)
  • 153 billion transistors
  • 24 Zen 4 CPU cores
  • CDNA 3 GPU architecture
  • Up to 192 GB HBM3 memory
  • Up to 8 chiplets + 8 memory stacks (5nm + 6nm process)
Image source: arXiv

As a result, performance benefits enormously. In an evaluation using OpenFOAM's HPC Motorbike benchmark, the AMD Instinct MI300A APU was tested against the AMD Instinct MI210, NVIDIA A100 (80 GB), and NVIDIA H100 (80 GB) GPUs. The AMD GPUs ran on the ROCm 6.0 stack and the NVIDIA GPUs on the CUDA 12.2.2 stack. The benchmark was configured to run 20 time steps, with the average execution time per time step (in seconds) taken as the Figure of Merit (FOM). Every configuration except the Instinct MI300A paired the GPU with a discrete socketed CPU, whose memory management was configured to let the GPUs address system memory and run the benchmark.

Coming to the tests, the results were normalized to the NVIDIA H100 system, which offered the best performance among the three discrete GPUs, but the Instinct MI300A APU ended up with a 4x advantage over the NVIDIA H100 and a 5x advantage over the Instinct MI210 accelerator.

  • On dGPUs, more than 65% of the time is spent on page transfers: updating GPU page tables and copying data between host and device.
  • On an APU, the shared physical memory between the CPU core and the GPU's compute units completely removes the page transfer overhead, resulting in a significant increase in performance.
Image source: arXiv

It was also discovered that a single AMD Instinct MI300A APU was twice as fast as a single-socket Zen 4 CPU paired with a discrete GPU. Oversubscribing the MI300A APU with multiple processes yields a further 2x performance improvement (tested with 3 to 6 CPU cores per APU), far better than the poor scalability of the dGPU+dCPU configuration.

As a result, it seems that the AMD Instinct MI300A APU's compute capabilities will be hard to match in the HPC segment. NVIDIA has moved away from traditional HPC performance in its next-generation Blackwell lineup, as AI seems to be the big craze these days. And while AMD is also chasing that segment with its MI300X accelerator and future refreshes, it looks like the HPC segment will belong to AMD.

News Source: Nicholas added
