Intel Gaudi 3 AI Accelerator Official: 5nm, 128 GB HBM2e, Up To 900W, 50% Faster Than NVIDIA H100 & 40% More Efficient


Intel has finally revealed its next-generation AI accelerator, Gaudi 3, based on a 5nm process node and competing directly against NVIDIA's H100 GPUs.

Intel's Gaudi 3 AI accelerators go head-to-head with NVIDIA, delivering 50% faster AI performance on average while being 40% more efficient.

Intel's Gaudi AI accelerators have been a major competitor and the only real alternative to NVIDIA's GPUs in the AI segment. We've recently seen some heated benchmark comparisons between Gaudi 2 and NVIDIA's A100/H100 GPUs, with Intel showing a strong perf/$ lead while NVIDIA remains the overall AI performance leader. Now the third chapter of Intel's AI journey begins with its Gaudi 3 accelerator, which has been fully detailed.

Intel introduced the Intel Gaudi 3 AI Accelerator on April 9, 2024 at the Intel Vision event in Phoenix, Arizona. It is designed to bring global enterprises choice for generative AI while building on the performance and scalability of its Gaudi 2 predecessor. (Credit: Intel Corporation)

The company announced the Gaudi 3 accelerator, which features the latest (5th Gen) Tensor Processor Core architecture with a total of 64 TPCs packed into two compute dies. The chip has a 96 MB cache pool shared across both dies and eight HBM sites, each carrying an 8-hi stack of HBM2e DRAM, for a total capacity of 128 GB and 3.7 TB/s of bandwidth. The entire chip is fabricated on TSMC's 5nm process node and integrates a total of 24 200GbE interconnect links.
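The memory figures above can be cross-checked with simple arithmetic; the per-stack values below are plain divisions of the published totals, not Intel-published specs:

```python
# Sanity-check the published Gaudi 3 memory figures.
# Totals come from the article; per-stack values are derived, not official.
NUM_STACKS = 8        # HBM2e stacks per accelerator
CAPACITY_GB = 128     # total HBM capacity
BANDWIDTH_TBS = 3.7   # total HBM bandwidth

per_stack_gb = CAPACITY_GB / NUM_STACKS     # 16 GB per 8-hi stack
per_stack_tbs = BANDWIDTH_TBS / NUM_STACKS  # ~0.46 TB/s per stack

print(per_stack_gb)   # 16.0
print(per_stack_tbs)  # 0.4625
```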

In terms of product offerings, the Intel Gaudi 3 AI accelerators will come in two form factors: a mezzanine OAM (HL-325L) design with 900W air-cooled and over-900W liquid-cooled variants, and a PCIe AIC design that is full-height, double-wide, and 10.5 inches long. The Gaudi 3 HL-338 PCIe cards will use passive cooling and support up to a 600W TDP while offering the same features as the OAM variant.

The company also announced its HLB-325 baseboard and HLFB-325L integrated subsystem, which can carry up to eight Gaudi 3 accelerators. The system has a combined TDP of 7.6 kW and fits a 19-inch rack form factor.
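The 7.6 kW system figure is consistent with eight 900W OAMs plus headroom for the rest of the board; the overhead line below is just the arithmetic difference, not an Intel figure:

```python
# Rough power budget for the 8-accelerator baseboard described above.
# OAM TDP (900 W) and system TDP (7.6 kW) come from the article.
ACCELERATORS = 8
OAM_TDP_W = 900
SYSTEM_TDP_W = 7600

accelerator_w = ACCELERATORS * OAM_TDP_W   # 7200 W for the OAMs alone
overhead_w = SYSTEM_TDP_W - accelerator_w  # ~400 W left for everything else
print(accelerator_w, overhead_w)           # 7200 400
```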

A follow-up to Gaudi 3, codenamed Falcon Shores, is expected by 2025 and will combine both Gaudi and Xe IPs under a single GPU programming interface built around the Intel oneAPI specification.

Press release: At Intel Vision, Intel introduced the Intel Gaudi 3 AI Accelerator, which delivers 4x the BF16 AI compute, 1.5x the memory bandwidth, and 2x the networking bandwidth of its predecessor for massive system scale-out – a significant leap in performance and productivity for AI training and inference on popular large language models (LLMs) and multimodal models.

The Intel Gaudi 3 accelerator will meet these requirements and offer versatility through open, community-based software and open industry-standard Ethernet, helping businesses flexibly scale their AI systems and applications.

How the custom architecture delivers GenAI efficiency and performance: The Intel Gaudi 3 accelerator, designed for efficient large-scale AI compute, is manufactured on a 5 nanometer (nm) process and offers significant advancements over its predecessor. It is designed to allow all of its engines – the Matrix Multiplication Engines (MMEs), Tensor Processor Cores (TPCs), and Networking Interface Cards (NICs) – to run in parallel, enabling the acceleration needed for fast, efficient deep learning computation and scaling. Key features include:

  • AI-dedicated compute engine: The Intel Gaudi 3 accelerator was purpose-built for high-performance, high-efficiency GenAI compute. Each accelerator uniquely features a heterogeneous compute engine comprising 64 AI-custom and programmable TPCs and eight MMEs. Each Intel Gaudi 3 MME is capable of performing an impressive 64,000 parallel operations, providing a high degree of computational efficiency and making it adept at handling complex matrix operations, a type of computation fundamental to deep learning algorithms. This unique design accelerates parallel AI operations and supports multiple data types, including FP8 and BF16.
  • Expanded memory for LLM capacity requirements: 128 gigabytes (GB) of HBM2e memory capacity, 3.7 terabytes per second (TB/s) of memory bandwidth, and 96 megabytes (MB) of on-board static random access memory (SRAM) provide ample memory for processing large GenAI datasets on fewer Intel Gaudi 3s. This is particularly useful for serving large language and multimodal models, resulting in increased workload performance and data center cost efficiency.
  • Efficient system scaling for enterprise GenAI: Twenty-four 200-gigabit (Gb) Ethernet ports are integrated into each Intel Gaudi 3 accelerator, providing flexible, open-standard networking. They enable efficient scaling to support large compute clusters and free customers from vendor lock-in to proprietary networking fabrics. The Intel Gaudi 3 accelerator is designed to scale up and scale out efficiently from a single node to clusters of thousands to meet the broad needs of GenAI models.
  • Open industry software for developer productivity: Intel Gaudi software integrates the PyTorch framework and provides optimized Hugging Face community-based models – the most common AI frameworks and models for GenAI developers today. This allows GenAI developers to operate at a high abstraction level for ease of use and productivity, and to port models across hardware types.
  • Gaudi 3 PCIe: New to the product line is the Gaudi 3 Peripheral Component Interconnect Express (PCIe) add-in card. Designed to deliver high performance at low power, this new form factor is ideal for workloads such as fine-tuning, inference, and retrieval-augmented generation (RAG). It is offered in a 600-watt full-height form factor with a memory capacity of 128 GB and a bandwidth of 3.7 TB/s.
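The memory-capacity claim above can be made concrete with a back-of-the-envelope weight-footprint estimate. This is a hedged sketch, not Intel guidance: it uses the parameter counts named in this article and the rule of thumb that BF16 weights take 2 bytes and FP8 weights 1 byte per parameter, while ignoring KV-cache and activation memory.

```python
# Rough estimate of how model weights map onto 128 GB of HBM2e per
# accelerator. Ignores KV-cache, activations, and framework overhead.
import math

HBM_GB = 128  # per-accelerator HBM capacity from the article

def weights_gb(params_billion: float, bytes_per_param: int) -> float:
    """Weight footprint in GB: 1B params * 1 byte/param = 1 GB."""
    return params_billion * bytes_per_param

def accelerators_needed(params_billion: float, bytes_per_param: int) -> int:
    """Minimum accelerators whose combined HBM holds the weights."""
    return math.ceil(weights_gb(params_billion, bytes_per_param) / HBM_GB)

print(weights_gb(70, 2), accelerators_needed(70, 2))   # 140 GB -> 2 devices in BF16
print(weights_gb(70, 1), accelerators_needed(70, 1))   # 70 GB  -> 1 device in FP8
print(accelerators_needed(180, 2))                     # Falcon 180B in BF16 -> 3
```

This illustrates why the large on-package capacity matters: a 70B-parameter model's FP8 weights fit on a single accelerator, where a smaller memory pool would force multi-device sharding.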
Intel introduced the Gaudi 3 AI Accelerator on April 9, 2024 at the Intel Vision event in Phoenix, Arizona. The accelerator provides 4x AI compute for BF16 and a 1.5x increase in memory bandwidth compared to its predecessor. (Credit: Intel Corporation)

The Intel Gaudi 3 accelerator will deliver significant performance improvements for training and inference on leading GenAI models. Specifically, the Intel Gaudi 3 accelerator is projected to deliver, on average versus the NVIDIA H100:

  • 50% faster time-to-train across Llama2 7B and 13B parameter models, and the GPT-3 175B parameter model.
  • 50% faster inference throughput and 40% greater inference power efficiency across Llama 7B and 70B parameter models, and the Falcon 180B parameter model, with an even greater inference performance advantage on longer input and output sequences.
  • 30% faster inferencing than the NVIDIA H200 on Llama 7B and 70B parameter models, and the Falcon 180B parameter model.
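As a reading aid for the percentages above (these are Intel's projected figures, not independent measurements), "X% faster" can be converted into relative wall-clock time and "Y% more efficient" into relative energy per inference:

```python
# Convert the article's percentage claims into ratios relative to the
# NVIDIA baseline (baseline = 1.0). Interpretation sketch, not benchmark data.
def relative_time(speedup_pct: float) -> float:
    """Wall-clock time vs. baseline if throughput is (1 + pct/100)x."""
    return 1 / (1 + speedup_pct / 100)

def relative_energy(efficiency_gain_pct: float) -> float:
    """Energy per inference vs. baseline given pct higher perf/watt."""
    return 1 / (1 + efficiency_gain_pct / 100)

print(round(relative_time(50), 2))    # 0.67 -> ~two-thirds the H100's time
print(round(relative_energy(40), 2))  # 0.71 -> ~71% of the energy per inference
```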

About market adoption and availability: The Intel Gaudi 3 accelerator will be available to original equipment manufacturers (OEMs) in the second quarter of 2024 in universal baseboard and Open Accelerator Module (OAM) industry-standard configurations. Notable OEM adopters that will bring Gaudi 3 to market include Dell Technologies, HPE, Lenovo, and Supermicro. General availability of the Intel Gaudi 3 accelerator is expected in the third quarter of 2024, and the Intel Gaudi 3 PCIe add-in card is expected in the fourth quarter of 2024.

Intel introduced the Intel Gaudi 3 AI Accelerator on April 9, 2024 at the Intel Vision event in Phoenix, Arizona. The AI Accelerator is designed to break down proprietary walls to bring choice to the enterprise generative AI market. (Credit: Intel Corporation)

The Intel Gaudi 3 accelerator will also power several cost-effective cloud LLM infrastructures for training and inference, offering price-performance advantages and choice to organizations, which now include NAVER.
