AMD Strix Point “Ryzen AI 9 365” APU Benchmarks Reveal Zen 5's IPC, Latency, Throughput & Various Performance Aspects

The AMD Strix Point “Ryzen AI 9 365” Zen 5 APU has allegedly been tested by David Huang, whose blog offers an in-depth analysis of its IPC, latency, and performance.

The AMD Ryzen AI 9 365 “Strix Point” APU has been tested in several benchmarks ahead of launch, detailing Zen 5's IPC, throughput, latency, and more.

Note – David Huang's blog states that the numbers mentioned here are based on engineering samples of AMD Strix Point APUs, specifically the Ryzen AI 9 365, so take them with a grain of salt as they may not be representative of the final product. He also clearly states that the test system was running unofficial system firmware/software.

Image source: David Huang's blog

To begin with, David got access to an early AMD Strix Point laptop that reportedly features the Ryzen AI 9 365 SKU. The test platform used 32 GB of LPDDR5X-7500 memory. Today's tests focus on IPC and throughput, starting with an instruction-rate tool run across three generations of Zen CPUs: the Zen 3, Zen 4, and Zen 5 architectures.

David's findings state that while Zen 5 brings improvements thanks to its ground-up design, the architecture also has some downsides, which are listed below.

  • The throughput of various scalar ALU instructions has been greatly increased, but because the number of vector units in mobile Zen 5 is halved compared to the desktop and server parts, SIMD throughput in this test is no better than Zen 4's. Even on a Zen 5 core with half the vector units, SIMD store throughput at all widths is still doubled compared to the previous generation, and the SIMD load/store throughput ratio reaches 1:1.
  • Branch-handling capacity has been greatly increased: the number of not-taken branches that can be processed per cycle goes from two to three, and two taken branches can now be processed per cycle. This is likely related to the new front-end design.
  • The latency of 128/256/512-bit SSE/AVX/AVX-512 SIMD integer addition has increased to 2 cycles. This change may have been made to help sustain higher clock frequencies.
  • Compared to Zen 4, the throughput of 128/256-bit SIMD integer addition is halved, while 512-bit is unchanged. It is speculated that this only affects Zen 5 cores with the halved SIMD units and may be related to port allocation.
  • The NOP fusion feature introduced in Zen 4 has been removed; it is no longer possible to fuse a NOP instruction with another instruction into a single macro-op.
  • The throughput of some logical register operations has been adjusted, the throughput of some mov operations merged, and some register-zeroing operations changed, a mixed set of results relative to Zen 4.
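To make the latency-versus-throughput distinction in the bullets above concrete, here is a minimal sketch in Python, with purely illustrative numbers (not measured Zen 5 figures), of how a stream of SIMD integer adds is bounded either by instruction latency or by issue throughput:

```python
def simd_add_cycles(n_ops, latency, ops_per_cycle, dependent):
    """Estimate cycles to retire n_ops SIMD integer adds.

    A dependent chain is latency-bound: each add must wait for the
    previous result, so cycles scale with latency. Independent adds
    are throughput-bound: limited only by issue width per cycle.
    """
    if dependent:
        return n_ops * latency
    return n_ops / ops_per_cycle

# Illustrative figures based on the article's claims: 128-bit integer
# add latency rises from 1 cycle (Zen 4) to 2 cycles (mobile Zen 5),
# and per-cycle throughput is halved (4 -> 2 here, assumed numbers).
zen4_chain = simd_add_cycles(1000, latency=1, ops_per_cycle=4, dependent=True)
zen5_chain = simd_add_cycles(1000, latency=2, ops_per_cycle=2, dependent=True)
print(zen4_chain, zen5_chain)  # the dependent chain takes twice as long
```

This is why a 2-cycle add latency mostly hurts dependency chains; code with plenty of independent SIMD work is limited by throughput instead.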

The tests also examine the parallel dual-pipe frontend, which affects instruction fetch, decode, and the macro-op cache. By running NOP instructions of different lengths and in different quantities, the differences between Zen 4 and Zen 5 become visible. The observations conclude as follows:

  • Zen 5 uses a multi-frontend design similar to Tremont's but wider, using two 4-wide x86 decoders and an at-least-8-wide macro-op cache to implement 8-wide rename.
  • This is based on the following observations:
    • Zen 5 cannot exceed an x86 decode bandwidth of 4 when running consecutive NOP instructions in a single thread.
    • The instruction-throughput tests showed that two taken branches can be processed in a single cycle.
  • It is reasonable to assume that Zen 5 does not use a pre-decode/ILD cache solution like Gracemont, but instead lets both decoders work simultaneously when the branch predictor predicts a taken branch, i.e. one decoder directly starts decoding from the branch's target address. From this perspective, AMD still needs to rely on the macro-op cache to achieve high throughput in scenarios with sparse branches.
  • Zen 5 not only supports decoding x86 instructions from two locations in a single cycle, but also supports fetching instructions from two locations in the macro-op cache in a single cycle, achieving two taken branches per cycle within the macro-op cache.
  • When the core runs two SMT threads, each thread can monopolize one decoder, so the x86 decode throughput of the whole core reaches 8 in most cases.
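The NOP probes described above boil down to timing straight-line runs of NOPs of varying lengths. A small, hypothetical generator for such assembly kernels (the label naming and structure are illustrative, not David Huang's actual tool) could look like this:

```python
def nop_kernel(n_nops, nop="nop"):
    """Emit x86 assembly text for a straight-line kernel of n_nops NOPs.

    Timing kernels of different sizes exposes frontend limits: a single
    thread saturating at 4 NOPs/cycle points to one 4-wide decoder,
    while reaching 8/cycle implies the macro-op cache (or both decode
    pipes under SMT) is feeding the core.
    """
    body = "\n".join(f"    {nop}" for _ in range(n_nops))
    return f".globl nop_kernel_{n_nops}\nnop_kernel_{n_nops}:\n{body}\n    ret\n"

# Emit a tiny 8-NOP kernel; real probes would sweep sizes from a few
# instructions up to well past the macro-op cache capacity.
print(nop_kernel(8))
```

Passing a multi-byte NOP encoding via the `nop` parameter would additionally probe sensitivity to instruction length, one of the variables the tests reportedly varied.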

The tests then move on to broader performance aspects of the AMD Strix Point APU. Once again the Ryzen AI 9 365 chip is used, but this time it is pitted against the Ryzen 7 7735U (Zen 3) and the Ryzen 7 7840U (Zen 4), with both the Zen 5 and Zen 5C cores on the chip being tested. The Zen 5C cores only run at a low clock of 3.30 GHz, while the Zen 5 cores and the other two chips are set to a fixed 4.8 GHz.

Performance was evaluated in SPEC CPU 2017 and Geekbench 6 (single-core and multi-core). In SPEC CPU 2017, the AMD Zen 5 chip sees a 9.71% IPC increase over the Zen 4 offering and a 22.28% increase over the Zen 3 offering. The Zen 5C cores deliver almost identical IPC to Zen 4, at a lower clock.


In Geekbench 6, single-core performance improves by up to 40.94% over Zen 3 and by about 13.1% over Zen 4. In the multi-core tests, the Zen 5 “Strix Point” APU shows a 55.45% improvement over Zen 3 and a 24.3% improvement over Zen 4, though it should be noted that the Zen 3 and Zen 4 chips were running at a 28W TDP versus 54W for the Ryzen AI 9 365 APU.

SPEC CPU 2017 IPC (Gen-to-Gen)

  • Zen 3 – 100.00%
  • Zen 4 – 111.46%
  • Zen 5 – 109.71%

SPEC CPU 2017 Perf (Relative)

  • Zen 3 – 100.00%
  • Zen 4 – 111.46%
  • Zen 5 – 122.28%
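The two SPEC tables are consistent with each other: since all cores ran at the same fixed clock, the relative performance figures are simply the gen-to-gen IPC gains compounded. A quick sanity check:

```python
# Gen-to-gen IPC gains from the SPEC CPU 2017 table above.
zen4_over_zen3 = 1.1146  # +11.46%, Zen 4 vs Zen 3
zen5_over_zen4 = 1.0971  # +9.71%, Zen 5 vs Zen 4

# Compounding the per-generation gains reproduces the relative table.
zen5_over_zen3 = zen4_over_zen3 * zen5_over_zen4
print(f"{zen5_over_zen3 * 100:.2f}%")  # prints 122.28%
```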

Geekbench 6 ST IPC (Gen-to-Gen)

  • Zen 3 – 100.00%
  • Zen 4 – 117.37%
  • Zen 5 – 115.28%

Geekbench 6 ST Perf (Relative)

  • Zen 3 – 100.00%
  • Zen 4 – 124.71%
  • Zen 5 – 140.94%
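From the Geekbench 6 ST table, the Zen 5 gain over Zen 4 specifically can be backed out by dividing the two relative scores, which lands close to the ~13% figure quoted earlier:

```python
# Geekbench 6 single-thread scores relative to Zen 3, from the table above.
zen5_vs_zen3 = 1.4094
zen4_vs_zen3 = 1.2471

# Dividing out the common Zen 3 baseline isolates the Zen 4 -> Zen 5 step.
zen5_vs_zen4 = zen5_vs_zen3 / zen4_vs_zen3
print(f"{(zen5_vs_zen4 - 1) * 100:.1f}%")  # prints 13.0%
```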

David's blog post goes into the various architectural aspects of Zen 5 extensively. The architecture will not only power the Ryzen AI 300 “Strix Point” APUs but also many other CPUs, such as the Ryzen 9000 “Granite Ridge” desktop family, the 5th Gen EPYC “Turin” server family, and various other APUs for desktop and laptop platforms.

What we do know officially is that the Zen 5 cores come with an average IPC improvement of 16% across different workloads, so once again we'd advise readers to take these results with a grain of salt. The first Zen 5 launch is expected in mid-July with the Strix APUs, followed by the Ryzen 9000 high-performance desktop chips in late July, so stay tuned for more information.

News Source: David Huang
