Category: IT | Dec 17, 2023 | Author: Admin

AMD is the undisputed datacenter GPU performance champ - for now

There is nothing quite like great hardware to motivate people to create and tune software to take full advantage of it during a boom time. And as we have been saying since we first caught wind of the “Antares” Instinct MI300 family of GPUs, AMD is going to take on Nvidia directly in the generative AI market, and it is absolutely and unequivocally going to sell as many MI300A and MI300X GPU accelerators as it can make.

Nvidia should be thankful for the great fortune – and when we are talking about Big Green in the datacenter, we really do mean a great fortune – that demand for compute engines in the generative AI market is growing faster than the collective group of accelerator makers can supply them. Because if this were not the case, and supplies of these complex and elegant devices were relatively unconstrained, with prices not being bid up by voracious demand, then AMD would absolutely be stealing deals away from Nvidia with the GPUs and platforms it has put together based on the MI300 family.

As it turns out, such intense competition will be delayed, since demand will probably exceed supply for AI training and heavy inference accelerators for the next one or two years. The good news is that the supply shortages will allow all kinds of accelerator architectures to gain a footing – one directly proportional to the large language models and tools they support and the number of accelerators their suppliers can get manufactured – that they might not otherwise have had.

When supplies normalize and when all of the LLM and DLRM models have been tuned up by legions of software engineers for all kinds of devices, then there is going to be a shakeout and very likely a price war. In other words, the AI hardware market will return to something that resembles normal.


Until then, we all get to admire the hot rod iron that the accelerator chip makers are forging in their garages and setting loose for street racing at the hyperscalers, cloud builders, and HPC centers of the world, technology that eventually will trickle down to enterprises when it becomes more available and a hell of a lot cheaper per unit of work.

THE PACKAGE IS MORE IMPORTANT THAN THE CHIP


What is fun for system architects and AI software developers right now, as AMD launches the MI300s at its Advancing AI event in San Jose – just down the road from Nvidia, Intel, and many other AI accelerator makers – is that, based on the raw feeds and speeds, the Demon MI300 systems built by Team Lisa Su can beat the Hellcat H100 systems built by Team Jensen Huang by a nose, and sometimes by more than that.

Those Demon and Hellcat names refer to the top end Dodge Challenger muscle cars that The Next Platform is partial to – those are not AMD or Nvidia code names. (We know that Brad McCreadie, who used to run IBM’s Power processor development for many years and who moved to AMD five years ago to revamp its Instinct GPU designs and win exascale deals, loves Challengers and challenges.)

Depending on how you rank memory bandwidth alongside raw compute, the Demon is the MI300X, with 192 GB of HBM3 memory, 5.3 TB/sec of memory bandwidth, and 163.4 teraflops of FP64 matrix math in 750 watts. The Hellcat is the new Nvidia H200, with 141 GB of HBM3e memory, 4.8 TB/sec of memory bandwidth, and 66.9 teraflops of FP64 oomph in 700 watts – which works out to somewhere between 1.6X and 1.9X more effective FP64 matrix performance on real workloads than the H100, thanks to that extra memory capacity and bandwidth.

The AMD MI300A, with 128 GB of HBM3 memory, 5.3 TB/sec of bandwidth, and 122.6 teraflops of FP64 matrix math in 760 watts, would be the Challenger SRT, and the original Nvidia H100, with either 80 GB or 96 GB of HBM3, 3.35 TB/sec of memory bandwidth, and 66.9 teraflops of FP64 matrix math in 700 watts, would be the Challenger R/T Turbo in this extended analogy. We have little doubt that when Nvidia puts its “Blackwell” datacenter GPUs into the field next year, it will take the lead for a bit, but AMD is in the garage right now, too, forging the MI400.
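
For those who want to check our street racing seedings, here is a quick back-of-the-envelope sketch in Python that turns the feeds and speeds quoted above into efficiency metrics. The device figures come straight from the paragraphs above; the per-watt and bytes-per-flop framings are our own, not vendor-published metrics:

```python
# Back-of-the-envelope comparison of the four "muscle cars" above, using
# only the figures quoted in the text: FP64 matrix teraflops, memory
# bandwidth in TB/sec, and board power in watts.
specs = {
    # name:   (fp64_matrix_tflops, mem_bw_tbps, watts)
    "MI300X": (163.4, 5.30, 750),
    "MI300A": (122.6, 5.30, 760),
    "H200":   (66.9,  4.80, 700),
    "H100":   (66.9,  3.35, 700),
}

print(f"{'device':8} {'FP64 GF/W':>10} {'GB/s per W':>11} {'bytes/flop':>11}")
for name, (tflops, bw_tbps, watts) in specs.items():
    gflops_per_watt = tflops * 1_000 / watts   # FP64 matrix gigaflops per watt
    gbps_per_watt = bw_tbps * 1_000 / watts    # memory GB/sec per watt
    bytes_per_flop = bw_tbps / tflops          # machine balance: bytes per FP64 flop
    print(f"{name:8} {gflops_per_watt:10.1f} {gbps_per_watt:11.2f} {bytes_per_flop:11.3f}")
```

By these measures, the MI300X delivers more than twice the FP64 matrix math per watt of the Hopper chips, and the original H100 trails the whole field on memory bandwidth per watt.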

Nvidia and AMD are going to push each other as much as customers are doing right now – and from here on out. And both are going to have market shares directly proportional to what they can ship out of the factories, and both are going to charge the same price per unit of work (more or less) for their GPUs, depending on which of the metrics above customers favor.
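
To make that "same price per unit of work" idea concrete, here is a minimal sketch. The dollar rate below is a placeholder we invented for illustration – neither AMD nor Nvidia has published such a number – but it shows how a market-clearing rate per teraflops would translate into very different device prices:

```python
# Hypothetical illustration of vendors converging on the same price per
# unit of work: pick a market-clearing rate per FP64 matrix teraflops
# (the rate below is a made-up placeholder, NOT real pricing) and see
# what each device would fetch at that rate. TFLOPS are from the text.
RATE_PER_TFLOPS = 300.0  # hypothetical dollars per FP64 matrix teraflops

devices = {"MI300X": 163.4, "MI300A": 122.6, "H200": 66.9, "H100": 66.9}
for name, tflops in devices.items():
    print(f"{name}: implied price ${tflops * RATE_PER_TFLOPS:,.0f}")
```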

We are going to approach the MI300 announcement as we normally do, with a three-pronged attack. First, in this article we will talk generally about the new AMD GPU accelerators and how they compare to prior generations of AMD devices. Then we will do a deep dive on the architecture of the MI300 devices, and after that we will follow up separately with a deeper competitive analysis between the AMD MI300A and the Nvidia GH200 Grace-Hopper hybrid, and between the AMD MI300X and the Nvidia H100 and H200 accelerators. (It will take time to gather such information.)

As we already knew from the revelations that AMD made back in June, there are two versions of the MI300. The MI300X is a GPU-only accelerator aimed at heavy AI training and inference for foundation models, while the MI300A is a hybrid device that mixes Epyc CPUs and Instinct GPUs on a single package, literally sharing the same HBM memory space. That sharing provides efficiencies for codes that still rely on running serial portions on the CPUs and flipping results over to the GPUs for more parallel processing.

Back then, we saw a note in the performance specs AMD was touting for the MI300 versus the MI250X, and as it turns out, that note sent us down a rathole and led us to overstate both the performance and the power consumption of the MI300A. The note reads as follows:

“Measurements conducted by AMD Performance Labs as of Jun 7, 2022 on the current specification for the AMD Instinct™ MI300 APU (850W) accelerator designed with AMD CDNA™ 3 5nm FinFET process technology, projected to result in 2,507 TFLOPS estimated delivered FP8 with structured sparsity floating-point performance.”
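
Note that the 2,507 teraflops figure is quoted with structured sparsity. Under the usual 2:4 structured sparsity convention, where the sparse rating is double the dense one, the implied dense FP8 number would be about half that – our back-calculation, shown below, not an AMD-published figure:

```python
# The 2,507 teraflops figure is quoted "with structured sparsity". Under
# the usual 2:4 structured-sparsity convention, the sparse rating is twice
# the dense rating, so the implied dense FP8 number is about half. This is
# our back-calculation, not an AMD-published dense figure.
sparse_fp8_tflops = 2_507
dense_fp8_tflops = sparse_fp8_tflops / 2  # 2:4 sparsity doubles rated throughput
print(f"Implied dense FP8: ~{dense_fp8_tflops:,.0f} teraflops")
```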

That 850 watt figure made us think that AMD was overclocking the MI300A GPUs to get the same or slightly better performance on vector and matrix math than the MI300X, which presumably would run slower. This is not an unreasonable thing to think, given that the MI300A is going first into the “El Capitan” 2+ exaflops supercomputer at Lawrence Livermore National Laboratory, and that this and the other exascale “Shasta” Cray EX machines from Hewlett Packard Enterprise are water-cooled.

As it turns out, the GPUs on the MI300A are not overclocked at all. Moreover, the MI300A does not give up memory bandwidth along with the HBM capacity it sheds, because the capacity drop comes not from having fewer memory stacks active (as happened with the first generation Hopper H100 GPUs) but from shorter HBM stacks on the MI300A compared to the MI300X – eight DRAM chips per stack on the former and twelve per stack on the latter, to be precise. So the MI300A turns out to be a little lower than we had hoped on compute performance and a little better than we expected on memory bandwidth.
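
The capacity arithmetic is easy enough to check. Here is a minimal sketch, assuming eight HBM3 stacks per MI300 package and 2 GB (16 Gbit) DRAM dies – assumptions on our part, though they are consistent with the 128 GB and 192 GB capacities in the text:

```python
# Capacity arithmetic for the two stack heights described above, assuming
# eight HBM3 stacks per MI300 package and 2 GB (16 Gbit) DRAM dies --
# assumptions on our part, consistent with the capacities in the text.
STACKS_PER_PACKAGE = 8
GB_PER_DIE = 2

for name, dies_per_stack in (("MI300A", 8), ("MI300X", 12)):
    capacity_gb = STACKS_PER_PACKAGE * dies_per_stack * GB_PER_DIE
    print(f"{name}: {STACKS_PER_PACKAGE} stacks x {dies_per_stack}-high x "
          f"{GB_PER_DIE} GB = {capacity_gb} GB")
```

Because HBM bandwidth is a function of stack count and pin speed rather than stack height, trimming the stacks from twelve chips to eight cuts capacity but leaves the 5.3 TB/sec of bandwidth intact.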

We still would have liked to see El Capitan turbocharged with some overclocking. Why not? That is the whole point of supercomputing, after all. That said, the effective performance of these chips on certain bandwidth-bound workloads will be driven as much by memory as by raw compute.

Here is how the “Antares” MI300X and MI300A stack up against the prior AMD Instinct datacenter GPUs:

[Table: feeds and speeds of AMD Instinct datacenter GPUs, from the MI25 through the MI300A and MI300X]

We are still waiting on the clock speeds for the MI300X and MI300A GPU motors and their HBM3 memory, but we think our estimates in the table are in the right ballpark.

What is immediately obvious here is just how far AMD has come since the days of the MI25 and MI50, when Nvidia just completely and utterly controlled GPU compute in the datacenter. The MI100 was a step in the right direction, the MI200 was a tipping point, and the MI300 is AMD coming into its own and bringing all kinds of technology to bear.

What is not obvious from the table is how AMD’s chiplet approach and chiplet packaging come to full fruition in the MI300: 2.5D chip interconnect, using CoWoS silicon interposers from Taiwan Semiconductor Manufacturing Co, links the compute complexes to their HBM memory, while 3D stacking places the compute tiles – both CPU and GPU – atop the Infinity Cache 3D vertical cache and I/O dies. This is 2.5D plus 3D packaging, which AMD calls 3.5D packaging.
