Josh Norem
2024-06-26 13:45:00
www.extremetech.com
The AI arms race has been ongoing for the past year or so, with the world’s biggest tech companies attempting to scale up their AI operations as quickly as their budgets allow. There now appears to be at least one company with a rather ambitious goal: an AI cluster sporting around 1.2 million GPUs. That’s over 20 times the typical number of GPUs found in today’s most powerful supercomputers, and it would represent a seismic shift in the size, power, and expense associated with next-generation AI training systems.
Word of this audacious project comes directly from an AMD executive, in an extensive interview with The Next Platform about the future of the data center. The site interviewed Forrest Norrod, the VP and general manager of AMD’s data center business. Norrod was asked about the biggest AI cluster any of AMD’s customers is considering, and the interviewer offhandedly threw out a figure of 1.2 million GPUs. Norrod replied that the number was “in that range,” adding that he was talking about a single computer.
The MI300 family of data center accelerators has had the fastest ramp in AMD’s history, but AMD still has a long way to go in toppling Nvidia from its perch.
Credit: AMD
Norrod describes the theoretical project’s scope as “mind-blowing” while admitting it may or may not come to pass. He said that companies are contemplating spending tens of billions, and even a hundred billion, on future AI-related projects. For context, AMD’s Epyc-powered Frontier supercomputer is currently ranked #1 in the world by Top500.org. It was the first supercomputer to break the exascale barrier, costing just $600 million. It also has just 37,888 MI250X GPUs, so a computer with 1.2 million GPUs is practically unfathomable.
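To put that gap in numbers, here is a quick back-of-the-envelope sketch using only the two GPU counts cited above. The constant and function names are ours, chosen for illustration, and the 1.2 million figure is the rough number floated in the interview, not a confirmed specification.

```python
# GPU counts as reported in this article.
FRONTIER_GPUS = 37_888       # MI250X GPUs in the Frontier supercomputer
PROPOSED_GPUS = 1_200_000    # approximate figure from the interview (not confirmed)


def scale_factor(proposed: int, baseline: int) -> float:
    """Return how many times larger `proposed` is than `baseline`."""
    return proposed / baseline


ratio = scale_factor(PROPOSED_GPUS, FRONTIER_GPUS)
print(f"Roughly {ratio:.1f}x the GPU count of Frontier")
```

By that measure, the cluster under discussion would carry roughly 32 times as many GPUs as the current Top500 leader. The comparison counts devices only and says nothing about per-GPU performance, which differs between accelerator generations.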
Norrod was also asked about AMD’s competition with Nvidia in the GPU space. The interviewer noted that AMD has captured 30% of the data center CPU market and asked whether it could capture that much of the GPU market as well. Norrod demurred, saying Nvidia is the incumbent, so AMD’s biggest priority is to minimize friction of adoption. That’s a tall order, given Nvidia’s dominance with not just hardware but its CUDA software as well. The interviewer then asked if AMD could just make a replica of Nvidia hardware to sell, to which Norrod replied, “We can’t quite do what you suggested.”