Dean Takahashi
2024-05-30 08:00:00
venturebeat.com
A group of data center tech leaders has formed the Ultra Accelerator Link Promoter Group to create a new way to scale up AI systems in data centers.
Advanced Micro Devices, Broadcom, Cisco, Google, Hewlett Packard Enterprise (HPE), Intel, Meta, and Microsoft today announced they have aligned to develop a new industry standard dedicated to advancing high-speed, low-latency communication for scale-up AI systems in data centers.
Called Ultra Accelerator Link (UALink), the standard will enable AI accelerators to communicate more effectively, and the initial group will define and establish it as an open industry standard. By creating an interconnect based on open standards, UALink will give system OEMs, IT professionals and system integrators a pathway to easier integration, greater flexibility and scalability in their AI-connected data centers.
“The work being done by the companies in UALink to create an open, high performance and scalable accelerator fabric is critical for the future of AI,” said Forrest Norrod, general manager of the Data Center Solutions Group at AMD, in a statement. “Together, we bring extensive experience in creating large-scale AI and high-performance computing solutions that are based on open standards, efficiency and robust ecosystem support. AMD is committed to contributing our expertise, technologies and capabilities to the group as well as other open industry efforts to advance all aspects of AI technology and solidify an open AI ecosystem.”
The Promoter Group companies bring extensive experience creating large-scale AI and HPC solutions based on open standards, efficiency and robust ecosystem support. Notably, AI chip leader Nvidia is not part of the group.
“Broadcom is proud to be one of the founding members of the UALink Consortium, building upon our long-term commitment to increase large-scale AI technology implementation into data centers. It is critical to support an open ecosystem collaboration to enable scale-up networks with a variety of high-speed and low-latency solutions,” said Jas Tremblay, VP of the Data Center Solutions Group at Broadcom.
Scaling for AI workloads
As the demand for AI compute grows, it is critical to have a robust, low-latency and efficient scale-up network that can easily add computing resources to a single instance. The group said creating an open, industry-standard specification for scale-up capabilities will help establish an open, high-performance environment for AI workloads.
This is where UALink and an industry specification become critical, the group said, standardizing the interface for AI, machine learning, HPC and cloud applications in the next generation of AI data centers and implementations. The group will develop a specification defining a high-speed, low-latency interconnect for scale-up communications between accelerators and switches in AI computing pods.
The 1.0 specification will enable the connection of up to 1,024 accelerators within an AI computing pod and allow direct loads and stores between the memory attached to accelerators, such as GPUs, in the pod.
The UALink Promoter Group will immediately begin forming the UALink Consortium, which is expected to be official in Q3 2024. The 1.0 specification is also expected in Q3 2024 and will be made available to companies that join the Ultra Accelerator Link (UALink) Consortium.
About Ultra Accelerator Link
Ultra Accelerator Link (UALink) is a high-speed accelerator interconnect technology that advances next-generation AI/ML cluster performance. AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft are forming an open industry standard body to develop technical specifications that facilitate breakthrough performance for emerging usage models while supporting an open ecosystem for data center accelerators.
“Ultra-high performance interconnects are becoming increasingly important as AI workloads continue to grow in size and scope. Together, we are committed to developing the UALink, which will be a scalable and open solution available to help overcome some of the challenges with building AI supercomputers,” said Martin Lund, EVP of the Common Hardware Group at Cisco, in a statement.