Karl Freund, Contributor
2025-07-11 13:23:00
www.forbes.com
Photo by Marijan Murat/picture alliance via Getty Images
While GPU performance has been the focus in data centers over the last few years, fabric performance has become a key enabler, or bottleneck, in achieving the throughput and latency required to create and deliver artificial intelligence at scale. Nvidia’s prescient acquisition of Mellanox has been a critical component of its success, enabling scalable AI and HPC performance. However, it’s not just scale-up (in-rack) performance and scale-out (rack-to-rack) connectivity that matter; latency in the scale-within network-on-chip (NoC) has also become essential to achieving high AI throughput and fast response times.
The Importance of Advanced Fabric Solutions
The computing landscape has undergone significant changes with the advent of artificial intelligence, evolving from a loosely coupled network of independent computers to a highly integrated fabric of collaborating, accelerated computing nodes. Three levels of scale require such interconnects: the chip/chiplet, rack, and data center. Each compute element must share data with its neighbors and beyond over a low-latency, high-bandwidth communication channel to maximize performance and minimize latency.
Scale-Within Fabrics
On-chip fabrics connect processor cores, accelerators, and cache memory within a single or multi-chip module. As SoCs become more complex, integrating tens or even hundreds of cores or IP blocks, a single NoC often cannot provide the required bandwidth and scalability. Multiple NoCs, or subnetworks, are used to manage traffic between chiplets, each potentially optimized for specific data types or communication patterns. For example, one NoC might handle high-bandwidth data transfers between compute chiplets, while another manages control signals or memory access. As chiplet-based designs gain wider adoption, these NoCs become the bottleneck of chiplet-to-chiplet communication and data sharing.
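For readers who like to see the idea in code, here is a minimal, purely illustrative Python sketch of how a chiplet-based SoC might steer traffic across multiple special-purpose NoCs. The subnetwork names, latencies, and bandwidths are invented for illustration and do not describe any vendor’s design.

```python
from dataclasses import dataclass

@dataclass
class Noc:
    """One on-chip subnetwork with illustrative (made-up) characteristics."""
    name: str
    hop_latency_ns: float   # latency added per router hop
    bandwidth_gbps: float   # per-link bandwidth

# Hypothetical subnetworks, each optimized for one traffic class.
SUBNETS = {
    "bulk_data": Noc("bulk_data", hop_latency_ns=5.0, bandwidth_gbps=512.0),
    "control":   Noc("control",   hop_latency_ns=1.0, bandwidth_gbps=32.0),
    "memory":    Noc("memory",    hop_latency_ns=2.0, bandwidth_gbps=256.0),
}

def transfer_time_us(traffic_class: str, payload_bytes: int, hops: int) -> float:
    """Estimate end-to-end time for one transfer on the matching subnetwork."""
    noc = SUBNETS[traffic_class]
    serialization_us = payload_bytes * 8 / (noc.bandwidth_gbps * 1e3)  # Gbps -> bits/us
    latency_us = hops * noc.hop_latency_ns / 1e3
    return serialization_us + latency_us

# A chiplet-to-chiplet tensor transfer crosses more hops than an on-die one,
# which is why inter-chiplet NoC links tend to become the bottleneck.
print(f"1 MiB bulk transfer over 8 hops: {transfer_time_us('bulk_data', 1 << 20, hops=8):.2f} us")
```

The point of the toy model is simply that each traffic class sees different latency and bandwidth characteristics, so a design that forces all classes onto one poorly matched network, or forces hops between mismatched networks, pays a measurable penalty.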
A unified fabric significantly improves latency and bandwidth in chiplet-based systems by streamlining communication across otherwise fragmented networks-on-chip (NoCs) and optimizing the physical interconnect. Such a fabric can minimize hops, improve routing, enable a higher degree of scaling, and manage congestion. More importantly, it can improve performance while reducing footprint and power by reusing wires and logic, in a segment where every saving and every extra ounce of performance is treasured.
At the chip level, networks-on-chip (NoCs) tend to be isolated; each is designed to connect a specific domain on the chip, which works well until data must move to another domain, incurring a latency-inducing “hop” or the overhead of bridging different protocols. A unified NoC, such as that provided by Baya Systems (a client of Cambrian-AI Research), provides a single transport mechanism for the various protocols of each fabric. Transport is separated from the protocol layers, minimizing the wires and logic needed to build a unified fabric that supports coherent, non-coherent, and custom protocols for maximum efficiency, lowest cost, and reduced power consumption.
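Here is a minimal sketch of that layering idea, assuming nothing about Baya’s actual implementation: hypothetical protocol adapters translate coherent and non-coherent messages into one common packet format, and a single shared transport moves all of them, so the routing wires and logic are reused across protocols.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    """Protocol-agnostic unit carried by the shared transport layer."""
    protocol: str      # e.g. "coherent", "non_coherent", "custom"
    src: int
    dst: int
    payload: bytes

class UnifiedTransport:
    """One routing substrate reused by every protocol adapter (illustrative)."""
    def __init__(self):
        self.packets_carried = 0

    def send(self, pkt: Packet) -> None:
        # A real fabric would route over a mesh of links; here we just count
        # traffic to show that all protocols share the same wires and logic.
        self.packets_carried += 1
        print(f"[{pkt.protocol}] {pkt.src} -> {pkt.dst}, {len(pkt.payload)} bytes")

def coherent_read(fabric: UnifiedTransport, src: int, dst: int) -> None:
    """Adapter: wraps a (hypothetical) cache-coherent read in a common packet."""
    fabric.send(Packet("coherent", src, dst, b"\x00" * 64))  # one cache line

def dma_write(fabric: UnifiedTransport, src: int, dst: int, data: bytes) -> None:
    """Adapter: wraps a non-coherent bulk DMA in the same packet format."""
    fabric.send(Packet("non_coherent", src, dst, data))

fabric = UnifiedTransport()
coherent_read(fabric, src=0, dst=3)
dma_write(fabric, src=1, dst=2, data=b"\xff" * 4096)
```

The design choice the sketch highlights is that only the thin adapter layer is protocol-specific; everything below it is shared, which is where the wire, logic, and power savings come from.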
The various on-chip networks tend to be distinct, but could be unified with the right technologies.
Scale-Up Fabrics
Scale-up fabrics connect accelerators (GPUs, AI processors) within a single rack or AI pod, prioritizing ultra-low latency and high bandwidth communication. Scaling up with NVLink has been the go-to standard, but the industry needs an open alternative, such as UALink, to interconnect accelerators from other vendors.
UALink and Ultra Ethernet solve different problems in the data center.
UALink is a memory-semantic interconnect standard led by the UALink Consortium, enabling accelerators to share memory directly. Its four-layer protocol stack supports single-stage switching, reducing latency and congestion. UALink will deliver up to 200 Gbps per lane and memory-sharing capabilities to scale (up) accelerator connectivity. The Consortium approved the UALink 1.0 specification in April 2025; the first silicon is expected later this year, with volume production scheduled for 2026.
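As a back-of-the-envelope illustration of what 200 Gbps per lane implies, the short sketch below aggregates lanes into per-accelerator raw bandwidth. The 200 Gbps figure comes from the spec as described above; the lane counts are assumptions for illustration, not product configurations.

```python
def accelerator_bandwidth_tbps(lanes: int, gbps_per_lane: float = 200.0) -> float:
    """Aggregate raw bandwidth for one accelerator's UALink lanes, in Tbps."""
    return lanes * gbps_per_lane / 1e3

# Hypothetical lane counts; real products may differ.
for lanes in (4, 8, 16):
    print(f"{lanes:>2} lanes -> {accelerator_bandwidth_tbps(lanes):.1f} Tbps raw")
```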
Scale-Out Fabrics
Scale-out fabrics interconnect multiple racks or pods, enabling workloads to be distributed across larger clusters or, more often, enabling many more “copies” of a workload to run, serving more clients. Nvidia offers both Ethernet and InfiniBand networking to connect racks for east-west traffic. As an open scale-out alternative, the industry is standardizing Ultra Ethernet, a high-bandwidth networking protocol tailored for AI workloads across as many as one million heterogeneous nodes.
Ultra Ethernet IP solutions will enable 1.6 Tbps of bandwidth for scaling (out) massive AI networks.
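To put 1.6 Tbps in perspective, here is a rough, hedged estimate of how long one node would need to push a full copy of a large model’s gradients over such a link. Only the 1.6 Tbps figure comes from the article; the model size, precision, and efficiency factor are assumptions chosen for illustration.

```python
def gradient_transfer_s(params_billion: float, bytes_per_param: int = 2,
                        link_tbps: float = 1.6, efficiency: float = 0.8) -> float:
    """Seconds to move one full gradient copy over one scale-out link."""
    bits = params_billion * 1e9 * bytes_per_param * 8
    return bits / (link_tbps * 1e12 * efficiency)

# A hypothetical 70B-parameter model with 16-bit (2-byte) gradients.
print(f"{gradient_transfer_s(70):.2f} s per full gradient exchange")
```

At roughly 0.9 seconds per full exchange under these assumptions, it is easy to see why training frameworks overlap communication with compute, and why every increment of scale-out bandwidth matters.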
Companies in the Fabric IP Business
Historically, fabrics have been proprietary, coming from companies like Nvidia, AMD, and Intel. Arm provides the CoreLink NIC-301 and related interconnect IP, widely used in Arm-based SoCs for scalable, configurable on-chip interconnects. While Arm’s fabric is designed primarily for Arm CPU SoCs, Baya Systems and Arteris provide fabric IP for many implementations, including RISC-V and custom accelerators. Baya is unique in its chiplet-first focus and its ability to scale both out and up.
Arteris
Arteris is recognized as a leader in providing what we have been referring to as scale-within fabric NoCs, along with SoC integration automation software to speed the development of complex SoCs. Arteris went public in October 2021 (Nasdaq: AIP), with a market cap of approximately $300 million as of mid-2025. Arteris has over 200 customers, including Samsung, AMD, Qualcomm, Baidu, Mobileye, and NXP, with an installed base of nearly four billion devices. Arteris IP is broadly deployed across the automotive segment (notably ADAS, with >70% market share), communications, consumer electronics, enterprise computing, and industrial markets.
Arteris’ products include the FlexNoC Interconnect, whose integrated physical awareness technology gives place-and-route teams a much better starting point while simultaneously reducing interconnect area and power consumption. Arteris claims that FlexNoC delivers up to 5X shorter turnaround time versus manual physical iterations. Its Ncore IP is similar but is designed for multi-core cache-coherent designs.
Baya Systems
As we have noted, the AI transformation has driven the need for scale-up and scale-out, and has also placed heavy demands on scale-within. In addition, the market perceived an emerging gap that was not readily solved by off-the-shelf scale-within IP. The transition to chiplets, which promise greater scale and cost-effectiveness, demands a more agile, data-driven design philosophy to handle the complexity of these new systems.
This is exactly what Baya Systems, a relatively new entrant, aims to solve, and it has been gaining a great deal of traction since coming out of stealth a year ago. Baya Systems (a client of Cambrian-AI Research) is a Silicon Valley startup with strong backing and leadership that has architected a semiconductor IP and software portfolio enabling designers of SoCs, systems, and data-center-scale infrastructure to build high-performance AI technology quickly and efficiently. Baya Systems’ chiplet-first fabrics are designed to address on-chip (scale-within), scale-up, and cross-system (scale-out) networking challenges. Their flexibility and modularity position them for broader applications, potentially integrating various processing units and accelerating communication in diverse, high-performance environments. The Baya Systems fabric supports multiple protocols, including AMBA, UCIe, UALink, and Ultra Ethernet.
Baya Systems has created a comprehensive fabric that supports popular protocols for scale-within, scale-up, and scale-out networking.
Tenstorrent, an AI chipmaker considered an emerging challenger to Nvidia, recently released a white paper demonstrating that Baya’s fabric boosts performance by up to 66% while reducing footprint by 50% compared with Tenstorrent’s own state-of-the-art custom fabric. Tenstorrent is led by legendary computer architect Jim Keller, who is also a backer of Baya Systems and sits on its board.
Beyond NoCs, Baya’s NeuraScale offers a scalable fabric solution based on the company’s WeaveIP technology: a non-blocking crossbar-replacement fabric designed to power switches for the UALink and Ultra Ethernet standards in emerging scale-up and scale-out systems. Its unique mesh-based, tileable architecture simplifies chiplet-based scaling and opens a path to much larger accelerator node counts than traditional crossbar switches, which are hitting reticle limits. This could enable 144-port or even 288-port racks, compared with today’s 72-port ones, substantially expanding scale.
Interestingly, the company claims the technology could enable even larger node counts once the industry adopts it. What makes NeuraScale additionally disruptive is that it can substantially reduce the resources, time, and cost required to build these high-performance switches, enabling smaller, nimbler entrants to broaden and scale the market.
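A simple way to see why a tileable mesh can scale past a monolithic crossbar: crosspoints in an N-by-N crossbar grow quadratically with port count and must fit on one die, while a mesh spreads ports across small, replicable tiles that grow roughly linearly. The sketch below is an illustrative model only, not NeuraScale’s actual design; the tile capacity is an assumed figure.

```python
import math

def crossbar_crosspoints(ports: int) -> int:
    """A monolithic N x N crossbar needs ~N^2 crosspoints on a single die."""
    return ports * ports

def mesh_tiles(ports: int, ports_per_tile: int = 18) -> int:
    """A tileable mesh spreads the same ports across small, replicable tiles.
    The 18-port tile capacity is a hypothetical number for illustration."""
    return math.ceil(ports / ports_per_tile)

for ports in (72, 144, 288):
    print(f"{ports:>3} ports: crossbar ~{crossbar_crosspoints(ports):>6} "
          f"crosspoints vs mesh ~{mesh_tiles(ports)} tiles")
```

Under this toy model, doubling ports quadruples crossbar crosspoints but only doubles the tile count, which is why a tiled approach sidesteps the reticle limit that caps monolithic crossbar dies.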
The WeaveIP NeuraScale Fabric
Fabrics Will Enable The Future Of AI
The modern data center is evolving rapidly, both in its compute elements (chiplets, chips, CPUs, GPUs) and in fabrics, to enable these systems to scale to hundreds of thousands of nodes and support AI.
While Nvidia’s new NVLink Fusion will allow non-Nvidia CPUs and GPUs to participate in the Nvidia rack-scale architecture, hardware vendors and hyperscalers will continue to seek an open fabric alternative to an ecosystem controlled by a single firm. Consequently, we envision a significant increase in these heterogeneous fabric technologies as AMD, Intel, and hyperscalers adopt them to build out their own AI Factories, both with and without Nvidia hardware. Fabrics like that of Baya Systems represent a key enabler in that evolution.
We have a more in-depth report on Baya Systems here.
And more information about Arteris can be found on their website.
Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor firms as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, Graphcore, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article. For more information, please visit our website at https://cambrian-AI.com.