Nvidia expands AI capabilities with giga-scale networking and faster inference serving

Kyt Dotson
2025-08-22 11:00:00
siliconangle.com

Nvidia Corp. today announced advances in artificial intelligence software and networking aimed at accelerating AI infrastructure and model deployment.

The technology giant, which makes the graphics processing units that power much of the AI economy, unveiled Spectrum-XGS (“giga-scale”), an addition to its Spectrum-X Ethernet switching platform for AI workloads. Spectrum-X connects entire clusters within a data center, allowing massive datasets to stream across AI models. Spectrum-XGS extends this by providing orchestration and interconnection between data centers.

“So, you’ve heard us use terms like scale up and scale out. Now we’re introducing this new term, ‘scale across,’” said Dave Salvator, director of accelerated computing products at Nvidia. “These switches are basically purpose built to enable multi-site scale with different data centers able to communicate with each other and essentially act as one gigantic GPU.”

In data center terms, “scale up” means bigger machines and “scale out” means more machines in the same facility. However, each data center can draw only so much power and dissipate only so much heat before efficiency drops. That caps the number of machines, and therefore the amount of compute, that can feasibly be packed into one location. Scaling across sites sidesteps the cap by linking several facilities together.

Salvator said the system minimizes jitter (the variability in packet arrival times) and latency (the delay between sending data and receiving a response). Both are critical in AI networking because they determine how much effective bandwidth can be achieved between GPUs spread across sites.
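To make the two terms concrete, here is a minimal Python sketch, not anything from Nvidia, that computes mean latency and jitter from per-packet timestamps, plus the bandwidth-delay product that links latency to usable bandwidth: the longer the round trip between sites, the more data must be kept in flight to keep a link busy. The function names, the jitter definition (mean absolute change between consecutive packet delays, a simplified take on the RFC 3550 interarrival-jitter idea) and the link figures are all illustrative assumptions, not Spectrum-XGS specifications.

```python
from statistics import mean

def latency_stats(send_times_ms, recv_times_ms):
    """Return (mean one-way latency, jitter), both in milliseconds.

    Jitter here is the mean absolute difference between consecutive packet
    delays; this is an illustrative definition, not a vendor metric.
    """
    delays = [r - s for s, r in zip(send_times_ms, recv_times_ms)]
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return mean(delays), (mean(diffs) if diffs else 0.0)

def bytes_in_flight(link_gbps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to keep a link full."""
    return link_gbps * 1e9 * (rtt_ms / 1e3) / 8

# Hypothetical numbers: four packets whose delay alternates between 10 and 10.5 ms.
print(latency_stats([0, 1, 2, 3], [10, 11.5, 12, 13.5]))  # (10.25, 0.5)
# A hypothetical 400 Gb/s link with a 10 ms round trip between sites.
print(bytes_in_flight(400, 10))
```

With those assumed figures, roughly 500 MB has to be in flight at all times to saturate the link, which is why inter-site latency, and unpredictable jitter on top of it, directly limits achievable bandwidth.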

By comparison, NVLink Fusion, a network fabric technology Nvidia unveiled in May, allows cloud providers to scale up their data centers to handle millions of GPUs at a time. Together, NVLink Fusion and Spectrum-XGS represent two layers of scaling AI infrastructure: one inside the data center, and one across multiple data centers.

Researching better methods to serve AI models

Dynamo is Nvidia’s inference serving framework, the software layer through which trained models are deployed and respond to requests.

On this platform, Nvidia has been researching a technique called disaggregated serving. It splits “prefill,” or context building, and “decode,” or token generation, across different GPUs or servers.
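The idea behind disaggregated serving can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Dynamo’s API: `KVCache`, `prefill_worker`, `decode_worker` and `toy_next_token` are hypothetical names, the “model” is a trivial arithmetic rule, and an in-process `Queue` stands in for the transfer of the attention cache between GPUs or servers. Prefill processes the whole prompt at once (work commonly described as compute-bound), while decode extends the cache one token at a time (commonly described as memory-bandwidth-bound and latency-sensitive), which is why running them on separate hardware can help.

```python
from dataclasses import dataclass, field
from queue import Queue

VOCAB_SIZE = 50  # hypothetical toy vocabulary size

@dataclass
class KVCache:
    """Stand-in for the attention key/value cache built during prefill."""
    request_id: int
    states: list = field(default_factory=list)

def toy_next_token(states):
    # Hypothetical deterministic "model": the next token depends on all cached state.
    return sum(states) % VOCAB_SIZE

def prefill_worker(request_id, prompt_tokens, handoff):
    """Prefill phase: ingest the whole prompt in one step, then hand off the cache."""
    cache = KVCache(request_id, list(prompt_tokens))
    handoff.put(cache)

def decode_worker(handoff, max_new_tokens):
    """Decode phase: generate tokens one at a time from the transferred cache."""
    cache = handoff.get()
    output = []
    for _ in range(max_new_tokens):
        tok = toy_next_token(cache.states)
        output.append(tok)
        cache.states.append(tok)  # each generated token extends the cache
    return cache.request_id, output

handoff = Queue()  # stands in for the GPU-to-GPU cache transfer
prefill_worker(1, [3, 7, 11], handoff)
print(decode_worker(handoff, max_new_tokens=4))  # (1, [21, 42, 34, 18])
```

In a real deployment the two roles would run on separate pools of GPUs, each sized for its own bottleneck, with many requests queued between them.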

This matters because inference, once considered secondary to model training, is becoming a serious challenge in the agentic AI era, when reasoning models generate far more tokens than older models did. Dynamo is Nvidia’s answer: a faster, more efficient and more cost-effective way to handle that load.

“If you look at both interactivity on a model like GPT OSS, OpenAI’s most recent community model they just released, we’re able to achieve, about a 4X increase in tokens per second,” said Salvator. “You look at DeepSeek, we’re also able to achieve really significant bumps there in terms of a 2.5X increase.”

Nvidia is also researching “speculative decoding,” which uses a second, smaller model to guess what the main model will output next, in an attempt to speed up generation. “The way that this works is you have what’s called a draft model, which is a smaller model which attempts to sort of essentially generate potential next tokens,” said Salvator.

Because the smaller model is faster, it can cheaply propose several candidate tokens at once; because it is less accurate, the main model has to verify those guesses.

“The ability here is that the more that that draft model can speculatively correctly guess what those next tokens need to be, the more performance you can pick up,” explained Salvator. “And we’ve already seen about a 35% performance gain using these techniques.”
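Salvator’s point that gains grow with the draft model’s hit rate can be made concrete with a standard back-of-the-envelope model from the speculative decoding literature. Assuming, as a simplification, that each drafted token is accepted independently with probability \(\alpha\) and that the draft proposes \(\gamma\) tokens per step, the expected number of tokens produced per pass of the main model is:

```latex
\mathbb{E}[\text{tokens per main-model pass}]
  = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}

% Illustrative values with \gamma = 4:
% \alpha = 0.5: \; (1 - 0.5^{5}) / 0.5 = 0.96875 / 0.5 = 1.9375
% \alpha = 0.8: \; (1 - 0.8^{5}) / 0.2 = 0.67232 / 0.2 = 3.3616
```

This counts tokens per main-model pass, not wall-clock speedup, which also depends on how expensive the draft model is to run; the 35% figure Salvator cites is Nvidia’s own measurement and is not derived from this formula.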

According to Salvator, the main AI model verifies the drafted tokens in parallel against its learned probability distribution. Only accepted tokens are committed; rejected ones are discarded. This keeps latency under 200 milliseconds, which he described as “snappy and interactive.”
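The draft-then-verify loop can be sketched in Python. This is a simplified greedy variant, not Nvidia’s implementation: a drafted token is accepted only if it exactly matches what the main model would pick, whereas the sampling-based variant accepts or rejects each token using the two models’ probabilities. `target_next` and `draft_next` are hypothetical toy models, and the per-position verification calls stand in for what a real system does in a single batched forward pass of the main model.

```python
def target_next(seq):
    # Hypothetical "main" model: deterministic greedy choice of the next token.
    return (seq[-1] * 3 + 1) % 10

def draft_next(seq):
    # Hypothetical cheap draft model: agrees with the main model except after token 4.
    guess = (seq[-1] * 3 + 1) % 10
    return (guess + 1) % 10 if seq[-1] == 4 else guess

def greedy_decode(prompt, num_tokens):
    """Baseline: one main-model pass per generated token."""
    seq = list(prompt)
    for _ in range(num_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]

def speculative_decode(prompt, num_tokens, k=4):
    """Greedy speculative decoding; returns (tokens, number of main-model passes)."""
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < num_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The main model checks every drafted position (one batched pass in practice).
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # main model's token replaces the first miss
                break
            accepted.append(tok)  # correct guess is committed
        else:
            # All k guesses accepted: the same pass yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + num_tokens], target_passes

print(speculative_decode([1], 8))  # ([4, 3, 0, 1, 4, 3, 0, 1], 3)
```

The output is identical to `greedy_decode([1], 8)`, but it takes three main-model passes instead of eight; the better the draft model guesses, the fewer passes are needed.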

Image: SiliconANGLE/Microsoft Designer

