When AI data centres run out of space, they face a costly dilemma: build bigger facilities or find ways to make multiple locations work together seamlessly. NVIDIA’s latest Spectrum-XGS Ethernet technology promises to solve this challenge by connecting AI data centres across vast distances into what the company calls “giga-scale AI super-factories.”
Announced ahead of Hot Chips 2025, this networking innovation represents the company’s answer to a growing problem that’s forcing the AI industry to rethink how computational power gets distributed.
The problem: When one building isn’t enough
As artificial intelligence models become more sophisticated and demanding, they require enormous computational power that often exceeds what any single facility can provide. Traditional AI data centres face constraints in power capacity, physical space, and cooling capabilities.
When companies need more processing power, they typically have to build entirely new facilities, but coordinating work between separate locations has been problematic due to networking limitations. The issue lies in standard Ethernet infrastructure, which suffers from high latency, unpredictable performance fluctuations (called “jitter”), and inconsistent data transfer speeds when connecting distant locations.
These problems make it difficult for AI systems to efficiently distribute complex calculations across multiple sites.
NVIDIA’s solution: Scale-across technology
Spectrum-XGS Ethernet introduces what NVIDIA terms “scale-across” capability, a third approach to AI computing that complements existing “scale-up” (making individual processors more powerful) and “scale-out” (adding more processors within the same location) strategies.
The technology integrates into NVIDIA’s existing Spectrum-X Ethernet platform and includes several key innovations:
- Distance-adaptive algorithms that automatically adjust network behaviour based on the physical distance between facilities
- Advanced congestion control that prevents data bottlenecks during long-distance transmission
- Precision latency management to ensure predictable response times
- End-to-end telemetry for real-time network monitoring and optimisation
According to NVIDIA’s announcement, these improvements can “nearly double the performance of the NVIDIA Collective Communications Library,” which handles communication between multiple graphics processing units (GPUs) and computing nodes.
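To see why distance matters for the congestion control described above, it helps to look at the bandwidth-delay product: the amount of data that is “in flight” on a link at any moment. The following is a minimal illustrative sketch, not NVIDIA’s algorithm. The function name, the 400 Gb/s link speed, and the round-trip times are all hypothetical values chosen for the example.

```python
def bytes_in_flight(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that fill the link during one round trip.

    link_gbps and rtt_ms are illustrative inputs, not figures from NVIDIA.
    """
    bits_per_second = link_gbps * 1e9
    rtt_seconds = rtt_ms / 1e3
    return bits_per_second * rtt_seconds / 8  # convert bits to bytes


if __name__ == "__main__":
    # Hypothetical comparison: a short in-building hop vs. a long-haul link.
    for label, rtt_ms in [("same building", 0.01), ("distant site", 10.0)]:
        mb = bytes_in_flight(400, rtt_ms) / 1e6
        print(f"{label}: {mb:.1f} MB in flight at 400 Gb/s")
```

The longer the round trip, the more unacknowledged data a sender must keep moving to fill the link, which is why congestion control tuned for a single building does not carry over directly to links between sites.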
Real-world implementation
CoreWeave, a cloud infrastructure company specialising in GPU-accelerated computing, plans to be among the first adopters of Spectrum-XGS Ethernet.
“With NVIDIA Spectrum-XGS, we can connect our data centres into a single, unified supercomputer, giving our customers access to giga-scale AI that will accelerate breakthroughs across every industry,” said Peter Salanki, CoreWeave’s cofounder and chief technology officer.
This deployment will serve as a practical test case for whether the technology can deliver on its promises in real-world conditions.
Industry context and implications
The announcement follows a series of networking-focused releases from NVIDIA, including the original Spectrum-X platform and Quantum-X silicon photonics switches. This pattern suggests the company recognises networking infrastructure as a critical bottleneck in AI development.
“The AI industrial revolution is here, and giant-scale AI factories are the essential infrastructure,” said Jensen Huang, NVIDIA’s founder and CEO, in the press release. While Huang’s characterisation reflects NVIDIA’s marketing perspective, the underlying challenge he describes, the need for more computational capacity, is acknowledged across the AI industry.
The technology could change how AI data centres are planned and operated. Instead of building massive single facilities that strain local power grids and real estate markets, companies might distribute their infrastructure across multiple smaller locations while maintaining performance levels.
Technical considerations and limitations
However, several factors could influence Spectrum-XGS Ethernet’s practical effectiveness. Network performance across long distances remains subject to physical limitations, including the speed of light and the quality of the underlying internet infrastructure between locations. The technology’s success will largely depend on how well it can work within these constraints.
Additionally, the complexity of managing distributed AI data centres extends beyond networking to include data synchronisation, fault tolerance, and regulatory compliance across different jurisdictions. These are challenges that networking improvements alone cannot solve.
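To put the speed-of-light constraint mentioned above into numbers, the short sketch below estimates the minimum propagation delay over optical fibre. It assumes a refractive index of about 1.47, a typical figure for silica fibre rather than a value from NVIDIA, and it ignores switching, queuing, and routing detours, so real-world latency will be higher. The function names and distances are illustrative.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47          # assumption: typical silica fibre


def one_way_delay_ms(distance_km: float) -> float:
    """Lower bound on one-way propagation delay over fibre, in milliseconds."""
    speed_in_fibre = SPEED_OF_LIGHT_KM_PER_S / FIBRE_REFRACTIVE_INDEX
    return distance_km / speed_in_fibre * 1000.0


def round_trip_delay_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay, in milliseconds."""
    return 2 * one_way_delay_ms(distance_km)


if __name__ == "__main__":
    # Illustrative fibre route lengths between two hypothetical sites.
    for km in (10, 100, 1000):
        print(f"{km:>5} km: at least {round_trip_delay_ms(km):.2f} ms round trip")
```

Under these assumptions, every 1,000 km of fibre adds roughly 5 ms in each direction, a floor that no networking technology can remove and can only work around.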
Availability and market impact
NVIDIA states that Spectrum-XGS Ethernet is “available now” as part of the Spectrum-X platform, though pricing and specific deployment timelines haven’t been disclosed. The technology’s adoption rate will likely depend on cost-effectiveness compared to alternative approaches, such as building larger single-site facilities or using existing networking solutions.
The bottom line for consumers and businesses is this: if NVIDIA’s technology works as promised, we could see faster AI services, more powerful applications, and potentially lower costs as companies gain efficiency through distributed computing. However, if the technology fails to deliver in real-world conditions, AI companies will continue facing the expensive choice between building ever-larger single facilities or accepting performance compromises.
CoreWeave’s upcoming deployment will serve as the first major test of whether connecting AI data centres across distances can truly work at scale. The results will likely determine whether other companies follow suit or stick with traditional approaches. For now, NVIDIA has presented an ambitious vision, but the AI industry is still waiting to see if the reality matches the promise.