Details
- NVIDIA, Microsoft, and OpenAI have released Multipath Reliable Connection (MRC), an RDMA transport protocol, as an open specification through the Open Compute Project, proven first on NVIDIA Spectrum-X Ethernet hardware.
- Parties involved are NVIDIA, OpenAI (Sachin Katti, head of industrial compute), Microsoft, and Oracle. MRC is deployed in OpenAI's Blackwell-generation training, Microsoft's Fairwater, and Oracle's Abilene data centers.
- MRC distributes RDMA traffic across multiple paths to improve throughput, load balancing, and availability. Its features include dynamic congestion avoidance, rapid retransmission, microsecond-scale failure bypass, and visibility for administrators.
- MRC builds on Spectrum-X Ethernet's multiplane designs with hardware-accelerated load balancing. Unlike single-path RDMA, it enables gigascale AI fabrics of up to hundreds of thousands of GPUs without performance loss.
- Spectrum-X supports Adaptive RDMA, MRC, and custom protocols natively on ConnectX SuperNICs and switches. OpenAI and Microsoft, among others, use it for frontier LLM training, and the release emphasizes open standards for resilient AI infrastructure.
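
The multipath behavior described above (spreading traffic across paths, steering away from loaded ones, and bypassing a failed path by resending its unacknowledged packets elsewhere) can be illustrated with a toy model. This is only a minimal sketch and is not taken from the MRC specification: the `Path` and `MultipathSender` classes, the "fewest outstanding packets" congestion signal, and the retransmission policy are all assumptions chosen for illustration, and real MRC runs in NIC and switch hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """Hypothetical model of one network path (e.g. one fabric plane)."""
    path_id: int
    healthy: bool = True
    # Sequence numbers sent on this path and not yet acknowledged.
    outstanding: set = field(default_factory=set)


class MultipathSender:
    """Toy sender that sprays packets over several paths (illustrative only)."""

    def __init__(self, num_paths: int) -> None:
        self.paths = [Path(i) for i in range(num_paths)]

    def _pick_path(self) -> Path:
        # Assumed congestion signal: prefer the healthy path with the fewest
        # unacknowledged packets; break ties by the lowest path id.
        healthy = [p for p in self.paths if p.healthy]
        if not healthy:
            raise RuntimeError("no healthy path available")
        return min(healthy, key=lambda p: (len(p.outstanding), p.path_id))

    def send(self, seq: int) -> int:
        """Send one packet and return the id of the path it was placed on."""
        path = self._pick_path()
        path.outstanding.add(seq)
        return path.path_id

    def ack(self, seq: int) -> None:
        """Record an acknowledgement, freeing capacity on the carrying path."""
        for path in self.paths:
            path.outstanding.discard(seq)

    def fail_path(self, path_id: int) -> dict:
        """Mark a path as down and resend its unacknowledged packets.

        Returns a mapping from sequence number to the new path id.
        """
        failed = self.paths[path_id]
        failed.healthy = False
        stranded = sorted(failed.outstanding)
        failed.outstanding.clear()
        return {seq: self.send(seq) for seq in stranded}


if __name__ == "__main__":
    sender = MultipathSender(num_paths=4)
    placement = [sender.send(seq) for seq in range(8)]
    print("initial placement:", placement)
    print("moved after path 2 fails:", sender.fail_path(2))
```

In this sketch, traffic keeps flowing as long as at least one path stays healthy, which is the property that a single-path transport lacks. A production design would also need packet reordering at the receiver and a real congestion signal, both of which are omitted here.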
Impact
MRC's open release accelerates adoption of resilient, multipath networking in AI factories, reducing GPU idle time and enabling efficient scaling to millions of GPUs amid surging demand for frontier models. It reinforces Spectrum-X Ethernet's lead over InfiniBand alternatives, fosters industry collaboration through OCP, and steers R&D toward composable, hardware-accelerated fabrics. Over the next 12 to 24 months, expect broader hyperscaler deployments, a larger Ethernet share in AI clusters, and more funding directed toward open networking protocols.