InfiniBand Networking: High-Performance Interconnects for HPC

Guide to InfiniBand networking covering generations, RDMA technology, comparison with Ethernet, storage applications, and the current market ecosystem.

What is InfiniBand?

InfiniBand is a high-bandwidth, low-latency networking technology widely used in high-performance computing, data centers, and enterprise storage environments. Developed by the InfiniBand Trade Association (IBTA), it provides data rates from 10 Gb/s (SDR) to 400 Gb/s (NDR) per port, with sub-microsecond latency and native support for Remote Direct Memory Access (RDMA).

InfiniBand Generations

InfiniBand speeds have progressed through several generations, quoted here as nominal rates for a standard four-lane (4x) link: SDR (10 Gb/s), DDR (20 Gb/s), QDR (40 Gb/s), FDR (56 Gb/s), EDR (100 Gb/s), HDR (200 Gb/s), and the current NDR (400 Gb/s). The upcoming XDR generation will deliver 800 Gb/s per port. A 4x link aggregates four lanes into a single port, so HDR provides 200 Gb/s through four 50 Gb/s lanes.
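The lane arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the nominal rates quoted in this section and assumes a 4x link throughout. The function names and the `efficiency` parameter are illustrative, and real throughput is lower once encoding and protocol overhead are counted.

```python
# Nominal 4x link rates in Gb/s, as quoted in this section.
LINK_RATE_GBPS = {
    "SDR": 10, "DDR": 20, "QDR": 40, "FDR": 56,
    "EDR": 100, "HDR": 200, "NDR": 400, "XDR": 800,
}
LANES = 4  # assumption: the standard 4x link width


def per_lane_gbps(generation):
    """Nominal rate carried by each of the four lanes."""
    return LINK_RATE_GBPS[generation] / LANES


def transfer_seconds(generation, gigabytes, efficiency=1.0):
    """Idealized time to move `gigabytes` (decimal GB) over one link.

    `efficiency` is a hypothetical knob for encoding and protocol overhead;
    1.0 means the full nominal rate is available to the payload.
    """
    bits = gigabytes * 8e9
    return bits / (LINK_RATE_GBPS[generation] * 1e9 * efficiency)


if __name__ == "__main__":
    for gen in LINK_RATE_GBPS:
        print(f"{gen}: {per_lane_gbps(gen):g} Gb/s per lane, "
              f"{transfer_seconds(gen, 100):.2f} s per 100 GB")
```

At the nominal rate, a 100 GB transfer over NDR takes about 2 seconds, against 80 seconds over SDR.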

RDMA and Its Importance

RDMA allows one computer to read or write the memory of another directly, with the network adapters moving the data and neither host's operating system involved in the transfer. This dramatically reduces latency and CPU overhead. InfiniBand provides native RDMA support, while Ethernet requires additional protocols like RoCE (RDMA over Converged Ethernet) or iWARP to achieve similar functionality. Native RDMA is critical for MPI-based parallel applications and high-performance storage protocols like NVMe-oF.
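The one-sided access model can be illustrated with a toy sketch. This is not real RDMA code and does not use any verbs library; `ToyNic`, `MemoryRegion`, and the integer `rkey` are hypothetical names that only mimic the idea. The target registers a buffer once, and a peer holding the matching key then reads or writes it without any handler running on the target.

```python
class MemoryRegion:
    """Toy stand-in for a registered memory region (not a real verbs object)."""

    def __init__(self, size, rkey):
        self.buf = bytearray(size)
        self.rkey = rkey


class ToyNic:
    """Hypothetical adapter that serves one-sided reads and writes."""

    def __init__(self):
        self.regions = {}

    def register(self, size, rkey):
        # Registration is the only step where the target's software takes part.
        region = MemoryRegion(size, rkey)
        self.regions[rkey] = region
        return region

    def _check(self, rkey, offset, length):
        region = self.regions.get(rkey)
        if region is None:
            raise PermissionError("unknown rkey")
        if offset < 0 or offset + length > len(region.buf):
            raise ValueError("access outside registered region")
        return region

    def rdma_write(self, rkey, offset, data):
        # One-sided write: bytes land in the buffer, no target callback runs.
        region = self._check(rkey, offset, len(data))
        region.buf[offset:offset + len(data)] = data

    def rdma_read(self, rkey, offset, length):
        # One-sided read: returns a copy of the requested bytes.
        region = self._check(rkey, offset, length)
        return bytes(region.buf[offset:offset + length])


if __name__ == "__main__":
    target = ToyNic()
    target.register(size=64, rkey=0x1234)
    target.rdma_write(0x1234, 0, b"hello")
    print(target.rdma_read(0x1234, 0, 5))  # b'hello'
```

In a real deployment the key check and the memory access are performed by the adapter hardware, which is why the target CPU stays out of the data path.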

InfiniBand vs High-Speed Ethernet

While 100/200/400 GbE has narrowed the bandwidth gap with InfiniBand, key differences remain. InfiniBand offers lower latency (roughly 0.5 microseconds versus 2-5 microseconds for typical Ethernet), RDMA that is native rather than layered on top of another protocol, credit-based flow control that prevents packet drops from buffer overflow, and a mature software ecosystem (OFED) for HPC. Ethernet's advantages include ubiquity, lower switch costs, operator familiarity, and better integration with cloud-native and containerized environments.
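The effect of credit-based flow control can be seen in a small simulation. This is a simplified sketch, not a model of the actual InfiniBand link protocol: the tick-based timing, the `drain_every` parameter, and the function name are assumptions made for illustration. The sender either waits for a credit before each packet or transmits blindly, and the receiver frees one buffer slot every few ticks.

```python
def simulate_link(num_packets, buffer_slots, drain_every, use_credits):
    """Simulate a sender feeding a receiver with a fixed-size buffer.

    With use_credits=True the sender transmits only while it holds a credit,
    and one credit is returned each time the receiver frees a slot.
    With use_credits=False the sender transmits every tick regardless.
    """
    if buffer_slots < 1 or drain_every < 1:
        raise ValueError("buffer_slots and drain_every must be >= 1")

    credits = buffer_slots  # one credit per free receive buffer
    occupancy = 0
    sent = delivered = dropped = 0
    tick = 0

    while sent < num_packets or occupancy > 0:
        tick += 1
        # Sender side: a credit-aware sender stalls when it has no credits.
        if sent < num_packets and (not use_credits or credits > 0):
            sent += 1
            if occupancy < buffer_slots:
                occupancy += 1
                if use_credits:
                    credits -= 1
            else:
                dropped += 1  # buffer overflow: the packet is lost
        # Receiver side: free one slot periodically and return its credit.
        if tick % drain_every == 0 and occupancy > 0:
            occupancy -= 1
            delivered += 1
            if use_credits:
                credits += 1

    return {"ticks": tick, "delivered": delivered, "dropped": dropped}


if __name__ == "__main__":
    print("credits:", simulate_link(20, 4, 3, use_credits=True))
    print("blind:  ", simulate_link(20, 4, 3, use_credits=False))
```

With credits, the number of outstanding packets never exceeds the free buffer slots, so nothing is dropped and the sender is simply paced by the receiver. The blind sender finishes transmitting sooner but loses every packet that arrives at a full buffer.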

Storage over InfiniBand

InfiniBand is the preferred interconnect for high-performance storage: Lustre, GPFS (now IBM Storage Scale), and BeeGFS all support native InfiniBand transport. NVMe over Fabrics (NVMe-oF) with InfiniBand provides near-local-disk latency for remote NVMe storage access. Storage clusters connected via InfiniBand can deliver aggregate throughput of multiple TB/s, essential for HPC scratch file systems and AI training data pipelines.
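To put the aggregate figure in perspective, the sketch below estimates how many links a storage cluster would need to reach a target throughput. It is a back-of-the-envelope calculation only: the function name and the `efficiency` parameter are assumptions, decimal units are used (1 TB/s = 8000 Gb/s), and storage server, drive, and file system limits are ignored.

```python
import math


def links_needed(target_tb_per_s, link_gbps, efficiency=1.0):
    """Minimum number of links to carry `target_tb_per_s` of aggregate traffic.

    `efficiency` is a hypothetical fraction of the nominal link rate that is
    actually available to payload data (1.0 = full nominal rate).
    """
    target_gbps = target_tb_per_s * 8000  # 1 TB/s = 8000 Gb/s (decimal units)
    return math.ceil(target_gbps / (link_gbps * efficiency))


if __name__ == "__main__":
    print(links_needed(1, 400))        # NDR links for 1 TB/s at nominal rate
    print(links_needed(1, 400, 0.9))   # same target at an assumed 90% efficiency
    print(links_needed(2, 200))        # HDR links for 2 TB/s at nominal rate
```

At nominal rates, 1 TB/s corresponds to 20 NDR links, which is why multi-TB/s file systems spread traffic across many storage servers.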

Market and Ecosystem

NVIDIA (following its acquisition of Mellanox) dominates the InfiniBand market with ConnectX network adapters and Quantum switches. The OFED (OpenFabrics Enterprise Distribution) software stack provides drivers and libraries. Intel briefly competed with Omni-Path Architecture (OPA) but spun the technology out to Cornelis Networks and has shifted its own focus to Ethernet. InfiniBand remains essential for TOP500 supercomputers and large-scale AI clusters, including those used to train GPT and similar large language models.

Written by Daniel Kovacs