Infiniband Performance

In June, we remotely tested the performance of an Infiniband-connected cluster owned by Topspin. The cluster had 16 dual 2.4 GHz Xeon nodes, based on Tyan 2722S2-533 motherboards built with E7501 chipsets (533 MHz FSB). Topspin two-port 4x HCAs were used, connected to a Topspin 360 Infiniband switch. The graphs below show comparisons with our current lattice QCD cluster, which was built using dual 2.4 GHz Xeons on SuperMicro P4DPE-Q motherboards with E7500 chipsets (400 MHz FSB), using Myrinet 2000 as an interconnect (M3F-PCI64B-2 interfaces, M3-E128 switch).

All of the graphs shown below are also links. Click on any graph to download an encapsulated PostScript version.

Pallas Sendrecv

The Pallas Sendrecv benchmark measures the aggregate bidirectional bandwidth obtained using MPI_Sendrecv calls. We ran this test between two nodes on each cluster.
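
To make "aggregate bidirectional" concrete: each MPI_Sendrecv call sends one message and receives one, so both are counted against the elapsed time. The sketch below shows that bookkeeping only. The function name and the sample numbers are illustrative, not measurements from either cluster, and 1 MByte is assumed to mean 10^6 bytes.

```python
def aggregate_bandwidth(message_bytes, elapsed_us):
    """Aggregate bidirectional bandwidth in MByte/sec for one send+receive pair.

    Assumes 1 MByte = 10**6 bytes, so bytes per microsecond equals MByte/sec.
    """
    # One message goes out and one comes in during the same call.
    return 2 * message_bytes / elapsed_us


# Illustrative numbers only: a 1,000,000-byte exchange taking 2500 microseconds.
print(aggregate_bandwidth(1_000_000, 2500.0), "MByte/sec")
```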

We note that the measured Infiniband bandwidth is close to the 800 MByte/sec limit commonly observed on PCI-X buses for bidirectional traffic. Also, the new Myrinet M3F-PCIXD-2 interfaces are reported to sustain 489 MByte/sec summed bidirectional bandwidth, limited by the 2.50+2.50 Myrinet hardware link layer.
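
As a rough check on the Myrinet figure, the reported summed bandwidth can be compared with the link-layer ceiling. The sketch below assumes a ceiling of 250 MByte/sec in each direction (a 2 Gbit/sec link each way); that per-direction value and the function name are our assumptions, not something stated in the vendor report.

```python
def link_efficiency(measured_mbytes_per_sec, per_direction_limit_mbytes_per_sec):
    """Fraction of the summed bidirectional link limit that was achieved."""
    summed_limit = 2 * per_direction_limit_mbytes_per_sec
    return measured_mbytes_per_sec / summed_limit


# Assumed ceiling: 250 MByte/sec per direction.
MYRINET_PER_DIRECTION = 250.0

efficiency = link_efficiency(489.0, MYRINET_PER_DIRECTION)
print(f"Reported M3F-PCIXD-2 result: {efficiency:.1%} of the summed link limit")
```

Under that assumption the reported 489 MByte/sec is about 98% of what the link layer can carry, which is consistent with the link, rather than the PCI-X bus, being the limit.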


Netpipe

We used the MPI version of the Netpipe benchmark from Ames Lab to measure the one-way bandwidth and latency of ping-ponged MPI send calls. We ran this test between two nodes on each cluster.

On the bandwidth plot, we note again that the Infiniband result is close to the PCI-X limit, and that newer Myrinet interfaces achieve very close to 250 MByte/sec.

Shown below is a Netpipe network signature graph, in which bandwidth is plotted against transfer time for many different message sizes. The intercept on the abscissa gives the zero-length message latency, and the horizontal asymptote gives the saturation bandwidth.

We note from this graph that the latency for MPI messaging on Infiniband using MVAPICH is 7 microseconds, and the corresponding Myrinet latency is 11 microseconds. Newer Myrinet interfaces are reported to have improved latencies.
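
The shape of a network signature graph can be reproduced with a simple linear cost model, in which transfer time is a fixed latency plus message size divided by the saturation bandwidth. The sketch below is only that model, not Netpipe itself. The 7 and 11 microsecond latencies are the values quoted above; the 500 MByte/sec saturation bandwidth is a hypothetical placeholder, and 1 MByte is taken as 10^6 bytes.

```python
def transfer_time_us(nbytes, latency_us, saturation_mbytes_per_sec):
    """Linear cost model: fixed latency plus size over saturation bandwidth."""
    # With 1 MByte = 10**6 bytes, 1 MByte/sec is 1 byte per microsecond.
    return latency_us + nbytes / saturation_mbytes_per_sec


def signature_point(nbytes, latency_us, saturation_mbytes_per_sec):
    """Return (transfer time in microseconds, bandwidth in MByte/sec)."""
    t = transfer_time_us(nbytes, latency_us, saturation_mbytes_per_sec)
    return t, nbytes / t


def half_bandwidth_size(latency_us, saturation_mbytes_per_sec):
    """Message size in bytes at which the model reaches half the saturation bandwidth."""
    return latency_us * saturation_mbytes_per_sec


ASSUMED_SATURATION = 500.0  # hypothetical value, MByte/sec

for name, latency in [("Infiniband", 7.0), ("Myrinet", 11.0)]:
    n_half = half_bandwidth_size(latency, ASSUMED_SATURATION)
    print(f"{name}: half of saturation bandwidth at {n_half:.0f} bytes")
    for nbytes in [0, 1024, 65536, 4194304]:
        t, bw = signature_point(nbytes, latency, ASSUMED_SATURATION)
        print(f"  {nbytes:>8d} bytes  {t:10.2f} us  {bw:8.2f} MByte/sec")
```

In this model a zero-length message lands at (latency, 0), which is the intercept on the time axis, and the bandwidth approaches the saturation value as messages grow. A lower latency also means smaller messages suffice to reach a given fraction of the saturation bandwidth.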

MILC Scaling

Shown below are scaling curves for the MILC improved staggered code. On the two clusters, we ran calculations with a constant lattice size per node, at various lattice sizes and on varying numbers of nodes. Each curve shows performance for a given per-node lattice size L^4, where L was 4, 6, 8, 10, 12, or 14. Each lattice size was run on 1, 2, 4, 8, and 16 nodes. Use a wide browser window to see these graphs side by side. Again, click on the graphs to download encapsulated PostScript versions.
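
Because the lattice size per node is held constant, the total problem size grows with the node count. The sketch below simply tabulates the sites per node and the total sites for each run described above; the function names are ours, and how the total lattice is laid out across nodes is not modeled.

```python
LOCAL_EXTENTS = [4, 6, 8, 10, 12, 14]  # L, for a per-node lattice of L^4 sites
NODE_COUNTS = [1, 2, 4, 8, 16]


def sites_per_node(extent):
    """Number of lattice sites held by one node for an L^4 local lattice."""
    return extent ** 4


def total_sites(extent, nodes):
    """Total lattice sites across all nodes at constant size per node."""
    return sites_per_node(extent) * nodes


for extent in LOCAL_EXTENTS:
    totals = ", ".join(str(total_sites(extent, n)) for n in NODE_COUNTS)
    print(f"L={extent:2d}: {sites_per_node(extent):6d} sites/node; totals: {totals}")
```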

The higher performance on the Infiniband cluster has two sources: the greater memory bandwidth available with the E7501 chipset and 533 MHz front side bus processors, and the higher bandwidth and lower latency of Infiniband compared with Myrinet.

Don Holmgren
Last Modified: 8th Sep 2003