Infiniband Performance on PCI Express
We have recently compared the performance of Infiniband PCI-X and PCI-E
host channel adapters with the Netpipe and Pallas MPI benchmarks. Our
hardware in detail:
- Computers:
  - PCI-X nodes are based on SuperMicro P4DEP motherboards (Intel
    E7500 chipset, DDR400 memory, 64/133 PCI-X slots, 2.0 GHz Xeon processors)
  - PCI-E nodes are based on Abit AA8 motherboards (Intel 925X
    chipset, DDR2-533 memory, x16 PCI Express slot, 3.2 GHz P4E
    processor)
- Switch:
- HCAs:
  - Topspin PCI-X HCA
  - Mellanox Infinihost III Ex PCI-E HCA
Netpipe
We used the VAPI, IPoIB (TCP/IP over Infiniband), and MPI versions of the Netpipe benchmark from Ames Lab
to measure the one-way bandwidth and latency of ping-ponged sends.
We ran this test between two nodes of each type. Version 3.62 of Netpipe was used for all tests.
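As a rough illustration of how a ping-pong test turns a round-trip time into one-way figures, here is a minimal Python sketch. It is not Netpipe's code: it bounces messages over a local socket pair rather than Infiniband, the helper names (`recv_exact`, `echo_peer`, `pingpong`) are hypothetical, and MB is taken to mean 10^6 bytes. Message sizes must be at least one byte, since a zero-length message cannot be detected over a stream socket.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (recv may return short reads)."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, nbytes, repeats):
    """The 'pong' side: send every received message straight back."""
    for _ in range(repeats):
        sock.sendall(recv_exact(sock, nbytes))


def pingpong(nbytes, repeats=200):
    """Return (one_way_latency_usec, bandwidth_MB_per_sec) for one message size."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, nbytes, repeats))
    peer.start()
    payload = b"x" * nbytes
    start = time.perf_counter()
    for _ in range(repeats):
        a.sendall(payload)       # ping
        recv_exact(a, nbytes)    # wait for the pong
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    one_way = elapsed / (2 * repeats)  # half of the mean round-trip time
    return one_way * 1e6, nbytes / one_way / 1e6


if __name__ == "__main__":
    for size in (1, 1024, 65536):
        usec, mb_per_sec = pingpong(size)
        print(f"{size:6d} bytes  {usec:8.2f} usec  {mb_per_sec:8.2f} MB/sec")
```

The one-way time is taken as half the averaged round trip, so the latency for the smallest message and the bandwidth for larger messages both come from the same timing loop.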
The first plot below compares bandwidth (MB/sec) as a function of message size for the
MPI versions running on the 925X and E7500 chipsets. On the 925X systems, only
the OSU MPI ("mvapich") over the Mellanox "HPC Gold" IB driver was used. On
the E7500 systems, OSU MPI was run over both the "HPC Gold" driver and the supplied
Topspin driver (v2.0.0, build 531). The zero-length message latencies
for MPI were as follows:
- PCI Express: 4.5 microsec (data file, command: mpirun_rsh -np 2 mqcd0301 mqcd0302 NPmpi)
- PCI-X, "HPC Gold": 7.4 microsec (data file, command: mpirun_rsh -np 2 mqcd0301 mqcd0302 NPmpi)
- PCI-X, Topspin v2.0.0_531: 7.3 microsec (data file, command: mpirun_rsh -np 2 mqcd0303 mqcd0304 NPmpi)
The next plot compares Netpipe performance on the 925X chipset (PCI Express)
systems using rdma_write, OSU MPI, and IPoIB (TCP/IP).
The latencies were:
- rdma_write: 4.3 microsec (data file, command: NPib -t rdma_write)
- MPI: 4.5 microsec (data file, command: mpirun_rsh -np 2 mqcd0301 mqcd0302 NPmpi)
- IPoIB: 27.6 microsec (data file, command: NPtcp)
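To put these three latencies on a common footing, the short Python sketch below expresses each one relative to rdma_write. The numbers are the ones listed above; the helper name `relative_to` is only illustrative.

```python
# Zero-length message latencies reported above for the 925X (PCI Express)
# systems, in microseconds.
latency_usec = {"rdma_write": 4.3, "MPI": 4.5, "IPoIB": 27.6}


def relative_to(baseline_name, table):
    """Return each latency as a multiple of the named baseline."""
    baseline = table[baseline_name]
    return {name: usec / baseline for name, usec in table.items()}


if __name__ == "__main__":
    ratios = relative_to("rdma_write", latency_usec)
    for name, usec in latency_usec.items():
        extra = usec - latency_usec["rdma_write"]
        print(f"{name:10s} {usec:5.1f} usec  +{extra:4.1f} usec  {ratios[name]:.2f}x")
```

By this measure MPI adds about 0.2 microsec on top of raw rdma_write, while IPoIB takes roughly 6.4 times as long.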
The next plots compare Netpipe performance on the E7500 chipset (PCI-X)
systems using rdma_write, NCSA MPI, OSU MPI, and
IPoIB. Except for the OSU MPI data, which used the Topspin
drivers, all of the data were taken with the OpenIB drivers.
Pallas Sendrecv
The Pallas Sendrecv benchmark measures a variety of standard MPI calls.
The following output files are available:
MILC Application
Eager-Rendezvous
The following plots show Netpipe performance as the eager-rendezvous limit is
varied on an NCSA MPI version of the code with the "-eagerlen" switch.
Results for the E7500 (PCI-X) and 925X (PCI-E) chipsets are shown.
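For readers unfamiliar with the eager-rendezvous limit, the Python sketch below shows the general idea: messages up to the limit are sent immediately and copied out of a receive buffer, while larger messages first complete a handshake and are then transferred without the extra copy. This is a toy cost model, not NCSA MPI's implementation. The function names, the "at or below the limit" boundary, and every numeric parameter (latency, link bandwidth, copy bandwidth) are assumptions chosen only for illustration, not measured values.

```python
def choose_protocol(message_bytes, eager_limit_bytes):
    """Eager at or below the limit, rendezvous above it (assumed boundary)."""
    return "eager" if message_bytes <= eager_limit_bytes else "rendezvous"


def modeled_time_usec(message_bytes, eager_limit_bytes,
                      latency_usec=5.0,            # hypothetical one-way latency
                      bandwidth_MB_per_sec=800.0,  # hypothetical link bandwidth
                      copy_MB_per_sec=2000.0):     # hypothetical memcpy bandwidth
    """Toy one-way transfer time; 1 MB/sec equals 1 byte/usec."""
    wire = message_bytes / bandwidth_MB_per_sec
    if choose_protocol(message_bytes, eager_limit_bytes) == "eager":
        # One trip on the wire plus a copy out of the receive buffer.
        return latency_usec + wire + message_bytes / copy_MB_per_sec
    # Request and reply handshake, then the data itself: three latencies.
    return 3 * latency_usec + wire


if __name__ == "__main__":
    for limit in (1024, 16384, 262144):
        times = [modeled_time_usec(size, limit) for size in (512, 8192, 131072)]
        print(limit, [f"{t:.1f}" for t in times])
```

In this model, raising the limit helps mid-sized messages by avoiding the handshake but charges large eager messages for the extra copy, which is the kind of trade-off the plots explore.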
Don Holmgren
Last Modified: 3rd Aug 2004