Fermilab Lattice QCD Computing Hardware


Modern computing hardware used in lattice gauge theory calculations (such as Fermilab's 127-node Myrinet cluster, shown at right) has a price/performance ratio that is rapidly approaching $1 per megaflop (MF). By comparison, the VAX 11/780s on which the first numerical lattice calculations were done 20 years ago cost approximately $1,000,000/MF, and Fermilab's ACPMAPS computer in the early 1990s cost $100/MF.
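The size of these improvements is easy to check with a few lines of arithmetic. The sketch below is only illustrative: it takes the approximate dollars-per-megaflop figures quoted above at face value, treats "approaching $1/MF" as exactly $1/MF, and uses a hypothetical helper name (improvement_factor) that is not part of any Fermilab software.

```python
# Approximate price/performance figures quoted in the text, in dollars per megaflop.
# Assumption: "approaching $1/MF" is rounded to exactly 1.0 for this comparison.
COST_PER_MF = {
    "VAX 11/780": 1_000_000.0,
    "ACPMAPS": 100.0,
    "commodity cluster": 1.0,
}


def improvement_factor(old_cost, new_cost):
    """Return how many times cheaper a megaflop is at new_cost than at old_cost."""
    if old_cost <= 0 or new_cost <= 0:
        raise ValueError("costs must be positive")
    return old_cost / new_cost


if __name__ == "__main__":
    current = COST_PER_MF["commodity cluster"]
    for machine in ("VAX 11/780", "ACPMAPS"):
        factor = improvement_factor(COST_PER_MF[machine], current)
        print(f"{machine}: {factor:,.0f}x more expensive per megaflop")
```

Under these assumptions a megaflop costs about a million times less than on the VAX 11/780 and about a hundred times less than on ACPMAPS.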




Our current production clusters are a 127-node cluster (qcd) with a single 2.8 GHz Pentium 4 processor per node and a Myrinet fabric, a 518-node cluster (pion) with a single 3.2 GHz Pentium 640 processor per node and an Infiniband fabric, and a 600-node cluster (kaon) with two dual-core Opteron 270 (2.0 GHz) processors per node and a double-data-rate Infiniband fabric. The Pentium processors on the qcd and pion clusters have an 800 MHz front side bus. The qcd cluster uses DDR memory, and pion uses DDR2 memory. Pictured on the left is one of the qcd nodes. The Opteron processors on the kaon cluster also have 800 MHz front side buses, and use DDR memory.

The pion cluster uses a PCI Express Infiniband network interface card in each node, sixteen 24-port Infiniband leaf switches, and one 144-port Infiniband spine switch. All nodes connect to the leaf switches, and the leaves connect to the spine with 4:1 oversubscription (4 uplinks per leaf). Each pion node achieves a peak unidirectional bandwidth of 710 MB/sec and a peak bidirectional bandwidth of 1320 MB/sec. Pictured on the right is the 144-port Infiniband spine switch.
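One way to read the two bandwidth figures is to compare the measured bidirectional rate with twice the unidirectional rate, which is what a link would deliver if traffic in the two directions did not interfere at all. The sketch below does that arithmetic; the term "bidirectional efficiency" and the function name are labels chosen here for illustration, not quantities defined by the measurements themselves.

```python
# Peak per-node bandwidths quoted for the pion cluster, in MB/sec.
UNIDIRECTIONAL_MB_S = 710.0
BIDIRECTIONAL_MB_S = 1320.0


def bidirectional_efficiency(unidirectional, bidirectional):
    """Ratio of the measured bidirectional rate to twice the unidirectional rate.

    A value of 1.0 would mean the two directions run at full speed simultaneously.
    """
    if unidirectional <= 0:
        raise ValueError("unidirectional bandwidth must be positive")
    return bidirectional / (2.0 * unidirectional)


if __name__ == "__main__":
    eff = bidirectional_efficiency(UNIDIRECTIONAL_MB_S, BIDIRECTIONAL_MB_S)
    print(f"bidirectional efficiency: {eff:.1%}")
```

With the quoted numbers the ratio is 1320 / 1420, or roughly 93%.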



Cluster   Processor              Nodes   DWF Performance    asqtad Performance
qcd       2.8 GHz P4E            127     1400 MFlops/node   1017 MFlops/node
pion      3.2 GHz Pentium 640    518     1729 MFlops/node   1594 MFlops/node
kaon      2.0 GHz Dual Opteron   600     4703 MFlops/node   3832 MFlops/node

The table above shows the measured performance of the DWF and asqtad inverters on the qcd, pion, and kaon clusters. For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice per node, and the DWF numbers were taken on 64-node runs using Ls=16, averaging the performance of runs with 32x8x8x8 and 32x8x8x12 local lattices. The DWF and asqtad performance figures for kaon come from 128-process (32-node) runs, with 4 processes per node and one process per core.
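Because kaon runs four processes per node while qcd and pion run one, the per-node figures are easier to compare after dividing by the number of cores, and the local lattice sizes are easier to compare as site counts. The following sketch works both out from the numbers quoted above. The helper names (per_core_mflops, local_sites) are hypothetical, and the core counts for qcd and pion assume one core per single-processor node, which the text implies but does not state outright.

```python
from math import prod

# Measured per-node inverter performance from the table, in MFlops/node.
PER_NODE_MFLOPS = {
    "qcd": {"dwf": 1400, "asqtad": 1017},
    "pion": {"dwf": 1729, "asqtad": 1594},
    "kaon": {"dwf": 4703, "asqtad": 3832},
}

# Cores per node. kaon has two dual-core Opterons (4 processes per node);
# qcd and pion are assumed to be single-core, single-processor nodes.
CORES_PER_NODE = {"qcd": 1, "pion": 1, "kaon": 4}


def per_core_mflops(per_node, cores):
    """Divide a per-node rate evenly across the cores of that node."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return per_node / cores


def local_sites(dims, ls=1):
    """Number of sites in a local lattice; ls is the DWF fifth-dimension extent."""
    return prod(dims) * ls


if __name__ == "__main__":
    for cluster, rates in PER_NODE_MFLOPS.items():
        cores = CORES_PER_NODE[cluster]
        dwf = per_core_mflops(rates["dwf"], cores)
        asqtad = per_core_mflops(rates["asqtad"], cores)
        print(f"{cluster}: DWF {dwf:.2f}, asqtad {asqtad:.2f} MFlops/core")

    print("asqtad 14^4 sites:", local_sites((14, 14, 14, 14)))
    print("DWF 32x8x8x8, Ls=16 sites:", local_sites((32, 8, 8, 8), ls=16))
    print("DWF 32x8x8x12, Ls=16 sites:", local_sites((32, 8, 8, 12), ls=16))
```

On this accounting a kaon core delivers about 1176 MFlops on DWF and 958 MFlops on asqtad. The kaon figures come from 32-node runs and the qcd and pion figures from 64-node runs, so the comparison is indicative rather than exact.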

Fermilab's New Muon Lab houses the lattice gauge theory computational facility.