USQCD Machine Performance
COMMENTS: The table above shows the measured performance of the DWF, anisotropic clover, and asqtad inverters on the jpsi, Ds, Bc, pi0, 9q, and 10q clusters, on the ANL BG/P, and on the ORNL XT5. All performance numbers are single precision unless otherwise noted. The jpsi cluster is no longer available, but its data are included for reference. The DWF, clover, and asqtad performance figures for jpsi, Ds, Bc, pi0, 9q, and 10q come from 128-process runs (on 16, 4, 4, 8, 16, and 16 nodes, respectively) with 8, 16, or 32 processes per node, one process per core. DWF and clover data were taken with Chroma. Clover runs used 6^{3}×64 local (per-core) lattices, and DWF runs used 14×7×7×16 local (per-core) lattices with L_{s} = 16. The asqtad runs used 14^{4} local (per-core) lattices. Clover and DWF performance measurements used the CG_INVERTER in Chroma. The DWF, clover, and asqtad performance figures for 12s are estimates taken from single-node benchmarks and an assumed scaling factor of 0.9 between single-node (16-rank) and eight-node (128-rank) runs. The BG/Q figure is based on the average of the DWF and HISQ performance. The XT5 clover performance figure is based on anisotropic clover calculations on a 40^{3}×256 global volume, run on 4K nodes. The final column of the table gives the Jpsi-equivalence for each of the USQCD resources. All except the Cray XT5 use the ratio of the average performance of asqtad and DWF; the XT5 uses the ratio of the average performance of the asqtad (HISQ) and clover inverters.
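The arithmetic behind these comments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Jpsi-equivalence is read here as the average of two inverter performances on a machine divided by the same average on jpsi, and the 0.9 factor is applied as a simple multiplier on the single-node figure. No measured values from the table are reproduced; any numbers passed in are the caller's own.

```python
# Sketch of the bookkeeping described in the comments (names are hypothetical).

TOTAL_PROCESSES = 128

# Node counts for the 128-process runs, as listed in the comments.
NODES = {"jpsi": 16, "Ds": 4, "Bc": 4, "pi0": 8, "9q": 16, "10q": 16}


def processes_per_node(cluster):
    """Processes per node implied by a 128-process run on the given cluster."""
    nodes = NODES[cluster]
    if TOTAL_PROCESSES % nodes != 0:
        raise ValueError("128 processes do not divide evenly over the nodes")
    return TOTAL_PROCESSES // nodes


def multi_node_estimate(single_node_perf, scaling=0.9):
    """Estimate multi-node performance from a single-node benchmark.

    Assumption: the scaling factor is a plain multiplier on the
    single-node figure, as the 12s estimate suggests.
    """
    return single_node_perf * scaling


def jpsi_equivalence(perf_a, perf_b, jpsi_perf_a, jpsi_perf_b):
    """Average of two inverter performances relative to the same average on jpsi.

    Assumption: this is one reading of "ratio of the average performance";
    for most machines the pair is (asqtad, DWF), for the XT5 it is
    (asqtad/HISQ, clover).
    """
    machine_avg = (perf_a + perf_b) / 2.0
    jpsi_avg = (jpsi_perf_a + jpsi_perf_b) / 2.0
    return machine_avg / jpsi_avg


if __name__ == "__main__":
    for name in NODES:
        # Each result is one of the 8, 16, or 32 processes per node quoted above.
        print(name, processes_per_node(name))
```

With this reading, each cluster's node count reproduces one of the quoted processes-per-node values (for example, 4 nodes implies 32 processes per node), which is a quick consistency check on the run description.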