|
USQCD Machine Performance
The table above shows the measured performance of the DWF, anisotropic clover, and asqtad inverters on the qcd, pion, kaon, 6N, and 4G clusters, and on the ANL BG/P and the ORNL XT4. For qcd and pion, the asqtad numbers were taken on 64-node runs with a 14^4 local lattice per node, and the DWF numbers were taken on 64-node runs using Ls=16, averaging the performance of runs with 32x8x8x8 and 32x8x8x12 local lattices. The DWF, clover, and asqtad performance figures for kaon, 6N, and 7N use 128-process runs (on 32, 64, and 16 nodes, respectively), with 4, 2, or 8 processes per node, one process per core. Clover performance on 7N used 128 processes with 4^3x8 local volumes per process. The DWF and clover performance runs for 4G used single panels (128-node jobs, 1 core/node) with mesh layouts of 1x4x4x8. The BG/P and XT4 DWF performance measurements used local volumes of 4^4 (Ls=16) and 6x6x6x4 per core, respectively. The BG/P asqtad result is the average of the performance for 6^4 and 8^4 local volumes, and is single precision. The BG/P DWF result is double precision. |
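The run geometries quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the benchmark suite: the dictionary keys and the helper names `total_processes` and `sites` are hypothetical labels chosen for illustration, and counting the DWF working volume as the four-dimensional local volume times Ls is an assumption about how the fifth dimension is stored.

```python
from math import prod

# (nodes, processes per node) for the 128-process runs quoted in the text.
layouts = {"kaon": (32, 4), "6N": (64, 2), "7N": (16, 8)}


def total_processes(nodes, procs_per_node):
    """Number of MPI processes for a run with one process per core."""
    return nodes * procs_per_node


def sites(dims):
    """Number of lattice sites (or mesh points) in a hypercubic layout."""
    return prod(dims)


process_counts = {name: total_processes(*cfg) for name, cfg in layouts.items()}

# 4G single-panel mesh layout: the product should equal the 128 nodes used.
mesh_4g = (1, 4, 4, 8)

# Local four-dimensional volumes quoted in the text.
local_sites = {
    "asqtad qcd/pion 14^4": sites((14, 14, 14, 14)),
    "clover 7N 4^3x8": sites((4, 4, 4, 8)),
    "DWF BG/P 4^4": sites((4, 4, 4, 4)),
    "DWF XT4 6x6x6x4": sites((6, 6, 6, 4)),
    "asqtad BG/P 6^4": sites((6, 6, 6, 6)),
    "asqtad BG/P 8^4": sites((8, 8, 8, 8)),
}

# Assumption: five-dimensional DWF sites = 4D local volume * Ls.
LS = 16
dwf_bgp_5d_sites = local_sites["DWF BG/P 4^4"] * LS

if __name__ == "__main__":
    for name, count in process_counts.items():
        print(f"{name}: {count} processes")
    print(f"4G mesh points: {sites(mesh_4g)}")
    for label, n in local_sites.items():
        print(f"{label}: {n} sites")
    print(f"BG/P DWF 5D sites per core (Ls={LS}): {dwf_bgp_5d_sites}")
```

Each of the three cluster layouts multiplies out to 128 processes, and the 1x4x4x8 mesh to 128 nodes, so the quoted node and process counts are mutually consistent.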