MILC Scaling on Clusters

The cluster data points shown on the price/performance trends graph all represent performance with large local volumes (14^4 sites per node). Performance degrades on smaller local volumes, because both the surface-to-volume ratio and the relative frequency of collective operations increase. This effect is shown in the graph below.


Here, performance is shown as a function of local lattice volume and number of nodes. The data points on each line represent the performance of the MILC asqtad code in runs with a constant volume per node. All data were measured on the 2.4 GHz dual Xeon cluster at Fermilab, which uses a Myrinet 2000 fabric. All measurements shown here give the performance for a single process per node.
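To make the surface-to-volume argument concrete, the following is a minimal sketch that compares the two local volumes discussed on this page. It assumes a simplified model that is not taken from the MILC code: the local lattice is an L^4 hypercube, all four directions are split across nodes, and only one layer of sites on each of the 8 faces is exchanged. The function name surface_to_volume is hypothetical.

```python
def surface_to_volume(L: int, dims: int = 4) -> float:
    """Ratio of face sites to total sites for an L^dims local lattice.

    Simplified model (an assumption, not the actual MILC communication
    pattern): 2*dims faces, each holding L^(dims-1) sites, one layer deep.
    """
    face_sites = 2 * dims * L ** (dims - 1)
    volume_sites = L ** dims
    return face_sites / volume_sites  # reduces to 2*dims / L


# Compare the two local volumes discussed in the text.
for L in (6, 14):
    print(f"{L}^4 local volume: surface/volume = {surface_to_volume(L):.3f}")
```

In this model the ratio is simply 8/L, so shrinking the local lattice from 14^4 to 6^4 raises the relative amount of boundary data by a factor of 14/6, about 2.3. The real communication cost also depends on the depth of the exchanged boundary and on the collective operations, which this sketch ignores.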

A concrete comparison from these data: running with a local volume of 6^4 per node, rather than 14^4, would require 29.6 times as many nodes for the same total volume, with each node sustaining 52% of the performance achievable with 14^4 local volumes (using the 6^4 128-node point and the 14^4 32-node point). A given job executed at 6^4 per node would therefore finish in about 1/15th of the time of a job of the same total size executed at 14^4 per node, but would use roughly 30 times as many nodes.
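The arithmetic behind this comparison can be reproduced with a short sketch. The only inputs are the two local lattice sizes and the 52% per-node figure quoted above; the function names are hypothetical, and the sketch assumes the job's run time is inversely proportional to aggregate sustained performance.

```python
def node_ratio(small_L: int, large_L: int, dims: int = 4) -> float:
    """How many more nodes are needed to cover the same total volume
    when the local lattice shrinks from large_L^dims to small_L^dims."""
    return (large_L / small_L) ** dims


def speedup(nodes_factor: float, relative_node_perf: float) -> float:
    """Aggregate speedup: more nodes, each running at a fraction of the
    per-node performance seen with the larger local volume."""
    return nodes_factor * relative_node_perf


ratio = node_ratio(6, 14)   # (14/6)^4
gain = speedup(ratio, 0.52)  # 52% per-node performance, from the text

print(f"nodes needed:   {ratio:.1f}x")
print(f"job speedup:    {gain:.1f}x")
print(f"time fraction:  1/{gain:.0f}")
```

The node factor is (14/6)^4, about 29.6, and multiplying by 0.52 gives a speedup of about 15.4, which matches the "1/15th the time" figure. Dividing the speedup by the node factor recovers the 52% relative efficiency.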