Performance is shown here as a function of local lattice volume and number of nodes. The data points on each line represent the performance of the MILC asqtad code in runs at constant volume per node. All data were measured on the 2.4 GHz dual Xeon cluster at Fermilab, which uses a Myrinet 2000 fabric. All measurements shown here give the performance for a single process per node.
A concrete comparison from these data: running with a local volume of 6^4 per node, rather than 14^4, would require 29.6 times as many nodes for the same total volume, with each node sustaining 52% of the performance achievable with 14^4 local volumes (using the 6^4 128-node point and the 14^4 32-node point). Because 29.6 times as many nodes at 52% of the per-node performance gives roughly 15 times the aggregate performance, a given job executed at 6^4 per node would finish in about 1/15th of the time taken by a job of the same total size executed at 14^4 per node, but would use about 30 times as many nodes.
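The arithmetic behind this comparison can be checked with a short sketch. It assumes that run time scales inversely with aggregate sustained performance (node count times per-node performance), and it takes the 52% figure directly from the text above. The function names are illustrative only and are not part of the MILC code.

```python
def node_ratio(small_extent, large_extent, dims=4):
    """Factor by which the node count grows when the local lattice shrinks
    from large_extent^dims to small_extent^dims at fixed total volume."""
    return (large_extent / small_extent) ** dims


def aggregate_speedup(nodes_factor, per_node_fraction):
    """Speedup of the whole job, assuming run time is inversely proportional
    to (number of nodes) x (sustained per-node performance)."""
    return nodes_factor * per_node_fraction


# 6^4 versus 14^4 sites per node; 0.52 is the relative per-node performance
# quoted in the text (6^4 at 128 nodes versus 14^4 at 32 nodes).
ratio = node_ratio(6, 14)
speedup = aggregate_speedup(ratio, 0.52)

print(f"node count factor: {ratio:.1f}")        # 29.6
print(f"aggregate speedup: {speedup:.1f}")      # 15.4
print(f"relative run time: 1/{speedup:.0f}")    # 1/15
```

Under the same assumption, the total node-hours consumed grow by the node count factor divided by the speedup, which is 1/0.52, or a little under twice the cost of the 14^4 run.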