For comparison, the plot below shows the performance of these codes
on the 2.8 GHz Pentium 4 system.
At Fermilab, all production MILC codes have additional SSE2
optimizations (see "Inline SSE MILC Math
Routines"). The plot below shows the performance of the G5 and the Pentium 4
on the code with the "Field Major" optimization, as well as the performance of
the code with both SSE2 and "Field Major" optimizations on the Pentium 4 system.
gcc -O3", gcc 2.95.3]
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 0
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity appears to be less than one microsecond.
Each test below will take on the order of 38768 microseconds.
   (= -2147483648 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:         2848.7459     0.0450     0.0449     0.0457
Scale:        2851.4134     0.0453     0.0449     0.0457
Add:          3470.4630     0.0556     0.0553     0.0557
Triad:        3456.0981     0.0557     0.0556     0.0561
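As a cross-check on the table above, the reported rates can be approximately reproduced from the array size and the minimum times. The sketch below assumes the usual STREAM accounting (Copy and Scale touch two arrays per element, Add and Triad touch three) and 1 MB = 10^6 bytes for the rate column; the names `ARRAYS_TOUCHED` and `stream_rate_mb_s` are illustrative and not part of the benchmark source.

```python
# Assumed STREAM accounting: arrays read plus arrays written per element.
WORD_BYTES = 8  # "8 bytes per DOUBLE PRECISION word" from the output
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return bandwidth in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD_BYTES * array_size
    return bytes_moved / min_time_s / 1.0e6


# Minimum times as printed in the gcc run (rounded to four decimals).
gcc_min_times = {"Copy": 0.0449, "Scale": 0.0449, "Add": 0.0553, "Triad": 0.0556}

for kernel, t in gcc_min_times.items():
    rate = stream_rate_mb_s(kernel, 8_000_000, t)
    print(f"{kernel:6s} {rate:9.1f} MB/s")
```

Because the printed times are rounded, the recomputed rates should only match the reported ones to within a fraction of a percent.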
cc -O5", IBM VAC 6.0 Beta]
bash-2.05a$ ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 0
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 46828 microseconds.
   (= 46828 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:         2923.5762     0.0439     0.0438     0.0441
Scale:        2566.9303     0.0500     0.0499     0.0502
Add:          2304.2857     0.0834     0.0833     0.0836
Triad:        2339.1539     0.0822     0.0821     0.0823
f77 -O5", IBM XLF 8.1 Beta]
bash-2.05a$ mpirun -np 1 stream_mpi
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Number of processors = 1
Array size = 4000000
Offset = 0
The total memory requirement is 91.6 MB ( 91.6MB/task)
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:         2910.9423      .0220      .0220      .0220
Scale:        2577.4299      .0249      .0248      .0249
Add:          2385.3296      .0403      .0402      .0404
Triad:        2456.4980      .0391      .0391      .0392
-----------------------------------------------
Solution Validates!
-----------------------------------------------

bash-2.05a$ mpirun -np 2 stream_mpi
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Number of processors = 2
Array size = 4000000
Offset = 0
The total memory requirement is 183.1 MB ( 91.6MB/task)
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:         2780.1300      .0465      .0460      .0469
Scale:        2779.6478      .0464      .0460      .0469
Add:          2649.7750      .0728      .0725      .0736
Triad:        2777.8189      .0698      .0691      .0709
-----------------------------------------------
Solution Validates!
-----------------------------------------------
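To compare the one-task and two-task runs above, the ratio of the reported rates can be computed per kernel. The following is a small sketch using only the numbers printed in the two `stream_mpi` runs; the helper name `scaling` is hypothetical. The two-task figures appear to be aggregate rates across both tasks, since they are consistent with twice the per-task array size divided by the minimum time.

```python
# Reported rates (MB/s) copied from the two stream_mpi runs.
one_task = {"Copy": 2910.9423, "Scale": 2577.4299, "Add": 2385.3296, "Triad": 2456.4980}
two_tasks = {"Copy": 2780.1300, "Scale": 2779.6478, "Add": 2649.7750, "Triad": 2777.8189}


def scaling(rates_1, rates_n):
    """Return the per-kernel ratio of the n-task rate to the 1-task rate."""
    return {kernel: rates_n[kernel] / rates_1[kernel] for kernel in rates_1}


for kernel, ratio in scaling(one_task, two_tasks).items():
    print(f"{kernel:6s} {ratio:.2f}x")
```

All four ratios stay close to 1 rather than approaching 2, so adding the second task changes the total measured bandwidth only modestly in these runs.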
Current: 2.52 A (standard 120 VAC service); Power: 293 W; Apparent power: 299 VA; Power factor: 0.97
Current: 1.35 A; Power: 119 W; Apparent power: 159 VA; Power factor: 0.74
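The power factor in each reading is the ratio of real power (W) to apparent power (VA). A minimal sketch of that arithmetic follows, using the two readings above; the function name `power_factor` is illustrative.

```python
def power_factor(real_power_w, apparent_power_va):
    """Return real power divided by apparent power (dimensionless)."""
    return real_power_w / apparent_power_va


# (real power in W, apparent power in VA) for the two readings.
readings = [(293.0, 299.0), (119.0, 159.0)]

for watts, volt_amps in readings:
    print(f"{watts:.0f} W / {volt_amps:.0f} VA = {power_factor(watts, volt_amps):.3f}")
```

The computed ratios agree with the listed power factors to about two decimal places; the listed values look truncated rather than rounded.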