3D Black Hole Simulation

The H code simulates the evolution of a black hole in three dimensions using a hyperbolic formulation of the Einstein equations by Joan Masso. The simulation is run for 100 time steps and uses minimal I/O operations.

SGI R8000 Power Challenge

no. processors     cpu time       speedup   problem size   ~ Mflops

      1             2067.1          1.00        64^3         59.0
      2             1057.1          1.96        64^3         115.3
      4             537.3           3.85        64^3         226.9
      8             282.1           7.33        64^3         432.1
     15             175.2           11.80       64^3         695.8
     16     ***** only 15 processors were configured because of a failed processor

no. processors     cpu time       speedup   problem size   ~ Mflops

      1             240.6           1.00        32^3         63.4
      2             118.9           2.02        32^3         128.3
      4             58.4            4.12        32^3         261.0
      8             31.9            7.54        32^3         477.8
     15             22.5            10.69       32^3         677.7
     16     ***** only 15 processors were configured because of a failed processor
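
The speedup column can be reproduced from the cpu times as T(1)/T(p). The short Python sketch below does that and also derives the parallel efficiency (speedup divided by processor count), which is not part of the original tables. The function and variable names are illustrative only.

```python
# CPU times copied from the two tables above, keyed by processor count.
cpu_time = {
    "64^3": {1: 2067.1, 2: 1057.1, 4: 537.3, 8: 282.1, 15: 175.2},
    "32^3": {1: 240.6, 2: 118.9, 4: 58.4, 8: 31.9, 15: 22.5},
}

def speedup(times, p):
    """Speedup on p processors relative to the 1-processor run."""
    return times[1] / times[p]

def efficiency(times, p):
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(times, p) / p

for size, times in cpu_time.items():
    for p in sorted(times):
        print(f"{size}  p={p:2d}  speedup={speedup(times, p):5.2f}  "
              f"efficiency={efficiency(times, p):4.2f}")
```

An efficiency above 1.0 corresponds to superlinear speedup.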

number of processors:
16
cpu type:
MIPS R8000
clock speed:
75 MHz
peak performance:
300 Mflops per processor / 4.8 Gflops total (16 processors)
physical memory:
2 Gbytes
operating system:
IRIX 6.0
language:
F77 Version 6.0.0
compiler options:
-O3 -64 -mips4 -lfastm -mp -mp_schedtype=simple -pfa
key algorithms:
MacCormack finite difference scheme
additional software (libraries):
None
key contact:
Rob Gjertsen (gjertsen@ncsa.uiuc.edu)
remarks:
The experiments were actually run with problem sizes 34^3 and 66^3
instead of 32^3 and 64^3, so that maximum performance was achieved.
The times were scaled accordingly to allow for a cross-architecture
comparison of the code.
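
The report does not say how the times were scaled. One plausible reading, assumed here and not confirmed by the source, is that run time is proportional to the number of grid points, so a time measured on a 66^3 grid is multiplied by (64/66)^3 to give a 64^3 figure. The function name and the sample input below are hypothetical.

```python
def scale_time(measured_time, run_n, reported_n):
    """Rescale a time measured on a run_n^3 grid to a reported_n^3 grid.

    Assumption (not from the report): cost is proportional to the
    number of grid points, i.e. to n^3.
    """
    return measured_time * (reported_n / run_n) ** 3

# Hypothetical example: 100.0 units of time measured on the 66^3 grid.
print(round(scale_time(100.0, 66, 64), 2))
```
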
Both the Challenge and Power Challenge perform poorly with arrays of
size 2^n because the compiler does not automatically pad arrays of this
type, and array padding is necessary in this situation to avoid cache
thrashing.
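
The effect of padding can be illustrated with a toy cache model. The sketch below counts how many distinct cache sets are touched when stepping along the slowest index of an n x n x n array of 8-byte values stored in Fortran (column-major) order. The cache geometry (direct-mapped, 32-byte lines, 512 sets) is purely hypothetical and is not the actual R8000 cache configuration.

```python
def distinct_cache_sets(n, num_accesses=64, elem_bytes=8,
                        line_bytes=32, num_sets=512):
    """Count the cache sets hit when walking a(i,j,k) over k for fixed i, j.

    The array is n x n x n in column-major order, so consecutive k values
    are n*n elements apart. Cache parameters are hypothetical.
    """
    plane_stride = n * n * elem_bytes  # bytes between a(i,j,k) and a(i,j,k+1)
    sets = set()
    for k in range(num_accesses):
        address = k * plane_stride
        sets.add((address // line_bytes) % num_sets)
    return len(sets)

print("n=64:", distinct_cache_sets(64))  # power-of-two dimension
print("n=66:", distinct_cache_sets(66))  # padded dimension
```

In this model every access for n=64 falls into the same cache set, so each one evicts the previous line, while n=66 spreads the same accesses over many different sets.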
The use of an explicit finite difference scheme enabled the compiler
to easily parallelize the code. Only minimal code changes were required.
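
To show why an explicit scheme parallelizes easily, here is a minimal sketch of one MacCormack step for the 1D linear advection equation u_t + a u_x = 0 with periodic boundaries. It is a toy problem chosen for illustration, not the H code or the Einstein equations, and all names are hypothetical. In both the predictor and the corrector loop, each grid point is computed only from values of the previous stage, so the loop iterations are independent of one another.

```python
def maccormack_step(u, c):
    """Advance u by one time step; c = a*dt/dx is the Courant number."""
    n = len(u)
    # Predictor: forward difference, uses only the old values u.
    pred = [u[i] - c * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # Corrector: backward difference on the predicted values, then average.
    # pred[i - 1] wraps to the last element when i == 0 (periodic boundary).
    return [0.5 * (u[i] + pred[i] - c * (pred[i] - pred[i - 1]))
            for i in range(n)]

u0 = [0.0, 1.0, 2.0, 0.0, 0.0]
print(maccormack_step(u0, 1.0))  # with c = 1 the profile moves one cell
```

Because no iteration reads a value written by another iteration of the same loop, a parallelizing compiler can distribute each loop across processors without extra synchronization inside it.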
The occurrence of superlinear speedup for the 32^3 problem size (2.02 on
2 processors and 4.12 on 4) can be attributed to good cache utilization.