3D Black Hole Simulation

The H code simulates the evolution of a black hole in three dimensions using a hyperbolic formulation of the Einstein equations by Joan Masso. The simulation is run for 100 time steps and uses minimal I/O operations.

SGI R8000 Power Challenge

no. processors     cpu time       speedup   problem size   ~ Mflops

      1             2067.1          1.00        64^3         59.0
      2             1057.1          1.96        64^3         115.3
      4             537.3           3.85        64^3         226.9
      8             282.1           7.33        64^3         432.1
     15             175.2           11.80       64^3         695.8
     16             ***** only 15 processors were configured because of a failed processor

no. processors     cpu time       speedup   problem size   ~ Mflops

      1             240.6           1.00        32^3         63.4
      2             118.9           2.02        32^3         128.3
      4             58.4            4.12        32^3         261.0
      8             31.9            7.54        32^3         477.8
     15             22.5            10.69       32^3         677.7
     16             ***** only 15 processors were configured because of a failed processor
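
The speedup column in both tables is the 1-processor cpu time divided by the p-processor cpu time. The short sketch below recomputes it from the times listed above and adds parallel efficiency (speedup divided by p), which is not reported in the tables; the function and variable names are illustrative only.

```python
# CPU times in seconds, copied from the 64^3 and 32^3 tables.
times_64 = {1: 2067.1, 2: 1057.1, 4: 537.3, 8: 282.1, 15: 175.2}
times_32 = {1: 240.6, 2: 118.9, 4: 58.4, 8: 31.9, 15: 22.5}


def speedup_table(times):
    """Return {p: (speedup, efficiency)} relative to the 1-processor time."""
    t1 = times[1]
    return {p: (t1 / t, (t1 / t) / p) for p, t in times.items()}


if __name__ == "__main__":
    for label, times in (("64^3", times_64), ("32^3", times_32)):
        for p, (s, e) in sorted(speedup_table(times).items()):
            print(f"{label}  p={p:2d}  speedup={s:6.2f}  efficiency={e:5.2f}")
```

An efficiency above 1.0 marks a superlinear entry; with these times that happens only for the 32^3 runs on 2 and 4 processors.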

number of processors:

cpu type:

MIPS R8000

clock speed:

75 MHz

peak performance:

300 Mflops/4.8 Gflops

physical memory:

2 Gbytes

operating system:

IRIX 6.0

language:

F77 Version 6.0.0

compiler options:

-O3 -64 -mips4 -lfastm -mp -mp_schedtype=simple -pfa

key algorithms:

MacCormack finite difference scheme

additional software (libraries):

None

key contact:

Rob Gjertsen (gjertsen@ncsa.uiuc.edu)

remarks:

The experiments were actually run with problem sizes 34^3 and 66^3
instead of 32^3 and 64^3, so that maximum performance was achieved.
The times were scaled accordingly to allow for a cross-architectural
comparison of the code.
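
The exact scaling rule is not stated here. One plausible reading, shown purely as an assumption, is that the measured time is multiplied by the ratio of grid points between the reported size and the size actually run. The function name below is hypothetical.

```python
def scale_time(measured_seconds, run_n, reported_n):
    """Scale a time measured on a run_n^3 grid to a reported_n^3 grid.

    ASSUMPTION: cost is proportional to the number of grid points,
    so the factor is (reported_n / run_n) ** 3. The report does not
    state which rule was actually used.
    """
    return measured_seconds * (reported_n / run_n) ** 3


if __name__ == "__main__":
    # Under this assumption, factors for the two problem sizes:
    print(f"34^3 -> 32^3 factor: {scale_time(1.0, 34, 32):.4f}")
    print(f"66^3 -> 64^3 factor: {scale_time(1.0, 66, 64):.4f}")
```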

Both the Challenge and Power Challenge perform poorly with arrays whose
dimensions are powers of two (2^n). The compiler does not pad such
arrays automatically, and padding is necessary in this situation to
avoid cache thrashing.
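
The effect can be illustrated with a toy model. The sketch below assumes a direct-mapped cache with hypothetical parameters (16 KB, 32-byte lines, 8-byte reals); these are not the actual Power Challenge cache parameters. It counts how many distinct cache sets are touched when a column-major n x n x n array is walked along its slowest index, as a finite difference stencil does.

```python
def distinct_cache_sets(n, elem_bytes=8, line_bytes=32, cache_bytes=16 * 1024):
    """Count distinct cache sets hit by A(i, j, k), k = 1..n, at fixed i, j.

    Toy model: direct-mapped cache, column-major (Fortran) layout, array
    starting at address 0. All cache parameters are hypothetical.
    """
    num_sets = cache_bytes // line_bytes
    stride_bytes = n * n * elem_bytes  # distance between A(i,j,k) and A(i,j,k+1)
    sets = {(k * stride_bytes // line_bytes) % num_sets for k in range(n)}
    return len(sets)


if __name__ == "__main__":
    for n in (32, 34, 64, 66):
        print(f"n={n:2d}: {distinct_cache_sets(n)} distinct sets for {n} accesses")
```

In this model the power-of-two sizes map all accesses onto one or two sets, so each access evicts a line that is needed again shortly, while the padded sizes 34 and 66 spread the accesses over as many sets as there are accesses.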

The use of an explicit finite difference scheme enabled the compiler
to easily parallelize the code. Only minimal code changes were required.
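
To show why an explicit scheme parallelizes so readily, here is a minimal sketch of one MacCormack step for 1D linear advection, u_t + a u_x = 0, with periodic boundaries. It is a toy stand-in and not the H code, which solves the Einstein equations in 3D. The function name and the parameter c = a*dt/dx are illustrative. Each loop reads only an array computed in the previous stage, so every iteration is independent of the others.

```python
def maccormack_step(u, c):
    """Advance u by one time step of the MacCormack scheme.

    u : list of floats on a periodic grid
    c : Courant number a*dt/dx (illustrative parameter)
    """
    n = len(u)
    # Predictor: forward difference. Reads only the old array u,
    # so all n updates are independent of one another.
    pred = [u[i] - c * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # Corrector: backward difference on the predicted values, averaged
    # with the old value. Reads only u and pred; pred[-1] wraps around.
    return [0.5 * (u[i] + pred[i] - c * (pred[i] - pred[i - 1]))
            for i in range(n)]


if __name__ == "__main__":
    pulse = [0.0, 1.0, 0.0, 0.0]
    # With c = 1 the scheme shifts the profile by exactly one cell.
    print(maccormack_step(pulse, 1.0))
```

Because no iteration depends on a value written in the same loop, a parallelizing compiler can split each loop across processors without restructuring the code.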

The occurrence of superlinear speedup for the 32^3 problem size (2.02
on 2 processors and 4.12 on 4) can be attributed to good cache
utilization.