H3expresso Scalability

The following table gives the time (in seconds) that H3expresso takes to evolve 100 iterations on different architectures. A fixed grid of 64x64x64 was used so that the code could be run on all systems. A bigger grid (e.g. 128x128x128) would give much better performance when using a high number of processors.

Processors	IBM SP2	CRAY T3D	TMC CM5	CRAY C90	SGI PC	Exemplar
1	2072			312	2067	2880
2	1344			160	1057	1500
4	760			86	537	800
8	376			42	282	420
16	209	494		24	164
32	103	240	151
64	54	117	79
128	28	57	42
256	19	41	24
512			15
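
As a reading aid, the timings can be turned into speedup and parallel efficiency. The following is a minimal Python sketch, not part of H3expresso: the names `timings` and `scaling` are illustrative, and for columns that have no 1-processor measurement the smallest measured processor count is assumed as the baseline.

```python
# Timings in seconds for 100 iterations on a 64x64x64 grid, copied from the table.
# Empty table cells are simply omitted.
timings = {
    "IBM SP2":  {1: 2072, 2: 1344, 4: 760, 8: 376, 16: 209,
                 32: 103, 64: 54, 128: 28, 256: 19},
    "CRAY T3D": {16: 494, 32: 240, 64: 117, 128: 57, 256: 41},
    "TMC CM5":  {32: 151, 64: 79, 128: 42, 256: 24, 512: 15},
    "CRAY C90": {1: 312, 2: 160, 4: 86, 8: 42, 16: 24},
    "SGI PC":   {1: 2067, 2: 1057, 4: 537, 8: 282, 16: 164},
    "Exemplar": {1: 2880, 2: 1500, 4: 800, 8: 420},
}


def scaling(times):
    """Return {p: (speedup, efficiency)} relative to the smallest measured p.

    Assumption: when no 1-processor run exists, the smallest processor
    count in the column is used as the baseline (relative speedup).
    """
    base = min(times)
    t_base = times[base]
    result = {}
    for p in sorted(times):
        speedup = t_base / times[p]        # how much faster than the baseline
        efficiency = speedup * base / p    # 1.0 means ideal scaling
        result[p] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for machine, times in timings.items():
        print(machine)
        for p, (s, e) in scaling(times).items():
            print(f"  p={p:4d}  speedup={s:7.2f}  efficiency={e:5.2f}")
```

For the IBM SP2 column, for instance, this gives an efficiency of about 0.58 on 128 processors and about 0.43 on 256, which is the small-grid effect in numbers.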

The following graph plots the table in logarithmic coordinates to show the scalability. The implementation (message passing with MPI or a vendor-specific data-parallel Fortran) is given in the legend. Note the effect of the small grid size when using a high number of processors (256 or 512). The lines keep their slope when using bigger grid sizes. The "Projected" line shows the dream of the future: a teraflop machine (e.g., a 512-node C90)!
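
The slope in logarithmic coordinates can also be read directly from the numbers. The sketch below is an illustration only (the helper `local_slopes` is a hypothetical name, and only two columns of the table are repeated here): it computes the slope of log(time) against log(processors) between neighbouring measurements, where -1 corresponds to ideal scaling and values closer to 0 indicate flattening.

```python
import math

# Two columns copied from the table (seconds for 100 iterations, 64x64x64 grid).
sp2 = {1: 2072, 2: 1344, 4: 760, 8: 376, 16: 209,
       32: 103, 64: 54, 128: 28, 256: 19}
cm5 = {32: 151, 64: 79, 128: 42, 256: 24, 512: 15}


def local_slopes(times):
    """Slope of log(time) vs log(processors) between neighbouring points.

    Returns {(p_low, p_high): slope}; an ideal halving of the time for
    every doubling of the processors gives a slope of -1.
    """
    ps = sorted(times)
    return {
        (a, b): math.log(times[b] / times[a]) / math.log(b / a)
        for a, b in zip(ps, ps[1:])
    }


if __name__ == "__main__":
    for name, column in (("IBM SP2", sp2), ("TMC CM5", cm5)):
        print(name)
        for (a, b), slope in local_slopes(column).items():
            print(f"  {a:4d} -> {b:4d}: slope {slope:6.2f}")
```

On the IBM SP2 data the slope is about -0.95 between 64 and 128 processors but only about -0.56 between 128 and 256, consistent with the small-grid effect noted above.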