The following table gives the time (in seconds) that H3expresso takes to evolve 100 iterations on different architectures. A fixed grid of 64x64x64 was used so that the code could be run on all systems. A larger grid (e.g. 128x128x128) would give much better performance when using a large number of processors.
| Processors | IBM SP2 | CRAY T3D | TMC CM5 | CRAY C90 | SGI PC | Exemplar |
|---|---|---|---|---|---|---|
| 1 | 2072 | | | 312 | 2067 | 2880 |
| 2 | 1344 | | | 160 | 1057 | 1500 |
| 4 | 760 | | | 86 | 537 | 800 |
| 8 | 376 | | | 42 | 282 | 420 |
| 16 | 209 | 494 | | 24 | 164 | |
| 32 | 103 | 240 | 151 | | | |
| 64 | 54 | 117 | 79 | | | |
| 128 | 28 | 57 | 42 | | | |
| 256 | 19 | 41 | 24 | | | |
| 512 | 15 | | | | | |
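
One way to read the scaling from these timings is to convert them to speedup and parallel efficiency. The following is a minimal Python sketch for the IBM SP2 column only. The names `sp2_seconds` and `scaling_table` are illustrative and are not part of H3expresso.

```python
# Seconds for 100 iterations on the IBM SP2, copied from the table above
# (processors -> seconds, 64x64x64 grid).
sp2_seconds = {
    1: 2072, 2: 1344, 4: 760, 8: 376, 16: 209,
    32: 103, 64: 54, 128: 28, 256: 19, 512: 15,
}


def scaling_table(timings):
    """Return (processors, speedup, efficiency) rows.

    Speedup is measured against the run with the fewest processors;
    efficiency is speedup divided by the increase in processor count.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows


for procs, speedup, efficiency in scaling_table(sp2_seconds):
    print(f"{procs:4d}  speedup {speedup:7.1f}  efficiency {efficiency:5.2f}")
```

An efficiency of 1.0 would mean perfect scaling. For this column the efficiency drops noticeably at 256 and 512 processors, where each processor has little of the 64x64x64 grid left to work on.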
The following graph plots the table on logarithmic axes to show the scalability. The implementation (message passing with MPI or a vendor-specific data-parallel Fortran) is given in the legend. Note the effect of the small grid size when using a large number of processors (256 or 512), where the lines flatten. With larger grid sizes the lines keep their slope. The "Projected" line shows the dream of the future: a teraflop machine (e.g., a 512-node C90).
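
The slope mentioned here can also be estimated numerically. The sketch below fits a least-squares line to log(time) against log(processors) for the IBM SP2 timings; perfect scaling would give a slope of -1. The function name `loglog_slope` and the split at 128 processors are assumptions chosen for illustration.

```python
import math

# (processors, seconds) pairs for the IBM SP2, copied from the table.
sp2_points = [
    (1, 2072), (2, 1344), (4, 760), (8, 376), (16, 209),
    (32, 103), (64, 54), (128, 28), (256, 19), (512, 15),
]


def loglog_slope(points):
    """Least-squares slope of log(seconds) against log(processors)."""
    xs = [math.log(procs) for procs, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Slope over the range that scales well, and over the tail that flattens.
slope_low = loglog_slope([p for p in sp2_points if p[0] <= 128])
slope_high = loglog_slope([p for p in sp2_points if p[0] >= 128])
print(f"slope for 1-128 processors:   {slope_low:.2f}")
print(f"slope for 128-512 processors: {slope_high:.2f}")
```

The tail slope is much shallower than the slope over 1 to 128 processors, which is the small-grid effect described above.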