Supercomputing'98
AEI - ANL - NCSA - NLANR - RZG - SDSC - WashU - ZIB
Colliding Black Holes and Neutron Stars Across the Atlantic Ocean.
with support from
Berkom - Canarie - STARTAP - Teleglobe - vBNS
Using tightly coupled supercomputers in Europe and America, we propose to
perform an intercontinental, distributed simulation of the full 3D Einstein
equations of general relativity, calculating the collision of black
holes and neutron stars. The simulation itself will be distributed
across machines on both continents, utilizing Globus, and will be
controlled and displayed live on an ImmersaDesk Virtual Reality system
at SC'98 in Orlando, FL.
Detailed Project Description
The simulations will be performed by a new parallel computer code,
called "Cactus", developed to solve the complete set of 3D Einstein
equations and to study problems such as colliding black holes and
neutron stars, the formation of singularities, and other aspects of
Einstein's theory that cannot be handled by analytic means. This code
for the full-scale simulation of space and time in Einstein's theory
was initiated at the Max-Planck-Institut für Gravitationsphysik
(Albert-Einstein-Institut) in Potsdam, Germany, and is being developed
by the NCSA/Potsdam/Washington U collaboration in numerical
relativity, together with colleagues at other institutions around the
world.
In Einstein's theory of general relativity, gravity is governed by an
extremely complex set of coupled, nonlinear, hyperbolic-elliptic
partial differential equations. The largest parallel supercomputers
are finally approaching the speed and memory required to solve the
complete set of Einstein's equations for the first time since they
were written over 80 years ago, allowing one to attempt full 3D
simulations of such exciting events as colliding black holes and
neutron stars. Such events are expected to be observed within the
next few years by the Laser Interferometer Gravitational Wave
Observatory (LIGO) presently under construction in the US, and by its
European counterparts VIRGO and GEO600. In this demonstration, we show how
emerging supercomputing technology can be harnessed in novel ways to
enable calculations that were simply unthinkable a few years ago, to
push on new frontiers in physics and astronomy.
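For reference, the equations in question are Einstein's field
equations, which in one standard form (tensor notation, geometric
units with G = c = 1) read

    G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R = 8\pi T_{\mu\nu},

ten coupled equations for the spacetime metric g_{\mu\nu}. In the 3+1
split commonly used for numerical evolution, they separate into
hyperbolic evolution equations for the spatial metric and extrinsic
curvature, plus elliptic constraint equations (the Hamiltonian and
momentum constraints); this is the mixed hyperbolic-elliptic character
referred to above.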
Two remote SGI-Cray parallel supercomputers will work together to
perform this simulation, one of the largest ever attempted in the
study of Einstein's equations. The domain decomposition will involve
one compact object (either a neutron star or black hole) in Europe,
and one in America. The two objects will collide and merge (in a
virtual space "somewhere" over the Atlantic Ocean), fully utilizing
an OC-3 transatlantic ATM network. Rather than having two
disconnected simulations at the remote sites, this will entail a fully
coupled calculation that will treat the two supercomputers as a single
computational system, pushing the limits of achievable bandwidth on
such a transatlantic network. A third computer at SC'98 will be used
to control and display the simulation.
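To make the idea concrete, here is a minimal sketch in C with MPI of
how such a two-machine domain decomposition assigns each half of the
grid, and thus one compact object, to each continent. The grid size
and the half-and-half rank split are invented for illustration; this
is not the actual Cactus decomposition.

/* Minimal illustration of a 1-D slab decomposition across two
 * machines; NGLOBAL and the Europe/America rank split are
 * hypothetical, not taken from the actual Cactus setup. */
#include <mpi.h>
#include <stdio.h>

#define NGLOBAL 1024   /* hypothetical global grid points along one axis */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns one contiguous slab of the global grid. */
    int nlocal = NGLOBAL / size;
    int lo = rank * nlocal;
    int hi = lo + nlocal - 1;

    /* Lower-half ranks run on the European T3E and hold the first
     * compact object; upper-half ranks run on the American T3E and
     * hold the second. Only slab faces cross the Atlantic each step. */
    const char *site = (rank < size / 2) ? "Europe" : "America";
    printf("rank %3d (%s): grid points %d..%d\n", rank, site, lo, hi);

    MPI_Finalize();
    return 0;
}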
The communications will be handled by MPICH-G, a new
Globus-enabled
implementation of the Message Passing Interface. MPICH-G incorporates
a number of features designed to support efficient execution in wide
area environments, including dynamic selection of communication
methods and topology information that allows optimized implementations
of collective operations. The Argonne and NCSA groups will work
during this project to optimize MPICH-G performance for the
transatlantic environment, for example by studying the utility of
specialized techniques for overlapping the very long latencies with
computation.
Integrated Paradyn instrumentation will be used to guide this
optimization process. Optimizations will be integrated back into the
Globus and Cactus systems so that users will be able to take advantage
of this work for their own simulations on distributed computers,
making a long-term impact.
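As one example of what overlapping very long latencies can mean in
practice, here is a hedged sketch of hiding the wide-area round trip
behind interior computation. It assumes a simple 1-D slab
decomposition with one ghost point per face; the field, stencil, and
sizes are invented, and this is not MPICH-G or Cactus code.

/* Hypothetical relaxation step: post non-blocking ghost-zone
 * exchanges, update interior points while the slow transatlantic
 * messages are in flight, then finish the two edge points. */
#include <mpi.h>

enum { NL = 128 };              /* hypothetical local slab width */

/* u and unew hold NL real points (indices 1..NL) plus ghost
 * points at 0 and NL+1. */
void step(double u[NL + 2], double unew[NL + 2], int rank, int size)
{
    MPI_Request req[4];
    int nreq = 0;

    /* Start the halo exchange with both neighbours (a domain end
     * would instead apply a physical boundary condition). */
    if (rank > 0) {
        MPI_Irecv(&u[0], 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(&u[1], 1, MPI_DOUBLE, rank - 1, 1, MPI_COMM_WORLD, &req[nreq++]);
    }
    if (rank < size - 1) {
        MPI_Irecv(&u[NL + 1], 1, MPI_DOUBLE, rank + 1, 1, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(&u[NL],     1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req[nreq++]);
    }

    /* Interior points need no remote data: update them while the
     * wide-area messages are still on the wire. */
    for (int i = 2; i <= NL - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* Only the two edge points wait on the network. */
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
    unew[1]  = 0.5 * (u[0]      + u[2]);
    unew[NL] = 0.5 * (u[NL - 1] + u[NL + 1]);
}

The longer the interior update takes relative to the round trip, the
more completely the transatlantic latency disappears from the total
step time.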
The simulation will be launched from the show floor in Orlando, using
a newly developed control interface displayed on an ImmersaDesk
Virtual Reality system built at EVL. Once the simulation is
launched, it will be visualized as it is computed, showing in full 3D
various isosurfaces of the evolved functions indicating the merger and
emission of gravitational waves.
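The core test behind such isosurface extraction is simple to sketch.
The following is a hypothetical illustration, not the actual
thorn_IsoSurfacer: find the grid cells whose corner values straddle
the chosen iso-value, since only those cells contribute surface
triangles sent on to the display.

/* Count the cells crossed by the level set f == iso; the grid size
 * is hypothetical. A real extractor would triangulate these cells. */
#include <stdio.h>

enum { NX = 32, NY = 32, NZ = 32 };  /* hypothetical local grid */

long count_crossed_cells(const double f[NX][NY][NZ], double iso)
{
    long crossed = 0;
    for (int i = 0; i < NX - 1; i++)
        for (int j = 0; j < NY - 1; j++)
            for (int k = 0; k < NZ - 1; k++) {
                double lo = f[i][j][k], hi = lo;
                /* Scan the 8 corners of the cell for min and max. */
                for (int c = 1; c < 8; c++) {
                    double v = f[i + (c & 1)][j + ((c >> 1) & 1)][k + ((c >> 2) & 1)];
                    if (v < lo) lo = v;
                    if (v > hi) hi = v;
                }
                if (lo <= iso && iso <= hi)
                    crossed++;  /* cell contributes surface triangles */
            }
    return crossed;
}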
This technology prototypes an emerging world of metacomputing,
bringing together extraordinary computing resources, wherever they
may be located in the world, to attack extraordinary scientific
problems that cannot be solved by any other means.
Ed Seidel
Project people
- Werner Benger, Max-Planck-Institut für Gravitationsphysik, Konrad-Zuse-Institut;
- Bernd Bruegmann, Max-Planck-Institut für Gravitationsphysik;
- Ian Foster and Olle Larsson, Argonne National Laboratory;
- Joan Masso, University of the Balearic Islands;
- Mark Miller, Washington University;
- Jason Novotny, NCSA/NLANR;
- Marcus Pattloch, DFN-Verein;
- Edward Seidel and John Shalf, NCSA;
- Warren Smith, Argonne National Laboratory;
- Wai-Mo Suen and Malcolm Tobias, Washington University.
The Final Countdown - or - Our brief history of time
Excerpts from the email traffic around our SC98 preparations
Mon, 26 Oct 1998 Network becomes ready
It looks like we are finally getting the ATM network
for your project ready.
We are working on the link ZIB-Berkom-Canada-... and we
may have it operational on Wednesday or so.
Tue, 27 Oct 1998 Latest test results
This evening we tried to have a further test between ZIB and RZG.
Unfortunately pc had some unknown problems (login hangs, globus jobs
hanging) and we couldn't continue our testing.
However, not all of the time was lost, and I could make some test runs
with 128 PEs on berte to gain some experience with the visualization
aspect.
Wed, 28 Oct 1998 IVP, black holes and Globus
Although there was no reserved test time today, I could get some 2x16
globus jobs between berte and pc to run. As expected, using IVP (the
initial value problem solver) increases the init time enormously, by a
factor of 100 compared to do_ivp="no".
Wed, 28 Oct 1998 More data plotted
We were able to get a medium-sized run on a total of 32 PEs with
the skip-poll value set to 1000, as Warren had recommended in the MOO.
However, our communication time INCREASED compared to the same run
with the default skip-poll value. Unfortunately, pc crashed while I was
trying a run with a skip-poll of 100...
Fri, 30 Oct 1998 Test time today (first transatlantic globus runs)
I just got the message that for today the Telekom line is booked
from 12h-18h CET.
This means that if the T3E tests start today at 17h MET (5 pm), we
have only one hour for testing!!
Sun, 1 Nov 1998 Update?
We've narrowed down the problem we are having at SDSC...
I hope to have this fixed by our testing time on Tuesday, but
since it takes 8 hours or so to install globus on a T3E...
Tue, 3 Nov 1998 Still trying to test between SDSC and berte
Again, we had no luck submitting jobs to berte from San Diego or
vice versa. Warren is working hard on this issue. In the
meantime, there's a working globus that can submit jobs to both machines
from Warren's Solaris box, so we may use that for testing again
tomorrow. I don't know what more can be done in the meantime, since we
really need to try out distributed jobs in order to guess a good
parameter set for the demo.
Wed, 4 Nov 1998 Demo this morning?
Today (Wednesday) the transatlantic line was not available due to an
unknown problem at Berkom, so we had to cancel the whole test.
It will be repeated tomorrow (Thursday).
Thu, 5 Nov 1998 Globus is Alive!
After the bad luck yesterday, when nothing appeared to work, now
everything works OK. The transatlantic line is up, globus works, and
we could submit globus jobs from berte to berte & golden.
Fri, 6 Nov 1998 Viz. test successful
...just running berte <--> golden with an 8+8 neutron star collision
and thorn_IsoSurfacer, displaying two nested transparent isosurfaces
locally in Amira. By mistake, on the first trial the isosurface data
were sent from golden via the normal internet line to my desktop, but
it worked and achieved about 30 seconds/iteration (with high diffusion
during this time). When this error was corrected and the isosurfaces
were sent locally from berte, it came down to 15 seconds/iteration.
Mon, 9 Nov 1998 Times for demos/resources
We desperately need more test time tomorrow morning.
We have almost managed to get the application working between
RZG-ZIB and SDSC, but not quite. Can everyone give us 128 processors?
Tue, 10 Nov 1998 Times for demos/resources
... we have severe problems using the visualization capabilities when
running on 128+128 nodes... it's hard to debug, since one needs two
T3Es and at least 64 interactive nodes on each.
Wed, 11 Nov 1998
Wild debug sessions today, involving nearly all the resources of
ZIB and RZG.
Wed, 11 Nov 1998, 1pm EST Success!
We had a tough morning today, when for 4 hours all our plans and
even all our very solid backup plans broke for many reasons. But we
calmly persevered, and with just four minutes to spare before the
judges showed up, we got our 2 T3E transatlantic SDSC-ZIB neutron star
collision to work again! The 3D VR display showed the two stars coming
together as the two machines worked together to compute their
collision. The presentation went very well, and we finished up by
showing the code also running on the NT supercluster at NCSA. We all
feel it was a great success, and are excited that most of this can
also be used in our production science work, either right away or in
the near future. Thanks to everyone for so much hard work (an
incredible number of people made this possible!). We are planning to
continue this research and migrate everything back to production work
as we can.
Thu, 12 Nov 1998
Recovering from the SC98 battle...