Distributed Simulations
History
Cactus and its predecessor codes have been using, and
helping to develop, Grid infrastructure software for several
years:
1995:
Large-scale distributed computing across
the vBNS was demonstrated at SC95, using a direct
ancestor of Cactus, running a black hole simulation.
This was one
of the experiments that led to the development of
the Globus grid infrastructure.
http://www.tc.cornell.edu/er96/ff03summer/ff10spacetime.html
http://jean-luc.aei.mpg.de/Projects/SC95/
1997:
http://jean-luc.aei.mpg.de/Projects/SC97/
1998:
Cactus 3 was used to perform a simulation
of colliding neutron stars across two continents,
distributing the computational grid across three T3Es
in Garching (Germany), Berlin (Germany) and SDSC (USA).
This application won the Most Stellar HPC
Challenge Award at SC98.
http://jean-luc.aei.mpg.de/Projects/SC98/
1999:
Demonstrations at SC99 showed
distributed simulations, similar to those from 1998,
but now with more sophisticated
interactive visualization and control, using multiple
clients on the show floor.
http://jean-luc.aei.mpg.de/Projects/SC99/
2000:
With distributed simulations
now more routine, work is
focusing on pushing networks and the size of
simulations to their limits, and on developing
portals and testbeds that bring distributed computing
to users.
2001:
Cactus was used for a 1500-processor gravitational wave
run across four machines at NCSA and SDSC,
achieving up to 84 percent scaling. Reports are available from the
San Diego Supercomputer Center and the National Center for Supercomputing Applications.
Distributed Computing and Cactus
Cactus can be used with the Globus Toolkit
as described in the
Globus-HOWTO. Compiling Cactus with the Globus MPICH device and adding the necessary
Globus RSL scripts makes it possible to distribute
any simulation across several machines.
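As a rough illustration of what such an RSL script can look like, the Python sketch below assembles a GT2-style multirequest in the form documented for the Globus-enabled MPICH device (MPICH-G2): a leading `+` followed by one `&` clause per machine, each carrying its own process count, subjob label and `GLOBUS_DUROC_SUBJOB_INDEX`. The `Subjob` and `build_rsl` names, host names, paths, processor counts and parameter file are hypothetical placeholders, and the exact attributes a given Globus installation expects should be checked against its own documentation.

```python
"""Hypothetical sketch: build a Globus (GT2) RSL multirequest for an MPICH-G2
run spread over several machines. Host names, paths and counts are
placeholders, not values from any real Cactus run."""

from dataclasses import dataclass


@dataclass
class Subjob:
    contact: str     # Globus resource manager contact (gatekeeper host)
    count: int       # number of MPI processes to start on this machine
    directory: str   # working directory on the remote machine
    executable: str  # path to the Cactus executable on that machine


def build_rsl(subjobs, parfile):
    """Return a '+' multirequest with one '&' clause per machine.

    Each clause gets a distinct subjob label and DUROC subjob index, which
    MPICH-G2 uses to assemble one MPI_COMM_WORLD from all machines."""
    clauses = []
    for index, job in enumerate(subjobs):
        clauses.append(
            '( &(resourceManagerContact="{c}")\n'
            "   (count={n})\n"
            "   (jobtype=mpi)\n"
            '   (label="subjob {i}")\n'
            "   (environment=(GLOBUS_DUROC_SUBJOB_INDEX {i}))\n"
            '   (directory="{d}")\n'
            '   (executable="{e}")\n'
            '   (arguments="{p}")\n'
            ")".format(c=job.contact, n=job.count, i=index,
                       d=job.directory, e=job.executable, p=parfile)
        )
    return "+\n" + "\n".join(clauses) + "\n"


if __name__ == "__main__":
    # Placeholder machines; a Cactus executable takes a parameter file argument.
    jobs = [
        Subjob("machine-a.example.org", 64, "/scratch/user/run",
               "/home/user/Cactus/exe/cactus_wave"),
        Subjob("machine-b.example.org", 32, "/scratch/user/run",
               "/home/user/Cactus/exe/cactus_wave"),
    ]
    print(build_rsl(jobs, "wave.par"))
```

Generating the RSL from a small list of machines, rather than editing it by hand, keeps the subjob indices consistent when machines are added or removed from a run.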
Remote visualization and steering tools, developed in
the German Gigabit Project,
can be used to connect to distributed simulations for
interactive viewing of data and steering of simulations.
Techniques are also being developed to access offline
remote data which is archived across different machines.
Current research is focussed on:
- Making the communication layer automatically adapt
to the available network to improve performance, for
example using compression and changing the number of
ghostzones used for communication.
- Finding out how to optimize the driver layers of
Cactus to fully exploit existing high speed networks.
- Pushing the combined number of machines and processors
used to the limit, with the aim of enabling extremely
large-scale simulations.
- Developing the ability to deploy and execute
distributed simulations through a portal, to remove
the administrative tasks required for performing such
runs, which will speed up testing and development, and
also bring new users to distributed computing.
- Investigating techniques for dynamically querying
the state of networks, to find the best possible
configuration for a distributed simulation.
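The first and last research directions can be illustrated with a toy cost model. It is a hypothetical sketch, not the Cactus implementation: given a link latency and bandwidth (as a network query might report), it picks a ghost-zone width and decides whether to compress boundary data. The assumptions are a stencil of radius 1, a single shared face, and a fixed compression cost and ratio. With `g` ghost zones, neighbours exchange `g` layers every `g` steps, so the per-step data volume stays the same but the latency is paid once per `g` steps. The price is that on average `(g - 1) / 2` ghost layers are recomputed redundantly each step. All names (`Link`, `Face`, `cost_per_step`, `choose`) and numbers are illustrative.

```python
"""Hypothetical cost model: pick a ghost-zone width and whether to compress
boundary messages from a measured link latency and bandwidth.
All names and numbers are illustrative assumptions, not Cactus settings."""

from dataclasses import dataclass
from itertools import product


@dataclass
class Link:
    latency: float    # seconds per message (e.g. from a ping-pong probe)
    bandwidth: float  # bytes per second


@dataclass
class Face:
    points: int            # grid points in one ghost layer of the shared face
    bytes_per_point: int   # bytes sent per grid point (all evolved variables)
    point_time: float      # seconds to update one grid point
    compress_cost: float   # seconds per uncompressed byte to compress + decompress
    compress_ratio: float  # uncompressed size / compressed size


def cost_per_step(face, link, ghosts, compress):
    """Modelled boundary cost per time step, in seconds.

    With `ghosts` layers the processes exchange every `ghosts` steps, sending
    `ghosts` layers at once: the per-step volume is one layer, but latency is
    paid once per exchange. On average (ghosts - 1) / 2 ghost layers are
    recomputed redundantly each step."""
    layer_bytes = face.points * face.bytes_per_point
    if compress:
        transfer = layer_bytes * (face.compress_cost
                                  + 1.0 / (face.compress_ratio * link.bandwidth))
    else:
        transfer = layer_bytes / link.bandwidth
    latency = link.latency / ghosts
    redundant = (ghosts - 1) / 2.0 * face.points * face.point_time
    return latency + transfer + redundant


def choose(face, link, max_ghosts=8):
    """Return the (ghosts, compress) pair with the lowest modelled cost."""
    options = product(range(1, max_ghosts + 1), (False, True))
    return min(options, key=lambda opt: cost_per_step(face, link, *opt))


if __name__ == "__main__":
    face = Face(points=100 * 100, bytes_per_point=80, point_time=1e-6,
                compress_cost=2e-9, compress_ratio=3.0)
    wan = Link(latency=0.05, bandwidth=1.25e7)   # 50 ms, 100 Mbit/s
    lan = Link(latency=1e-5, bandwidth=1.25e9)   # 10 us, 10 Gbit/s
    print(choose(face, wan))  # (3, True)
    print(choose(face, lan))  # (1, False)
```

Under these assumed numbers, the high-latency, low-bandwidth link favours wider ghost zones and compressed messages, while the fast local link favours a single ghost zone without compression. This is the kind of decision an adaptive communication layer would revisit as measured network conditions change.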
Links
- Globus Toolkit
- Slide showing results
of the 1500-processor run in March 2001.
- Paper
describing the latest techniques used for distributed
computing.