Gordon Bell Prize: Cactus, Globus and MPICH-G2 win top supercomputing award at SC2001 in Denver.
Cactus, Globus and MPICH-G2 Win Top Supercomputing Award
[Photo: Thomas Dramlitsch, from the Max Planck Institute for Gravitational Physics]
Members of the Cactus and Globus projects have won
one of this year's Gordon Bell Prizes in
high-performance computing for the work described in
their paper, "Supporting Efficient Execution in
Heterogeneous Distributed Computing Environments with
Cactus and Globus". The international team comprised
Thomas Dramlitsch, Gabrielle Allen and Ed Seidel
from the Max Planck Institute for Gravitational
Physics, along with colleagues Matei Ripeanu, Ian
Foster and Brian Toonen from the University of Chicago
and Argonne National Laboratory, and Nicholas
Karonis from Northern Illinois University. The
special category award was presented at
SC2001, an annual conference showcasing
high-performance computing and networking, held this
year in Denver, Colorado.
The prize was awarded for the group's work on concurrently
harnessing the power of multiple supercomputers to solve
Grand Challenge problems in physics that require
substantially more resources than a single machine can
provide. The group enhanced the communication layer
of Cactus, a generic programming framework designed for
physicists and engineers, adding techniques that
dynamically adapt the code to the available network
bandwidth and latency between machines. The message
passing layer itself used MPICH-G2, a grid-enabled
implementation of the MPI standard that handles
communication between machines separated by a wide area
network. In addition, the Globus Toolkit was used to
provide authentication and staging of simulations across
multiple machines.
[Photo: Thomas Dramlitsch receiving the award at SC2001 in Denver]
In a series of experiments performed at the start of
2001, the group ran a gravitational wave simulation
across a virtual supercomputer built from
three separate SGI Origin 2000 machines totalling
480 processors at the National Center for
Supercomputing Applications in Illinois and a
1024-processor IBM SP2 at the San Diego Supercomputer
Center in California. Network bandwidth between
processors in this virtual resource varies widely,
from extremely fast (200 MB/s) connections between
processors in the same machine, through Gigabit
Ethernet (125 MB/s) connecting the Origin
2000s, to the OC-12 (77 MB/s) connection between the two
sites.
However, the existence of an OC-12 connection does not
guarantee that its full capacity
will be available to a simulation. Various
networking problems meant that at most 4%
of the theoretical peak bandwidth was achieved. Algorithms
working across such changing networks must allow
simulations to adapt automatically
and make the best use of whatever bandwidth is available.
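
To make these figures concrete, the short Python sketch below works through the arithmetic. The link speeds and the 4% figure are taken from the text above; the 10 MB message size is a purely hypothetical value chosen for illustration, and latency is ignored.

```python
# Peak link speeds quoted in the article, in MB/s.
PEAK_MB_PER_S = {
    "same machine": 200.0,
    "Gigabit Ethernet": 125.0,
    "OC-12": 77.0,
}


def effective_bandwidth(peak_mb_per_s, achieved_fraction):
    """Bandwidth actually available when only a fraction of the peak is reached."""
    if not 0.0 < achieved_fraction <= 1.0:
        raise ValueError("achieved_fraction must be in (0, 1]")
    return peak_mb_per_s * achieved_fraction


def transfer_time(size_mb, peak_mb_per_s, achieved_fraction=1.0):
    """Seconds needed to move size_mb megabytes, ignoring latency."""
    return size_mb / effective_bandwidth(peak_mb_per_s, achieved_fraction)


MESSAGE_MB = 10.0  # hypothetical message size, not a figure from the article

wan_peak = PEAK_MB_PER_S["OC-12"]
wan_actual = effective_bandwidth(wan_peak, 0.04)  # 4% of peak, as reported

print(f"OC-12 at 4% of peak: {wan_actual:.2f} MB/s")
for name, peak in PEAK_MB_PER_S.items():
    print(f"{name}: {transfer_time(MESSAGE_MB, peak):.3f} s at peak")
print(f"OC-12 at 4%: {transfer_time(MESSAGE_MB, wan_peak, 0.04):.3f} s")
```

At 4% of peak, the wide area link delivers roughly 3 MB/s, so every message crossing between the sites takes 25 times longer than the nominal OC-12 rate would suggest.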
[Photo: From the back, left to right: Horst Simon, Brian Toonen, Nicholas Karonis, Ed Seidel, Gabrielle Allen, Rusty Lusk, Thomas Dramlitsch and Matei Ripeanu.]
The techniques developed by the group include
automatic load balancing across processors to match
the computational load to differing processor
speeds, dynamically adapting the width of the ghost zones
used for communication between processors so that
additional computation takes the place of
expensive message passing, and using
data compression and message bundling to improve
communication efficiency. With these techniques
the tightly coupled simulation achieved an efficiency
of up to 88% across the sites, a drastic improvement
on the 14% efficiency obtained with the original
communication structure.
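
The ghost zone idea can be illustrated with a toy problem. The following Python sketch is not the Cactus implementation; it assumes a one-dimensional diffusion stencil split across two hypothetical "processors", and all names (`step`, `run_split`, `ALPHA`) are invented for this example. With a ghost zone `ghost` cells wide, the two halves only need to exchange data every `ghost` steps, at the price of redundantly updating the ghost cells.

```python
ALPHA = 0.25  # diffusion number; an illustrative value, not from the article


def step(u):
    """One explicit diffusion step; the two end cells are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + ALPHA * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def run_single(u0, steps):
    """Reference run on one undivided grid."""
    u = list(u0)
    for _ in range(steps):
        u = step(u)
    return u


def run_split(u0, steps, ghost):
    """Run on two subdomains that exchange ghost zones every `ghost` steps."""
    mid = len(u0) // 2
    if ghost < 1 or ghost > mid or ghost > len(u0) - mid:
        raise ValueError("ghost width must fit inside each subdomain")
    left = list(u0[:mid + ghost])    # owned cells, then ghost cells
    right = list(u0[mid - ghost:])   # ghost cells, then owned cells
    exchanges = 0
    for s in range(steps):
        if s > 0 and s % ghost == 0:
            # Each step invalidates one more ghost cell from the outside in,
            # so the ghost zones must be refreshed after `ghost` steps.
            left[mid:] = right[ghost:2 * ghost]
            right[:ghost] = left[mid - ghost:mid]
            exchanges += 1
        left = step(left)
        right = step(right)
    # Stitch together the cells each side owns, dropping the ghost cells.
    return left[:mid] + right[ghost:], exchanges


u0 = [float(i % 7) for i in range(40)]
reference = run_single(u0, 24)
narrow, narrow_exchanges = run_split(u0, 24, ghost=1)
wide, wide_exchanges = run_split(u0, 24, ghost=4)
print("exchanges with ghost=1:", narrow_exchanges)
print("exchanges with ghost=4:", wide_exchanges)
```

Both split runs reproduce the single-grid result, but the wider ghost zone needs far fewer exchanges. On a slow, high-latency link, trading a little redundant computation for fewer messages in this way can pay off, which is the trade-off the paragraph above describes.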
[Photo: Matei Ripeanu from the University of Chicago]
These results and techniques are important not
only for the possibilities they open up for large
scale simulations across multiple supercomputers,
but also for load balancing on a single machine
with heterogeneous processors and for exploiting
cheap resources, such as idle networked workstations,
to achieve higher throughput.
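
As a rough illustration of load balancing across processors of different speeds, the sketch below divides a number of grid points in proportion to each processor's relative speed. This is a minimal example under stated assumptions, not the algorithm used in the Cactus thorn; the function names and the speed values are hypothetical.

```python
def partition(n_points, speeds):
    """Split n_points among processors in proportion to their speeds.

    Uses largest-remainder rounding so the counts always sum to n_points.
    """
    if n_points < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_points >= 0 and strictly positive speeds")
    total = sum(speeds)
    ideal = [n_points * s / total for s in speeds]
    counts = [int(x) for x in ideal]
    leftover = n_points - sum(counts)
    # Hand the remaining points to the largest fractional parts first.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def step_time(counts, speeds):
    """Time for one step: everyone waits for the slowest processor."""
    return max(c / s for c, s in zip(counts, speeds))


speeds = [1.0, 1.0, 2.0]          # hypothetical relative processor speeds
even = [34, 33, 33]               # naive equal split of 100 points
balanced = partition(100, speeds)
print("balanced split:", balanced)
print("step time, even split:", step_time(even, speeds))
print("step time, balanced split:", step_time(balanced, speeds))
```

With an equal split, the fast processor sits idle while the slow ones finish; weighting the split by speed lets all three finish each step at about the same time.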
The methods described here are implemented
in a freely available module (or thorn) for
Cactus, and are being incorporated into the standard
code distribution, making efficient distributed computing
easily available for any application using the Cactus
framework.
Created by elena. Last modified 2007-01-21 05:18 PM.