[Users] PUGH Questions
Andreas Schäfer
gentryx at gmx.de
Sat May 10 11:38:21 CDT 2008
Heya,
On 15:48 Fri 09 May, Erik Schnetter wrote:
> A specialised version of PUGH was indeed used some time ago in a
> heterogeneous, multi-machine setup. See e.g. the section "Gordon Bell
> Prize for Supercomputing (SC2001)" on
> <http://www.cactuscode.org/About/Prizes>. Sadly, these updates to PUGH are
> not included in the main Cactus distribution; they also hid some of the
> network latency.
Very interesting! Somehow I must have missed this paper. I'm a bit
curious about the proposed domain decomposition. In Fig. 2 the
processor topology seems to form an elongated cuboid with dimensions
5x12x25, with the subsystems being split along the Z axis. Isn't this
prone to degenerate surfaces? If the simulation grid were equally
spaced in all dimensions, then the smaller subsystems of 5x12x2 CPUs
would have the same surface as the larger ones, leading to a higher
ratio of "outside communication" to load.
We're currently working on structured grids only (just like PUGH?),
and the problems above led us to consider different domain
decomposition schemes. The Z-order curve seems to be quite good in
terms of volume-to-surface ratio, and it behaves gracefully under load
balancing (see below for the problems we had with bisection), but
we're only just digging into this and don't have final results yet.
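For illustration, here is a minimal sketch of a Z-order (Morton) decomposition, assuming a 2D structured grid of nx by ny cells and one weight per machine. It is a generic textbook construction, not our library; the function names are hypothetical. Cells are sorted by their Morton key (the bits of x and y interleaved), and each machine gets a contiguous run of the curve proportional to its weight. A weight change only shifts the cut points along the curve, so rebalancing moves only the cells next to those points.

```python
def interleave(x, y, bits):
    """Morton key: bit b of x goes to bit 2b, bit b of y to bit 2b+1."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def zorder_partition(nx, ny, weights):
    """Map each cell (x, y) to the machine that owns it."""
    bits = max(nx - 1, ny - 1, 1).bit_length()
    cells = sorted(((x, y) for x in range(nx) for y in range(ny)),
                   key=lambda c: interleave(c[0], c[1], bits))
    total = sum(weights)
    owner, start, acc = {}, 0, 0.0
    for rank, w in enumerate(weights):
        acc += w
        # The last machine takes the rest, so rounding never drops cells.
        if rank == len(weights) - 1:
            end = len(cells)
        else:
            end = round(acc / total * len(cells))
        for c in cells[start:end]:
            owner[c] = rank
        start = end
    return owner


if __name__ == "__main__":
    a = zorder_partition(16, 16, [0.51, 0.25, 0.12, 0.12])
    b = zorder_partition(16, 16, [0.49, 0.25, 0.13, 0.13])
    moved = sum(a[c] != b[c] for c in a)
    print(moved, "of", len(a), "cells change owner")  # 14 of 256
```

On a 4x4 grid with four equal weights, each machine gets exactly one 2x2 quadrant, which is where the good volume-to-surface ratio comes from.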
> Carpet n-sects, where n is chosen depending on the
> number of processors. Each dimension is n-sected only once. While
> bisection leads to a binary tree, n-section leads to a wider and more
> shallow tree.
How exactly do you choose n? Does n have to be a divisor of the number
of processors? If you n-sect each dimension only once, couldn't this
also lead to a degenerate surface-to-volume ratio?
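To make the question concrete, a small sketch under stated assumptions: each dimension is cut exactly once into px, py and pz slabs with px*py*pz = P, and the grid is a unit cube, so the total area of the cut planes is (px - 1) + (py - 1) + (pz - 1). This only illustrates the constraint; it is not a description of how Carpet actually picks n, and the helper names are hypothetical. Under these assumptions every per-dimension n must divide P, and a prime P leaves nothing but 1x1xP slabs.

```python
def factor_triples(p):
    """Yield every processor grid (px, py, pz) with px * py * pz == p."""
    for px in range(1, p + 1):
        if p % px:
            continue
        rest = p // px
        for py in range(1, rest + 1):
            if rest % py == 0:
                yield px, py, rest // py


def cut_area(grid):
    """Total area of the cut planes inside a unit cube for this grid."""
    return sum(n - 1 for n in grid)


def best_grid(p):
    """Grid with the smallest cut area (ties broken by smallest tuple)."""
    return min(factor_triples(p), key=lambda g: (cut_area(g), g))


if __name__ == "__main__":
    for p in (7, 8, 12, 1500):
        g = best_grid(p)
        print(p, g, cut_area(g))
    # The 5x12x25 layout from Fig. 2, if the grid were a cube:
    print("5x12x25:", cut_area((5, 12, 25)))
```

With 8 processors the best grid is 2x2x2 (cut area 3), while 7 processors force 1x1x7 (cut area 6). For 1500 processors on a cubic grid, 10x10x15 cuts an area of 32 versus 39 for 5x12x25.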
I'm not sure how "each dimension is n-sected only once" affects this,
but with bisection we've observed a nasty effect during load balancing
that we call "flip-over": assume we had four machines and wanted to do
a weighted bisection with the weights [0.51, 0.25, 0.12, 0.12]. Then
we would end up with a bisection equivalent to the following
(assuming that we start with a 1x1 square and always split along the
longest axis; numbers correspond to each machine's position in the
weight vector):
--------------
|      |  |  |
|      | 2|3 |
|      |  |  |
|  0   -------
|      |     |
|      |  1  |
|      |     |
--------------
(please forgive my ASCII art)
Now, if we then decided to change the weights slightly to [0.49, 0.25,
0.13, 0.13] (e.g. because we observed a load imbalance), we'd end up
with the following bisection...
--------------
|      |  2  |
|      -------
|      |  3  |
|  0   -------
|      |     |
|      |  1  |
|      |     |
--------------
...which requires nodes 2 and 3 to exchange ~50% of their share of the
simulation grid. That's quite expensive for such a small change in the
weight vector.
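The flip-over can be reproduced with a minimal sketch of weighted recursive bisection on a continuous unit square. It assumes, as in the example, that each step splits the machine list where the two halves' weights are closest and cuts perpendicular to the longest axis (ties go to x). The names `bisect`, `decompose` and `migrated_fraction` are hypothetical and not part of PUGH or Carpet. Because the region shared by machines 2 and 3 is almost square, its cut axis is decided by a tiny margin (here (1 - w0)^2 versus w2 + w3), so the exact tipping point depends on tie-breaking and on rounding to grid cells. The usage below therefore picks a second weight vector that lies clearly on the other side of that threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def area(self) -> float:
        return self.w * self.h

    def overlap(self, other: "Rect") -> float:
        dx = min(self.x + self.w, other.x + other.w) - max(self.x, other.x)
        dy = min(self.y + self.h, other.y + other.h) - max(self.y, other.y)
        return max(dx, 0.0) * max(dy, 0.0)


def bisect(rect, ids, weights, out):
    """Recursively assign each machine id a sub-rectangle of rect."""
    if len(ids) == 1:
        out[ids[0]] = rect
        return out
    # Split the id list where the two halves' total weights are closest.
    total = sum(weights[i] for i in ids)
    best_k, best_diff = 1, float("inf")
    for k in range(1, len(ids)):
        left = sum(weights[i] for i in ids[:k])
        diff = abs(left - (total - left))
        if diff < best_diff:
            best_k, best_diff = k, diff
    frac = sum(weights[i] for i in ids[:best_k]) / total
    # Cut perpendicular to the longest axis; ties go to the x axis.
    if rect.w >= rect.h:
        a = Rect(rect.x, rect.y, rect.w * frac, rect.h)
        b = Rect(rect.x + a.w, rect.y, rect.w - a.w, rect.h)
    else:
        a = Rect(rect.x, rect.y, rect.w, rect.h * frac)
        b = Rect(rect.x, rect.y + a.h, rect.w, rect.h - a.h)
    bisect(a, ids[:best_k], weights, out)
    bisect(b, ids[best_k:], weights, out)
    return out


def decompose(weights):
    return bisect(Rect(0.0, 0.0, 1.0, 1.0), list(range(len(weights))),
                  weights, {})


def migrated_fraction(old, new):
    """Share of a machine's new region that it did not own before."""
    return 1.0 - old.overlap(new) / new.area()


if __name__ == "__main__":
    old = decompose([0.51, 0.25, 0.12, 0.12])
    new = decompose([0.51, 0.24, 0.125, 0.125])
    print(old[2])  # tall strip: the {2,3} region was cut along x
    print(new[2])  # flat strip: the {2,3} region was cut along y
    print(migrated_fraction(old[2], new[2]))  # about 0.54
```

A weight change of 0.005 per machine makes node 2 fetch roughly half of its new region from elsewhere, which is the "~50%" effect described above.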
Of course, it could be argued that the chosen example with its strange
weights is rather arcane, but it's easy to see that a) it could
quickly be generalized to one with more nodes and an almost even
weight distribution, b) it could affect not only two single nodes but
whole groups of nodes, where some nodes would have to exchange 100% of
their grid region, and c) there are similar cases for n-section (when
dimensions are divided multiple times). In fact, these flip-overs are
most common for setups with many homogeneous nodes (preferably a power
of two).
Could the example above happen when balancing the load with Carpet? If
not, how exactly does the n-section work?
> Do you have a pointer to web pages describing your library or your project?
Not yet. I'm currently writing a paper for ParSim (at EuroPVM/MPI),
and we'll put up the code with some docs later on, but the paper
comes first. ^^
> Yes, that would be a good approach. I was referring to the fact that one
> may do multi-physics simulations (we often couple spacetime and
> hydrodynamics evolution, and also run some analysis tools at run time), and
> one has then to find an overall balance, which is not necessarily a good
> balance for each individual system.
Ok, I get it.
> You are probably aware of Ulrich Sperhake at FSU who is using Cactus and
> Carpet to simulate astrophysical systems, in particular binary black holes;
Aware, yes, although we're not currently working with them. Maybe this
will change with Jena's campus grid. We'll see. (-8
Cheers
-Andreas
--
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
============================================