[Users] PUGH Questions
Erik Schnetter
schnetter at cct.lsu.edu
Fri May 9 15:48:36 CDT 2008
On May 9, 2008, at 01:43:40, Andreas Schäfer wrote:
> On 22:35 Thu 08 May , Erik Schnetter wrote:
>> Yes, Carpet can handle the case where the number of processors is
>> not a power of 2.
>
> What kind of domain decomposition are you using? Space filling curves,
> ParMetis, recursive coordinate bisection, something completely
> different? I had a look at the documentation available at
> http://www.carpetcode.org/ but was too blind to find any information
> on it. ^^
PUGH decomposes the number of processors into three integer factors,
taking the number of grid points in each dimension into account. This
leads to the restrictions which Frank Löffler described earlier.
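As a rough illustration of this kind of factorisation (a sketch only, not PUGH's actual code), one can enumerate all ways to write the processor count as a product of three integers and keep the one whose per-processor block has the smallest surface, assuming ghost-zone traffic scales with that surface. The function names and the surface criterion are assumptions made for this example.

```python
def factor_triples(nprocs):
    """Yield every (px, py, pz) with px * py * pz == nprocs."""
    for px in range(1, nprocs + 1):
        if nprocs % px:
            continue
        rest = nprocs // px
        for py in range(1, rest + 1):
            if rest % py == 0:
                yield px, py, rest // py


def choose_decomposition(nprocs, shape):
    """Pick the triple whose per-processor block has the smallest surface.

    shape is the global number of grid points (nx, ny, nz).
    Assumption: communication cost is proportional to block surface area.
    """
    nx, ny, nz = shape

    def surface(p):
        a, b, c = nx / p[0], ny / p[1], nz / p[2]
        return a * b + b * c + a * c

    return min(factor_triples(nprocs), key=surface)
```

With a prime processor count such as 7, the only available triples are permutations of (1, 1, 7), which shows why a pure three-factor scheme is restrictive.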
A specialised version of PUGH was indeed used some time ago in a
heterogeneous, multi-machine setup. See e.g. the section "Gordon
Bell Prize for Supercomputing (SC2001)" on
<http://www.cactuscode.org/About/Prizes>. Sadly, these updates to PUGH are
not included in the main Cactus
distribution; they also hid some of the network latency.
Carpet distributes sets of domains over sets of processors. Carpet
decomposes the domain into cuboid regions, hence space-filling curves
would not work. It uses a hierarchical method, similar to coordinate
bisection. Instead of bisecting, Carpet n-sects, where n is chosen
depending on the number of processors. Each dimension is n-sected
only once. While bisection leads to a binary tree, n-section leads to
a wider and more shallow tree.
When the load per grid point is approximately equal, then n-section is
a simpler algorithm. If the load per grid point can vary greatly,
then n-section becomes too inflexible, and bisection would become
appropriate.
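A minimal sketch of hierarchical n-section, under stated assumptions, might look as follows. This is not Carpet's implementation: the rule for choosing n (roughly the k-th root of the processor count when k dimensions remain) and the names n_sect and split_extent are made up for illustration. It assumes every extent is large enough to be cut into the requested number of pieces.

```python
def split_extent(lo, hi, weights):
    """Cut the half-open interval [lo, hi) into pieces proportional to weights."""
    total = sum(weights)
    cuts, acc = [lo], 0
    for w in weights:
        acc += w
        cuts.append(lo + (hi - lo) * acc // total)
    return list(zip(cuts[:-1], cuts[1:]))


def n_sect(region, nprocs, dim=0):
    """Return one cuboid per processor; each dimension is cut only once.

    region is a tuple of (lo, hi) pairs, one pair per dimension.
    """
    remaining = len(region) - dim
    if remaining == 1:
        n = nprocs  # last dimension: one piece per processor
    else:
        # Assumed heuristic: spread the cuts evenly over the remaining dimensions.
        n = max(1, min(nprocs, round(nprocs ** (1.0 / remaining))))
    # Distribute the processors over the n parts as evenly as possible.
    procs = [nprocs // n + (1 if i < nprocs % n else 0) for i in range(n)]
    lo, hi = region[dim]
    pieces = []
    for (a, b), p in zip(split_extent(lo, hi, procs), procs):
        sub = region[:dim] + ((a, b),) + region[dim + 1:]
        if remaining == 1:
            pieces.append(sub)
        else:
            pieces.extend(n_sect(sub, p, dim + 1))
    return pieces
```

Because each part of a cut may receive a different number of processors, the processor count does not need to be a power of 2, and the resulting tree has at most one level per dimension.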
> One of my goals is to find out how
> well Cactus performs in a multi-cluster setup, so you'd have both
> heterogeneous machines and networks. The reason behind this is that
> we've written a parallel library for time discrete simulations on
> structured grids that is geared towards grids and multi-clusters. Cactus
> and this library have some crucial functionality in common and since
> Cactus is a corner stone when it comes to computer based simulations,
Thank you for the flowers.
Do you have a pointer to web pages describing your library or your
project?
> I thought comparing the lib to Cactus would give readers a rough idea
> how to judge it.
We are looking into multi-cluster simulations, but have recently only
considered heterogeneous sets of clusters.
>> Changing the load distribution (even dynamically) is not difficult;
>> the difficult part is determining _how_ to change it, since
>> different parts of the evolution algorithm may run with differing
>> speeds on the different processors.
>
> One method we use is to hook in at the ghost zone
> communication. Assuming the following pseudo code...
>
> for each timestep
>     update inner ghost zones
>     send/recv ghost zones asynchronously
>     update kernel
>     wait for ghost zone communication
> end
>
> ...you could divide the time taken for the updates by the total time
> needed for one time step to get an estimate for a node's
> utilization. If it is high on one node and low on its neighbors, they
> should share some of their load. Since it is not clear in advance
> which new grid point distribution leads to which new load
> distribution, a diffusion based redistribution has proven to be quite
> efficient for us.
Yes, that would be a good approach. I was referring to the fact that
one may do multi-physics simulations (we often couple spacetime and
hydrodynamics evolution, and also run some analysis tools at run
time), and one has then to find an overall balance, which is not
necessarily a good balance for each individual system.
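For a single system, the utilization estimate and the diffusion-based redistribution quoted above could be sketched roughly like this. It assumes a 1D chain of nodes and a made-up damping factor alpha; the function names are hypothetical and this is neither the quoted library's nor Carpet's code.

```python
def utilization(update_time, step_time):
    """Fraction of a time step spent updating rather than waiting."""
    return update_time / step_time


def diffuse(points, util, alpha=0.25):
    """One diffusion step on a 1D chain of nodes.

    points[i] is the number of grid points on node i, util[i] its
    utilization in [0, 1]. Each pair of neighbours exchanges points in
    proportion to their utilization difference, from busy to idle.
    """
    new = list(points)
    for i in range(len(points) - 1):
        diff = util[i] - util[i + 1]
        donor = i if diff > 0 else i + 1
        receiver = i + 1 if donor == i else i
        moved = int(alpha * abs(diff) * points[donor])
        new[donor] -= moved
        new[receiver] += moved
    return new
```

Repeating such a step after every few time steps lets load drift towards underused nodes without needing to know in advance how a new distribution will perform. In a multi-physics run one would have to decide which timings enter the utilization, which is the balancing problem mentioned above.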
>> I do not typically use heterogeneous machines; if you are
>> interested, we could offer an API that lets people dynamically set
>> the relative performance of each MPI process, and Carpet would then
>> change the load distribution according to these performances.
>
> The way I understand it, this means that a) it could be done easily,
> but hasn't yet been a use case and b) it is not yet implemented. Is
> that right?
Both points are correct. There is also (c): I find this topic interesting,
especially since the workload in different parts of the domain is
probably going to become less equally distributed in the future for
our applications in relativistic astrophysics.
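Since no such API exists yet, the following is purely hypothetical: a sketch of how a given number of grid points could be divided among processes in proportion to user-supplied relative performance values. The name distribute and its arguments are invented for this example.

```python
def distribute(npoints, perf):
    """Split npoints among processes in proportion to perf.

    perf[i] is the relative performance of process i (any positive scale).
    Leftover points from rounding down go to the largest fractional
    remainders, so the counts always sum to npoints.
    """
    total = sum(perf)
    shares = [npoints * p / total for p in perf]
    counts = [int(s) for s in shares]
    leftover = npoints - sum(counts)
    order = sorted(range(len(perf)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For example, a process reported as twice as fast as its two peers would receive about half of the points.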
You are probably aware of Ulrich Sperhake at FSU who is using Cactus
and Carpet to simulate astrophysical systems, in particular binary
black holes; see <http://www.tpi.uni-jena.de/gravity/>, or
<http://arxiv.org/abs/0805.1017> for a recent publication containing
pointers describing their LEAN
code.
-erik
--
Erik Schnetter <schnetter at cct.lsu.edu> http://www.cct.lsu.edu/~eschnett/
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from www.keyserver.net.