[Users] PUGH Questions

Andreas Schäfer gentryx at gmx.de
Fri May 9 01:43:40 CDT 2008


Hi,

Thanks for your quick replies!

On 22:35 Thu 08 May     , Erik Schnetter wrote:
> Yes, Carpet can handle the case where the number of processors is
> not a power of 2.

What kind of domain decomposition are you using? Space filling curves,
ParMetis, recursive coordinate bisection, something completely
different? I had a look at the documentation available at
http://www.carpetcode.org/ but was too blind to find any information
on it. ^^

> What kinds of clusters are you considering?  Is the compute hardware
> different, or do you refer e.g. to the network topology?  Would your
> problems be compute bound, memory bound, communication bound, or I/O
> bound?  Do you expect the hardware performance to change at run time,
> or be approximately constant over a run?  Or do you rather expect the
> computational load to be different in different parts of the
> simulation domain?

Short answer: Yes, all of them. ^^ One of my goals is to find out how
well Cactus performs in a multi-cluster setup, so you'd have both
heterogeneous machines and heterogeneous networks. The reason behind
this is that we've written a parallel library for time-discrete
simulations on structured grids that is geared towards grids and
multi-clusters. Cactus and this library have some crucial
functionality in common, and since Cactus is a cornerstone when it
comes to computer-based simulations, I thought comparing the lib to
Cactus would give readers a rough idea how to judge it.

Applications could be anything from communication bound (always ugly
on heterogeneous networks) to compute bound (hence the load balancing
question), with I/O and memory bound apps playing a minor role
(ATM). For instance, one of our apps, a simulation of cooling molten
metal alloys, exhibits high load imbalances depending on the grid
points' states: we see computational hot spots at the phase boundary
between solid and liquid metal and significantly less complex
updates in all-liquid and all-solid areas. Since the phase boundary is
constantly moving as the simulation progresses and the metal
solidifies, hot spots move through the grid, new hot spots arise and
old ones disappear.

> Changing the load distribution (even dynamically) is not difficult;
> the difficult part is determining _how_ to change it, since
> different parts of the evolution algorithm may run with differing
> speeds on the different processors.

One method we use is to hook in at the ghost zone
communication. Assuming the following pseudo code...

for each timestep
  update inner ghost zones
  send/recv ghost zones asynchronously
  update kernel
  wait for ghost zone communication
end

...you could divide the time taken for the updates by the total time
needed for one time step to get an estimate of a node's
utilization. If it is high on one node and low on its neighbors, the
busy node should hand some of its load over to them. Since it is not
clear in advance which new grid point distribution leads to which new
load distribution, a diffusion-based redistribution has proven to be
quite efficient for us.
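
To make the idea a bit more concrete, here is a minimal Python sketch
of the utilization estimate and of one diffusion step. It is only an
illustration, not our library's code: the 1D chain of nodes, the
function names, the damping factor alpha, and the rule of moving cells
in proportion to the utilization difference are all assumptions made
for this example.

```python
from typing import List


def utilization(update_time: float, step_time: float) -> float:
    """Fraction of one time step spent in updates (the rest is waiting)."""
    if step_time <= 0.0:
        raise ValueError("step_time must be positive")
    return update_time / step_time


def diffuse_load(cells: List[int], util: List[float],
                 alpha: float = 0.25) -> List[int]:
    """Perform one diffusion step on a 1D chain of nodes (assumed layout).

    cells[i] is the number of grid points owned by node i and util[i] its
    measured utilization. For every pair of neighbors, the busier node
    hands over a share of its cells proportional to the utilization
    difference. alpha is a hypothetical damping factor.
    """
    new_cells = list(cells)
    for i in range(len(cells) - 1):
        diff = util[i] - util[i + 1]
        donor, receiver = (i, i + 1) if diff > 0 else (i + 1, i)
        amount = int(alpha * abs(diff) * new_cells[donor])
        # Never strip a node completely; keep at least one cell.
        amount = max(0, min(amount, new_cells[donor] - 1))
        new_cells[donor] -= amount
        new_cells[receiver] += amount
    return new_cells


# Node 0 is fully busy, its neighbors are half idle.
print(diffuse_load([100, 100, 100], [1.0, 0.5, 0.5], alpha=0.5))
```

Repeating such a step every few time steps lets load drift away from
hot spots without ever needing a global model of the cost per grid
point.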

> I do not typically use heterogeneous machines; if you are interested, we
> could offer an API that lets people dynamically set the relative
> performance of each MPI process, and Carpet would then change the load
> distribution according to these performances.

The way I understand it, this means that a) it could be done easily,
but hasn't yet been a use case and b) it is not yet implemented. Is
that right?

Thanks
-Andreas


-- 
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
============================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!