[Developers] synchronise all processors before aborting on parameter errors

Erik Schnetter schnetter at cct.lsu.edu
Tue Dec 4 10:58:00 CST 2007


On Dec 4, 2007, at 05:25:22, Thomas Radke wrote:

> Hi,
>
> there exists a PARAM_CHECK bin in which thorns can schedule routines to
> check the consistency of parameters and have the run stopped (using
> CCTK_Abort) if there are errors.
> Now Bela reported the problem that, for multi-processor simulations
> using certain Infiniband MPI implementations, the run would die
> prematurely because some processors call CCTK_Abort() earlier than
> others, and in the logfile one cannot easily find the real reason for
> the abort anymore.
>
> Putting output buffer caching issues aside, the problem could be fixed
> by inserting a CCTK_Barrier() call in the flesh function
> CCTKi_FinaliseParamWarn(), just before it would check whether there were
> any (local) parameter errors and then call CCTK_Abort().
>
> I guess this small performance penalty would be acceptable? Or does
> someone have a better solution?
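
The ordering proposed above can be illustrated with a small stand-alone simulation. This is only a sketch and not flesh code: POSIX threads stand in for MPI processes, pthread_barrier_wait() stands in for CCTK_Barrier(), and NRANKS, local_errors, and simulate_param_check() are hypothetical names chosen for the example. It assumes a platform that provides POSIX barriers (for example Linux).

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>

#define NRANKS 4 /* number of simulated processors (assumption) */

static pthread_barrier_t barrier;
static int errors_logged[NRANKS];   /* set once a rank has reported its errors */
static int logged_at_check[NRANKS]; /* ranks that had reported when this rank checked */

/* Hypothetical per-rank parameter error counts, for illustration only. */
static const int local_errors[NRANKS] = {0, 2, 0, 1};

static void *rank_main(void *arg)
{
    int rank = *(int *)arg;

    /* Stand-in for the PARAM_CHECK routines: report local errors. */
    if (local_errors[rank] > 0)
        fprintf(stderr, "rank %d: %d parameter error(s)\n",
                rank, local_errors[rank]);
    errors_logged[rank] = 1;

    /* Stand-in for the proposed CCTK_Barrier() call. */
    pthread_barrier_wait(&barrier);

    /* Only now would a rank decide whether to abort. Record how many
       ranks had already reported their errors at this point. */
    int n = 0;
    for (int i = 0; i < NRANKS; i++)
        n += errors_logged[i];
    logged_at_check[rank] = n;
    return NULL;
}

/* Runs the simulation and returns the smallest number of ranks that had
   reported by the time any rank reached its abort check. */
int simulate_param_check(void)
{
    pthread_t threads[NRANKS];
    int ids[NRANKS];

    pthread_barrier_init(&barrier, NULL, NRANKS);
    for (int i = 0; i < NRANKS; i++) {
        ids[i] = i;
        errors_logged[i] = 0;
        pthread_create(&threads[i], NULL, rank_main, &ids[i]);
    }
    for (int i = 0; i < NRANKS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);

    int min = NRANKS;
    for (int i = 0; i < NRANKS; i++)
        if (logged_at_check[i] < min)
            min = logged_at_check[i];
    return min;
}
```

With the barrier in place, simulate_param_check() returns NRANKS: no rank reaches its abort check before every rank has reported. Without the barrier, a fast rank could reach the check while others are still reporting, which is the situation described above.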


In addition to this good idea, we could insert a sleep(10) in
CCTK_Abort, so that other processors have a bit of time to catch up
before aborting.  This should often be enough for them to produce
some additional debug output.  (I'm thinking of a new parameter INT
sleep_time_before_abort, with a default value of 10 seconds or so.)
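
A minimal sketch of such a delay, in plain C rather than actual flesh code. The global variable below only stands in for the proposed sleep_time_before_abort parameter, and delay_before_abort() and abort_with_delay() are hypothetical helper names, not existing Cactus functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Stand-in for the proposed parameter; in Cactus this would be declared
   in a param.ccl file rather than as a global (assumption). */
int sleep_time_before_abort = 10;

/* Flush pending output, then wait the given number of seconds so that
   other processes get a chance to write their own messages.
   Returns the number of seconds waited (0 if the delay is disabled). */
unsigned int delay_before_abort(int seconds)
{
    if (seconds <= 0)
        return 0;

    /* Make sure this process's own diagnostics are not lost. */
    fflush(stdout);
    fflush(stderr);

    /* sleep() returns the unslept time if interrupted by a signal,
       so keep sleeping until the full delay has elapsed. */
    unsigned int left = (unsigned int)seconds;
    while (left > 0)
        left = sleep(left);

    return (unsigned int)seconds;
}

/* Hypothetical replacement for an immediate abort. */
void abort_with_delay(int exit_code)
{
    delay_before_abort(sleep_time_before_abort);
    exit(exit_code);
}
```

A value of 0 or less disables the delay, so runs that want the old behaviour lose nothing.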

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>

My email is as private as my paper mail.  I therefore support encrypting
and signing email messages.  Get my PGP key from www.keyserver.net.


