[Developers] synchronise all processors before aborting on parameter errors
Tom Goodale
goodale at cct.lsu.edu
Tue Dec 4 05:40:02 CST 2007
On Tue, 4 Dec 2007, Thomas Radke wrote:
> Hi,
>
> there exists a PARAM_CHECK bin in which thorns can schedule routines to
> check the consistency of parameters and have the run stopped (using
> CCTK_Abort) if there are errors.
> Now Bela reported the problem that, for multi-processor simulations
> using certain Infiniband MPI implementations, the run would die
> prematurely because some processors call CCTK_Abort() earlier than
> others, and in the logfile one cannot easily find the real reason for
> the abort anymore.
>
> Putting output buffer caching issues aside, the problem could be fixed
> by inserting a CCTK_Barrier() call in the flesh function
> CCTKi_FinaliseParamWarn(), just before it would check whether there were
> any (local) parameter errors and then call CCTK_Abort().
>
> I guess this small performance penalty would be acceptable? Or does
> someone have a better solution?
It sounds good to me.
Tom
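
For reference, the proposed ordering (barrier first, then the local error check and abort) might look roughly like the sketch below. This is not the flesh source: stub_barrier() and stub_abort() are hypothetical stand-ins for CCTK_Barrier() and CCTK_Abort() so that the sketch compiles without Cactus or MPI, and record_param_error(), finalise_param_warn() and the counters are illustrative names only.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the flesh calls CCTK_Barrier() and
   CCTK_Abort(); they only count invocations so that the ordering
   can be inspected without Cactus or MPI. */
static int barrier_calls = 0;
static int abort_calls = 0;
static int barriers_before_abort = -1;

static void stub_barrier(void)
{
  barrier_calls++;
}

static void stub_abort(int retval)
{
  (void) retval;
  abort_calls++;
  /* remember how many barriers had completed when the abort happened */
  barriers_before_abort = barrier_calls;
}

/* Local (per-processor) count of parameter errors, as would be
   accumulated by routines scheduled in the PARAM_CHECK bin. */
static int param_errors = 0;

static void record_param_error(const char *message)
{
  fprintf(stderr, "Parameter error: %s\n", message);
  param_errors++;
}

/* Sketch of the proposed change: every processor reaches the barrier
   before any of them inspects its local error count and aborts. */
static void finalise_param_warn(void)
{
  stub_barrier();              /* in the flesh: CCTK_Barrier() */

  if (param_errors > 0)
  {
    fprintf(stderr, "%d parameter error(s), aborting\n", param_errors);
    stub_abort(99);            /* in the flesh: CCTK_Abort() */
  }
}
```

The barrier is reached unconditionally, whether or not the local processor has errors, since a processor without errors would otherwise never arrive at it and the others would wait forever.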