[Developers] synchronise all processors before aborting on parameter errors
Thomas Radke
tradke at aei.mpg.de
Tue Dec 4 05:25:22 CST 2007
Hi,
there is a PARAM_CHECK schedule bin in which thorns can schedule routines
to check the consistency of parameters and have the run stopped (using
CCTK_Abort) if there are errors.
Now Bela reported the problem that, for multi-processor simulations
using certain InfiniBand MPI implementations, the run would die
prematurely because some processors call CCTK_Abort() earlier than
others, so that the real reason for the abort can no longer easily be
found in the logfile.
Putting output buffer caching issues aside, the problem could be fixed
by inserting a CCTK_Barrier() call in the flesh function
CCTKi_FinaliseParamWarn(), just before it would check whether there were
any (local) parameter errors and then call CCTK_Abort().
I guess this small performance penalty would be acceptable? Or does
someone have a better solution?
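
The ordering proposed above could be sketched roughly as follows. This is
only an illustration and not the actual flesh code: barrier_fn, abort_fn,
local_param_errors, report_param_error(), finalise_param_warn() and the
two demo_* stubs are hypothetical names standing in for CCTK_Barrier(),
CCTK_Abort() and the flesh's internal error bookkeeping, so that the
sketch compiles without Cactus or MPI.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for CCTK_Barrier() and CCTK_Abort(); the real
   flesh functions have different signatures and are not reproduced here. */
typedef void (*barrier_fn)(void);
typedef void (*abort_fn)(int exit_code);

/* Hypothetical per-process count of parameter errors found in PARAM_CHECK. */
static int local_param_errors = 0;

/* Called by a checking routine when it finds an inconsistent parameter. */
static void report_param_error(const char *message)
{
    fprintf(stderr, "parameter error: %s\n", message);
    local_param_errors++;
}

/* Sketch of the proposed ordering: synchronise first, so that every
   process has finished its checks and issued its messages, and only
   then test the local error count and abort.
   Returns 1 if an abort was requested on this process, 0 otherwise. */
static int finalise_param_warn(barrier_fn barrier, abort_fn abort_run)
{
    assert(barrier != NULL && abort_run != NULL);

    barrier();                      /* all processes wait here */

    if (local_param_errors > 0) {
        abort_run(1);               /* would be CCTK_Abort() in the flesh */
        return 1;
    }
    return 0;
}

/* Demo stubs that only record the call order ('B' = barrier, 'A' = abort). */
static char trace[8];
static int trace_len = 0;

static void demo_barrier(void)
{
    if (trace_len < 8) trace[trace_len++] = 'B';
}

static void demo_abort(int exit_code)
{
    (void)exit_code;
    if (trace_len < 8) trace[trace_len++] = 'A';
}
```

Calling finalise_param_warn(demo_barrier, demo_abort) after one
report_param_error() call records the barrier before the abort; with no
errors reported, only the barrier is recorded. The barrier has to be
reached by every process whether or not it has local errors, otherwise
the processes with errors would wait forever.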
--
Cheers, Thomas.