[Users] Checkpointing issue.
Thomas Radke
tradke at aei.mpg.de
Fri Apr 18 12:17:40 CDT 2008
Peter Diener wrote:
> Hi,
>
> Sorry about the long e-mail.
>
> I just saw the following weird behaviour when restarting from a set of
> Carpet mesh refinement checkpoint files. I was running with 9 refinement
> levels on 128 MPI processes and was restarting on the same number of
> processes when it seemed like the restart stalled when reading in
> refinement level 5.
>
> The whole process took almost 7 hours, with almost all of the time spent
> on reflevel 5, opening 98 files before finding the right data.
>
> So it seems that for some reason all refinement levels, except for level
> 5, were distributed in the same way in the original run and the restart
> run. What used to be on MPI process 0 on level 5 was suddenly on MPI
> process 98. I don't have output for the other MPI processes, so I don't
> know how much was moved around...
>
> Has anybody seen something like this before?
>
> Any suggestions as to what to do about it?

Hi Peter,

I don't know why the grid structure for refinement level 5 would be
different in the recovery run but not for the other levels. I have seen
this behaviour before, though, but couldn't find out why Carpet was doing
that.

--
Cheers, Thomas.