[Users] Checkpointing issue.

Thomas Radke tradke at aei.mpg.de
Fri Apr 18 12:17:40 CDT 2008


Peter Diener wrote:
> Hi,
> 
> Sorry about the long e-mail.
> 
> I just saw the following weird behaviour when restarting from a set of 
> Carpet mesh refinement checkpoint files. I was running with 9 refinement 
> levels on 128 MPI processes and was restarting on the same number of 
> processes when the restart seemed to stall while reading in 
> refinement level 5.
>
> The whole process took almost 7 hours with almost all of the time spent on 
> reflevel 5 opening 98 files before finding the right data.
> 
> So it seems that, for some reason, all refinement levels except level 
> 5 were distributed in the same way in the original run and the restart 
> run. What used to be on MPI process 0 on level 5 was suddenly on MPI 
> process 98. I don't have output for the other MPI processes, so I don't 
> know how much was moved around...
> 
> Has anybody seen something like this before?
> 
> Any suggestions as to what to do about it?

Hi Peter,

I don't know why the grid structure for refinement level 5 would be 
different in the recovery run but not for the other levels. I have seen this 
behaviour before, but I couldn't find out why Carpet was doing that.

-- 
Cheers, Thomas.

