Thanks all,<br><br><div><span class="gmail_quote">2008/4/1, Erik Schnetter &lt;<a href="mailto:schnetter@cct.lsu.edu">schnetter@cct.lsu.edu</a>&gt;:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br> how large is the cluster? A factor of 2 is not really bad, since you<br> still get a factor of 32 speedup compared to running on a single CPU.<br> What is the slowdown when you go from using 1 to using 2 full nodes?</blockquote>
<div><br> </div>I can use at most 80 CPUs. If a factor of 2 is not really bad and is unavoidable on a cluster with a low-performance network, I think I'd better stop here. I have spent too much time on this ^^<br><br>Anyway, I got many comments and suggestions from the mpich-discuss list. I found that our switch has good bandwidth, even though its latency is not as good as that of high-performance hardware. It also supports Open-MX, so I will test Open-MX to reduce the latency.<br>
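The latency-versus-bandwidth point can be illustrated with the simple Hockney ("alpha-beta") model, T(n) = alpha + n/beta, where alpha is the per-message latency and beta the bandwidth. This is only a minimal sketch: the alpha and beta values below are hypothetical, chosen to show the trend, and are not measurements of our switch. For small messages the fixed latency term dominates, so good bandwidth alone does not help much there, and lowering latency (for example with Open-MX) matters most.

```python
# Hockney ("alpha-beta") model of point-to-point message time:
#   T(n) = alpha + n / beta
# alpha = per-message latency in seconds, beta = bandwidth in bytes/s.
# NOTE: the alpha and beta values used below are HYPOTHETICAL, for
# illustration only; they are not measurements of this cluster.


def message_time(n_bytes: float, alpha: float, beta: float) -> float:
    """Estimated time (s) to send one message of n_bytes."""
    return alpha + n_bytes / beta


def latency_fraction(n_bytes: float, alpha: float, beta: float) -> float:
    """Fraction of the total message time that is pure latency."""
    return alpha / message_time(n_bytes, alpha, beta)


if __name__ == "__main__":
    beta = 100e6                    # hypothetical ~100 MB/s link
    for alpha in (50e-6, 5e-6):     # hypothetical high vs. low latency
        for n in (1_000, 100_000, 10_000_000):
            t = message_time(n, alpha, beta)
            f = latency_fraction(n, alpha, beta)
            print(f"alpha={alpha * 1e6:5.1f} us  n={n:>10,d} B  "
                  f"T={t * 1e3:9.3f} ms  latency share={f:6.1%}")
```

With these made-up numbers, a 1 kB message spends most of its time in latency, while a 10 MB message is almost entirely bandwidth-bound.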
<br>The numbers below were obtained with mpich2. Note that the number of iterations was 32, half that of the previous run.<br><br>Thanks!<br><br>Hee Il<br><br><br>==========================<br><br><div># 1 node = 8 cpus<br><br>./CCTK_Proc0.out: | Total time for simulation | 463.46191400 | 423.44646400 <br>
./CCTK_Proc1.out: | Total time for simulation | 463.41564800 | 443.49971700 <br>./CCTK_Proc2.out: | Total time for simulation | 463.41577600 | 441.01956200 <br>
./CCTK_Proc3.out: | Total time for simulation | 463.41576900 | 415.26195200 <br>./CCTK_Proc4.out: | Total time for simulation | 463.41564200 | 444.48777900 <br>
./CCTK_Proc5.out: | Total time for simulation | 463.41567800 | 421.04631400 <br>./CCTK_Proc6.out: | Total time for simulation | 463.41577200 | 421.11031800 <br>
./CCTK_Proc7.out: | Total time for simulation | 463.44232100 | 411.46171500 <br><br><br># 2 nodes = 16 cpus<br><br>./CCTK_Proc0.out: | Total time for simulation | 481.08626400 | 439.36345800 <br>
./CCTK_Proc1.out: | Total time for simulation | 481.05222500 | 415.46996500 <br>./CCTK_Proc2.out: | Total time for simulation | 481.05224300 | 415.90599200 <br>
./CCTK_Proc3.out: | Total time for simulation | 481.05421800 | 404.89330400 <br>./CCTK_Proc4.out: | Total time for simulation | 481.05222600 | 446.89592900 <br>
./CCTK_Proc5.out: | Total time for simulation | 481.05237600 | 419.93424400 <br>./CCTK_Proc6.out: | Total time for simulation | 481.04626200 | 423.71448100 <br>
./CCTK_Proc7.out: | Total time for simulation | 481.09418200 | 419.81023700 <br>./CCTK_Proc8.out: | Total time for simulation | 481.05225000 | 430.81092400 <br>
./CCTK_Proc9.out: | Total time for simulation | 481.05227900 | 448.58403400 <br>./CCTK_Proc10.out: | Total time for simulation | 481.05228700 | 449.02006200 <br>
./CCTK_Proc11.out: | Total time for simulation | 481.05252200 | 423.33445700 <br>./CCTK_Proc12.out: | Total time for simulation | 481.05242800 | 444.17976000 <br>
./CCTK_Proc13.out: | Total time for simulation | 481.05249500 | 415.08594100 <br>./CCTK_Proc14.out: | Total time for simulation | 481.05234400 | 413.60184900 <br>
./CCTK_Proc15.out: | Total time for simulation | 481.05244200 | 407.84548900 <br><br><br># 4 nodes = 32 cpus<br><br>./CCTK_Proc0.out: | Total time for simulation | 688.29916500 | 415.56597200<br>
max (last column, over all processes) 460.74879500<br><br># 8 nodes = 64 cpus<br><br>./CCTK_Proc0.out: | Total time for simulation | 794.68444700 | 428.21476200<br>
max (last column, over all processes) 470.84142600<br></div><br></div>
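To answer the question about the slowdown from one to two full nodes, here is a small sketch that turns the Proc0 timings above into a slowdown and a weak-scaling efficiency relative to the one-node run. It assumes that the first column is wall-clock time in seconds and that the work per process stays fixed as CPUs are added (weak scaling); both are my reading of the setup, not something the timer output itself states.

```python
# Weak-scaling check using the Proc0 "Total time for simulation" values
# (first column) from the runs above.
# ASSUMPTIONS: the first column is wall-clock time in seconds, and the
# work per process is held fixed as CPUs are added (weak scaling).

proc0_time = {  # number of CPUs -> seconds
    8: 463.46191400,
    16: 481.08626400,
    32: 688.29916500,
    64: 794.68444700,
}


def weak_scaling_efficiency(times: dict[int, float]) -> dict[int, float]:
    """Efficiency relative to the smallest run: E(p) = T(p_min) / T(p)."""
    base = times[min(times)]
    return {p: base / t for p, t in sorted(times.items())}


if __name__ == "__main__":
    for p, eff in weak_scaling_efficiency(proc0_time).items():
        slowdown = 1.0 / eff  # T(p) / T(p_min)
        print(f"{p:3d} CPUs: slowdown {slowdown:5.3f}x, efficiency {eff:6.1%}")
```

With these numbers, going from one node to two costs only about 4% (slowdown of about 1.04x), while 8 nodes take roughly 1.7 times as long as one node (efficiency around 58%).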