From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Wed Mar 23 2005 - 09:29:08 PST
It could well be that stdin is not being checkpointed. Paul?

On Mar 23, 2005, at 12:18 PM, 任明明 wrote:

> I switched to another program that just does matrix multiplication;
> this time checkpoint and restart of the MPI program worked very well.
>
> In your message, you wrote:
>> From: "任明明" <0110018@mail.nankai.edu.cn>
>> Reply-To: "任明明" <0110018@mail.nankai.edu.cn>
>> To: checkpoint_at_lbl_dot_gov
>> Subject: Re: lam/mpi blcr problem
>> Date: Thu, 24 Mar 2005 00:33:10 +0800
>>
>> It seems OK now; at least I can see the context files for each
>> process.
>> But as for my cpi program (it needs input from the first process, and
>> I checkpointed it while it was waiting for keyboard input),
>> when I use cr_restart, the program quits quickly.
>> By the way, when I use cr_checkpoint PID-of-mpirun (without --term)
>> on this cpi example program, it quits running. I don't know what the
>> problem is, and I hope I have expressed this problem clearly. :-)
>>
>> Thank you for your valuable information.
>>
>> In your message, you wrote:
>>> From: "任明明" <0110018@mail.nankai.edu.cn>
>>> Reply-To: "任明明" <0110018@mail.nankai.edu.cn>
>>> To: checkpoint_at_lbl_dot_gov
>>> Subject: Re: lam/mpi blcr problem
>>> Date: Wed, 23 Mar 2005 23:34:11 +0800
>>>
>>> I will. I will use this version:
>>>
>>> http://www.lam-mpi.org/download/files/lam-7.1.2b18.tar.bz2
>>>
>>> In your message, you wrote:
>>>> From: Jeff Squyres <jsquyres@lam-mpi.org>
>>>> Reply-To:
>>>> To: "任明明" <0110018@mail.nankai.edu.cn>
>>>> Subject: Re: lam/mpi blcr problem
>>>> Date: Wed, 23 Mar 2005 10:17:38 -0500
>>>>
>>>> If you wouldn't mind, could you try the beta and ensure that it
>>>> works for you?
>>>>
>>>> On Mar 23, 2005, at 9:35 AM, 任明明 wrote:
>>>>
>>>>> Thank you very much! I will wait for the new version.
>>>>> And thank you all.

-- 
{+} Jeff Squyres
{+} jsquyres@lam-mpi.org
{+} http://www.lam-mpi.org/