From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Wed Mar 23 2005 - 09:29:08 PST
It could well be that stdin is not being checkpointed. Paul?

On Mar 23, 2005, at 12:18 PM, 任明明 wrote:

>
> I changed for another program which just does matrix multiplication,
> this time checkpoint and restart of the MPI program worked very well.
>
>
> In your message you wrote:
>> From: "任明明" <[email protected]>
>> Reply-To: "任明明" <[email protected]>
>> To: checkpoint_at_lbl_dot_gov
>> Subject: Re: lam/mpi blcr problem
>> Date: Thu, 24 Mar 2005 00:33:10 +0800
>>
>>
>> it seems ok now, at least i can see the context files for each
>> process.
>> but as to my cpi program(it needs input from the first process, and i
>> checkpointed it when it is waiting for the keyboard input),
>> when use cr_restart, the program quits quickly.
>> by the way, when use cr_checkpoint PID-of-mpirun(doesn't use --term) to
>> this cpi example program, it quits running. I don't know what's the problem
>> is, and wish i have expressed this problem clearly.:-)
>>
>> Thank you for your valuable information.
>>
>>
>> In your message you wrote:
>>> From: "任明明" <[email protected]>
>>> Reply-To: "任明明" <[email protected]>
>>> To: checkpoint_at_lbl_dot_gov
>>> Subject: Re: lam/mpi blcr problem
>>> Date: Wed, 23 Mar 2005 23:34:11 +0800
>>>
>>>
>>> I will, I will use this version:
>>>
>>> http://www.lam-mpi.org/download/files/lam-7.1.2b18.tar.bz2
>>>
>>> In your message you wrote:
>>>> From: Jeff Squyres <[email protected]>
>>>> Reply-To:
>>>> To: "任明明" <[email protected]>
>>>> Subject: Re: lam/mpi blcr problem
>>>> Date: Wed, 23 Mar 2005 10:17:38 -0500
>>>>
>>>> If you wouldn't mind, could you try the beta and ensure that it
>>>> works for you?
>>>>
>>>>
>>>> On Mar 23, 2005, at 9:35 AM, 任明明 wrote:
>>>>
>>>>>
>>>>> Thank you very much! I will wait for the new version.
>>>>> And Thank you all.
>>>>>

--
{+} Jeff Squyres
{+} [email protected]
{+} http://www.lam-mpi.org/