Re: lam/mpi blcr problem

From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Wed Mar 23 2005 - 07:17:38 PST

  • Next message: 任明明: "Re: lam/mpi blcr problem"
    If you wouldn't mind, could you try the beta and ensure that it works 
    for you?
    
    
    On Mar 23, 2005, at 9:35 AM, 任明明 wrote:
    
    >
    > Thank you very much! I will wait for the new version.
    > And thank you all.
    >
    > In your message, you wrote:
    >> From: Jeff Squyres <jsquyres@lam-mpi.org>
    >> Reply-To:
    >> To: "任明明" <0110018@mail.nankai.edu.cn>
    >> Subject: Re: lam/mpi blcr problem
    >> Date: Wed, 23 Mar 2005 09:27:56 -0500
    >>
    >> I'm sorry -- I neglected to mention in my previous e-mail that we had
    >> some problems with the logic for checkpoint/restart initialization in
    >> LAM/MPI v7.1.1.  Can you try the soon-to-be-released 7.1.2 beta?
    >>
    >> 	http://www.lam-mpi.org/beta/
    >>
    >> That should solve your problems.
    >>
    >>
    >> On Mar 23, 2005, at 9:27 AM, 任明明 wrote:
    >>
    >>>
    >>> Thank you for your help!
    >>> I can use BLCR to checkpoint non-MPI programs, such as the examples
    >>> included with the BLCR software, and checkpointing a non-MPI program
    >>> works on all the nodes.
    >>> But when I use cr_checkpoint to checkpoint an MPI program, it doesn't
    >>> generate a context file for each process; it only generates a context
    >>> file for the mpirun command.
    >>>
    >>> All I did is the following:
    >>>
    >>> In one window:
    >>> ****************************************************
    >>> [rmingming@node01 lam]$ mpicc cpi.c -o cpi
    >>> [rmingming@node01 lam]$ lamboot -v nodes
    >>>
    >>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
    >>>
    >>> n-1<8238> ssi:boot:base:linear: booting n0 (node01)
    >>> n-1<8238> ssi:boot:base:linear: booting n1 (node02)
    >>> n-1<8238> ssi:boot:base:linear: booting n2 (node03)
    >>> n-1<8238> ssi:boot:base:linear: booting n3 (node04)
    >>> n-1<8238> ssi:boot:base:linear: finished
    >>> [rmingming@node01 lam]$ mpirun C -ssi rpi crtcp -ssi cr blcr ./cpi
    >>> Process 0 on node01
    >>> Process 1 on node02
    >>> Process 3 on node04
    >>> Process 2 on node03
    >>> Enter the number of intervals: (0 quits) 0 (--- I ran cr_checkpoint
    >>> while this prompt was waiting)
    >>> [rmingming@node01 lam]$
    >>>
    >>> ******************************************************
    >>>
    >>> In another window:
    >>>
    >>> ******************************************************
    >>>
    >>> [rmingming@node01 lam]$ cr_checkpoint 8248
    >>> [rmingming@node01 lam]$ ls
    >>> context.8248  cpi  cpi.c  hello.c  nodes  ring
    >>>
    >
    >
    
    -- 
    {+} Jeff Squyres
    {+} jsquyres@lam-mpi.org
    {+} http://www.lam-mpi.org/
    
