Re: lam/mpi blcr problem

From: 任明明 (0110018_at_mail.nankai.edu.cn)
Date: Wed Mar 23 2005 - 06:35:35 PST

  • Next message: Jeff Squyres: "Re: lam/mpi blcr problem"
    Thank you very much! I will wait for the new version.
    And thank you all.
    
    In your message you wrote:
    >From: Jeff Squyres <jsquyres@lam-mpi.org>
    >Reply-To: 
    >To: "任明明" <0110018@mail.nankai.edu.cn>
    >Subject: Re: lam/mpi blcr problem
    >Date:Wed, 23 Mar 2005 09:27:56 -0500
    >
    >I'm sorry -- I neglected to mention in my previous e-mail that we had 
    >some problems with the logic for checkpoint/restart initialization in 
    >LAM/MPI v7.1.1.  Can you try the soon-to-be-released 7.1.2 beta?
    >
    >	http://www.lam-mpi.org/beta/
    >
    >That should solve your problems.
    >
    >
    >On Mar 23, 2005, at 9:27 AM, 任明明 wrote:
    >
    >>
    >> Thank you for your help!
    >> I can use BLCR to checkpoint non-MPI programs, such as the examples
    >> included with the BLCR software, and every node can checkpoint a
    >> non-MPI program successfully.
    >> But when I use cr_checkpoint to checkpoint an MPI program, it does not
    >> generate a context file for each process; it only generates a context
    >> file for the mpirun command.
    >>
    >> Here is exactly what I did:
    >>
    >> In one window:
    >> ****************************************************
    >> [rmingming@node01 lam]$ mpicc cpi.c -o cpi
    >> [rmingming@node01 lam]$ lamboot -v nodes
    >>
    >> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
    >>
    >> n-1<8238> ssi:boot:base:linear: booting n0 (node01)
    >> n-1<8238> ssi:boot:base:linear: booting n1 (node02)
    >> n-1<8238> ssi:boot:base:linear: booting n2 (node03)
    >> n-1<8238> ssi:boot:base:linear: booting n3 (node04)
    >> n-1<8238> ssi:boot:base:linear: finished
    >> [rmingming@node01 lam]$ mpirun C -ssi rpi crtcp -ssi cr blcr ./cpi
    >> Process 0 on node01
    >> Process 1 on node02
    >> Process 3 on node04
    >> Process 2 on node03
    >> Enter the number of intervals: (0 quits) 0   (--- while the program
    >> waited at this prompt, I ran cr_checkpoint in the other window)
    >> [rmingming@node01 lam]$
    >>
    >> ******************************************************
    >>
    >> In another window:
    >>
    >> ******************************************************
    >>
    >> [rmingming@node01 lam]$ cr_checkpoint 8248
    >> [rmingming@node01 lam]$ ls
    >> context.8248  cpi  cpi.c  hello.c  nodes  ring
    >>
    
