[[email protected]: can blcr work well with the 'mpirun -ton .....'?]

jcduell_at_lbl_dot_gov
Date: Fri Jan 07 2005 - 17:25:40 PST

  • Next message: Jeff Squyres: "Re: [[email protected]: can blcr work well with the 'mpirun -ton .....'?]"
    Paul:
    
    Do you know anything about the LAM mpirun '-ton' tracing flag?  It sounds like
    jobs started with it won't restart correctly.
    
    -- 
    Jason Duell             Future Technologies Group
    <jcduell_at_lbl_dot_gov>       Computational Research Division
    Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    
    
    ----- Forwarded message from [email protected] -----
    
    From: [email protected]
    Subject: can blcr work well with the 'mpirun -ton .....'?
    Date: Tue, 04 Jan 2005 14:53:08 +0800 (BEIST)
    To: JCDuell_at_lbl_dot_gov
    Cc:
    X-Mailer: SkyMiracle WorldPost 8.0.1
    
    
    Dear Sir or Madam:
    
         I am trying to checkpoint and restart MPI programs with BLCR in a
         LAM environment.
    
         I want to checkpoint some MPI programs that are launched with
         '-ton' so that I can get the trace files that LAM produces. After I
         restart from the context file, the processes (mpirun, cr_restart,
         and the MPI program) are restarted, but they do not continue to
         run. When I checkpoint the MPI programs without '-ton', everything
         is OK. It is so weird!

         Can BLCR work well with "mpirun -ton ....."?
         Thanks very much!
    
       The first sequence of commands (with '-ton'):
           mpirun C -ton ./ring
           cr_checkpoint <pid of mpirun>
           cr_restart context.XXXX     (restart fails: the processes are restarted but do not continue)
    
       The second sequence of commands (without '-ton'):
           mpirun C ./ring
           cr_checkpoint <pid of mpirun>
           cr_restart context.XXXX     (restart is OK)
    
      
       OS: Red Hat 9
       BLCR version: 0.2.2.3b8
       LAM version: 7.0.4
                                                           deward  
     
    
    ----- End forwarded message -----
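
For anyone trying to reproduce the two cases in the forwarded report, the checkpoint and restart steps could be wrapped roughly as follows. This is only a sketch: the helper names `run` and `ckpt_and_restart` and the `DRY_RUN` variable are made up for illustration, and the context file name is passed in by the caller rather than guessed. By default it only prints the commands, so it is safe to try on a machine without BLCR installed.

```shell
#!/bin/sh
# Sketch only: wraps the cr_checkpoint / cr_restart steps from the report.
# DRY_RUN is a hypothetical switch; it defaults to 1, so commands are
# printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ckpt_and_restart MPIRUN_PID CONTEXT_FILE
#   MPIRUN_PID   : PID of the running mpirun
#   CONTEXT_FILE : context file written by cr_checkpoint
#                  (shown as context.XXXX in the report)
ckpt_and_restart() {
    run cr_checkpoint "$1" || return 1
    run cr_restart "$2"
}
```

With the job started as `mpirun C -ton ./ring` (failing case) or `mpirun C ./ring` (working case), calling `ckpt_and_restart <pid of mpirun> context.XXXX` with `DRY_RUN=0` would then run the same two steps for both cases, so the only difference between the runs is the '-ton' flag.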
    
