Re: [[email protected]: can blcr work well with the 'mpirun -ton .....'?]

From: Jeff Squyres (
Date: Fri Jan 07 2005 - 19:31:07 PST

  • Next message: Paul H. Hargrove: "Re: Questions on BLCR.."
    Oy, yes, this might be a problem.
    -ton tells the MPI processes to dump trace information down to the LAM 
    daemons.  When the MPI processes restart, I can see how the trace 
    information would not be associated with them anymore.
    I'll check this out over the weekend and see if that works.  I kinda 
    doubt it.  I'll reply to the guy (and checkpoint@lbl) with what I find.
    Why do people keep sending LAM questions to you guys?  Is there some 
    web page that is not clear about who to send checkpoint vs. LAM 
    questions to?
    On Jan 7, 2005, at 8:25 PM, jcduell_at_lbl_dot_gov wrote:
    > Paul:
    > Do you know anything about the LAM mpirun '-ton' tracing flag?  It
    > sounds like jobs started with it won't restart correctly.
    > -- 
    > Jason Duell             Future Technologies Group
    > <jcduell_at_lbl_dot_gov>       Computational Research Division
    > Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    > ----- Forwarded message from [email protected] -----
    > From: [email protected]
    > Subject: can blcr work well with the 'mpirun -ton .....'?
    > Date: Tue, 04 Jan 2005 14:53:08 +0800 (BEIST)
    > To: JCDuell_at_lbl_dot_gov
    > Cc:
    > X-Mailer: SkyMiracle WorldPost 8.0.1
    > Dear Sir or Madam:
    >      I am trying to checkpoint and restart MPI programs with BLCR in a
    >      LAM environment.
    >      I want to checkpoint some MPI programs that are launched with
    >      '-ton' so that I can get the trace files that LAM produces.
    >      After I restart from the context file, the processes (mpirun,
    >      cr_restart, and the MPI program) are restarted, but they
    >      don't continue to run. When I checkpoint the MPI programs
    >      without '-ton', everything is OK. It is so weird!
    >     Can BLCR work well with "mpirun -ton ....."?
    >     Thanks very much!
    >    The first set of commands is as follows (with '-ton'):
    >        mpirun C -ton ./ring
    >        cr_checkpoint <pid of mpirun>
    >        cr_restart context.XXXX    (restart failed: the processes
    >                                    are restarted but don't continue)
    >    The second set of commands is as follows (without '-ton'):
    >        mpirun C ./ring
    >        cr_checkpoint <pid of mpirun>
    >        cr_restart context.XXXX    (restart is OK)
    >    redhat 9
    >    the version of blcr is
    >    the lam version is 7.0.4
    >                                                        deward
    > ----- End forwarded message -----
    {+} Jeff Squyres
    {+} [email protected]
