Re: blcr for mpi/myrinet jobs?

From: Darin
Date: Fri Dec 17 2004 - 13:21:11 PST

    Thanks Jeff -- I'm looking forward to trying it.
    
    Darin
    
    Jeff Squyres wrote:
    > On Dec 17, 2004, at 4:03 PM, jcduell_at_lbl_dot_gov wrote:
    >
    >> Our implementation requires that the MPI library handle shutting down
    >> and restoring network connections during checkpoint/restart.  So your
    >> question really boils down to:  is it likely that there will be a
    >> BLCR-enabled MPI library that runs over Myrinet?  I haven't asked the
    >> LAM/MPI developers (http://www.lam-mpi.org/) whether they are planning
    >> to do this for Myrinet, and in what time frame if so, but they are the
    >> most likely candidates for the support you want--LAM already supports
    >> our stuff over TCP/IP, and LAM also works over Myrinet, so presumably
    >> they've got at least the design in place for a checkpointable Myrinet
    >> layer.
    >
    > Greetings Darin.
    >
    > Yes, we actually have BLCR-enabled GM, but it depends on the gm_get()
    > Myrinet library function, which, at our last major release, was not
    > stable.  Myricom has had several gm releases since then, and the
    > problems may have been fixed -- we have not had the time to test and
    > see.
    >
    > Check out the GM / BLCR release notes in the LAM/MPI 7.1.x release
    > series.  LAM/MPI v7.1.2 is the most stable gm release -- it's currently
    > in beta, but we plan to release the "gold" 7.1.2 Real Soon Now (nothing
    > has changed with respect to GM for quite a while).  See
    > http://www.lam-mpi.org/beta/
    >
    > --
    > {+} Jeff Squyres
    > {+} [email protected]
    > {+} http://www.lam-mpi.org/
    >
    >
    
    
    -- 
    Darin 
    
