Re: blcr for mpi/myrinet jobs?

From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Fri Dec 17 2004 - 13:13:48 PST

  • Next message: Darin Ernst: "Re: blcr for mpi/myrinet jobs?"
    On Dec 17, 2004, at 4:03 PM, jcduell_at_lbl_dot_gov wrote:
    
    > Our implementation requires that the MPI library handle shutting down
    > and restoring network connections during checkpoint/restart.  So your
    > question really boils down to:  is it likely that there will be a
    > BLCR-enabled MPI library that runs over Myrinet?  I haven't asked the
    > LAM/MPI developers (http://www.lam-mpi.org/) whether they are planning
    > to do this for Myrinet, and in what time frame if so, but they are the
    > most likely candidates for the support you want--LAM already supports
    > our stuff over TCP/IP, and LAM also works over Myrinet, so presumably
    > they've got at least the design in place for a checkpointable Myrinet
    > layer.
    
    Greetings Darin.
    
    Yes, we actually have BLCR-enabled GM, but it depends on the gm_get() 
    Myrinet library function, which, at our last major release, was not 
    stable.  Myricom has had several gm releases since then, and the 
    problems may have been fixed -- we have not had the time to test and 
    see.
    
    Check out the GM / BLCR release notes in the LAM/MPI 7.1.x release 
    series.  LAM/MPI v7.1.2 is the most stable gm release -- it's currently 
    in beta, but we plan to release the "gold" 7.1.2 Real Soon Now (nothing 
    has changed with respect to GM for quite a while).  See 
    http://www.lam-mpi.org/beta/
    
    -- 
    {+} Jeff Squyres
    {+} jsquyres_at_lam-mpi.org
    {+} http://www.lam-mpi.org/
    
