From: Darin
Date: Fri Dec 17 2004 - 13:21:11 PST

Thanks Jeff -- I'm looking forward to trying it.

Darin

Jeff Squyres wrote:

> On Dec 17, 2004, at 4:03 PM, jcduell_at_lbl_dot_gov wrote:
>
>> Our implementation requires that the MPI library handle shutting down
>> and restoring network connections during checkpoint/restart. So your
>> question really boils down to: is it likely that there will be a
>> BLCR-enabled MPI library that runs over Myrinet? I haven't asked the
>> LAM/MPI developers (http://www.lam-mpi.org/) whether they are planning
>> to do this for Myrinet, and in what time frame if so, but they are the
>> most likely candidates for the support you want--LAM already supports
>> our stuff over TCP/IP, and LAM also works over Myrinet, so presumably
>> they've got at least the design in place for a checkpointable Myrinet
>> layer.
>
> Greetings Darin.
>
> Yes, we actually have BLCR-enabled GM, but it depends on the gm_get()
> Myrinet library function, which, at our last major release, was not
> stable. Myricom has had several gm releases since then, and the
> problems may have been fixed -- we have not had the time to test and
> see.
>
> Check out the GM / BLCR release notes in the LAM/MPI 7.1.x release
> series. LAM/MPI v7.1.2 is the most stable gm release -- it's currently
> in beta, but we plan to release the "gold" 7.1.2 Real Soon Now (nothing
> has changed with respect to GM for quite a while). See
> http://www.lam-mpi.org/beta/
>
> --
> {+} Jeff Squyres
> {+} [email protected]
> {+} http://www.lam-mpi.org/
>
> -- Darin