From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Fri Dec 17 2004 - 13:13:48 PST
On Dec 17, 2004, at 4:03 PM, jcduell_at_lbl_dot_gov wrote:

> Our implementation requires that the MPI library handle shutting down
> and restoring network connections during checkpoint/restart. So your
> question really boils down to: is it likely that there will be a
> BLCR-enabled MPI library that runs over Myrinet? I haven't asked the
> LAM/MPI developers (http://www.lam-mpi.org/) whether they are planning
> to do this for Myrinet, and in what time frame if so, but they are the
> most likely candidates for the support you want--LAM already supports
> our stuff over TCP/IP, and LAM also works over Myrinet, so presumably
> they've got at least the design in place for a checkpointable Myrinet
> layer.

Greetings Darin.

Yes, we actually have BLCR-enabled GM support, but it depends on the gm_get() Myrinet library function, which was not stable at the time of our last major release. Myricom has had several GM releases since then, and the problems may have been fixed; we have not had time to test and see. Check out the GM / BLCR release notes in the LAM/MPI 7.1.x release series.

LAM/MPI v7.1.2 is the most stable release for GM. It's currently in beta, but we plan to release the "gold" 7.1.2 Real Soon Now (nothing has changed with respect to GM for quite a while). See http://www.lam-mpi.org/beta/.

--
{+} Jeff Squyres
{+} http://www.lam-mpi.org/