From: Jeff Squyres (jsquyres_at_lam-mpi.org)
Date: Tue Mar 22 2005 - 12:23:46 PST
On Mar 22, 2005, at 12:05 PM, Paul H. Hargrove wrote:

> I am sorry to hear that you are having problems. Let's see if we can
> help.
>
> As far as I can tell your LAM configuration is OK, but I am cc:ing
> this to one of the LAM developers who may be able to spot something I
> could not.

No need -- I'm actually on the checkpoint_at_lbl_dot_gov list. :-)

> Have you tried 'make check' in the blcr build directory or
> checkpointing/restarting some of the non-mpi examples in blcr's
> examples directory? It would be good to know that the blcr build was
> OK before bringing LAM into the mix.
>
> When LAM ran the mpi application, was blcr installed (and the kernel
> modules loaded) on all the compute nodes running the mpi job?

Additionally, were you using the crtcp RPI? I.e., what was the specific
command that you used to mpirun your application? And how did you try
to checkpoint it?

--
{+} Jeff Squyres
{+} [email protected]
{+} http://www.lam-mpi.org/