Re: MPI support for BLCR


jcduell_at_lbl_dot_gov
Date: Tue Feb 28 2006 - 14:05:02 PST

Next message: jcduell_at_lbl_dot_gov: "[greg_at_bronevetsky_dot_com: Re: MPI support for BLCR]"

Previous message: Greg Bronevetsky: "MPI support for BLCR"
In reply to: Greg Bronevetsky: "MPI support for BLCR"

On Tue, Feb 28, 2006 at 03:09:11PM -0500, Greg Bronevetsky wrote:
> I am a grad student at Cornell, working on checkpointing of MPI 
> applications. Our checkpointer works with any implementation of MPI and 
> (in principle) with any single-process checkpointer. However, in 
> practice integration with single-process checkpointers is made more
> complex because by default such a checkpointer will save the state of 
> the entire process, including MPI state. This is generally incorrect as 
> MPI state contains hardware information that will not be valid on restart.
> 
> I know that you've integrated BLCR with LAM, presumably in a way that 
> doesn't save LAM's state but instead lets LAM save its own state. How 
> did you do this? Was it via a special API (the callbacks referred to in 
> your FAQ) or did you use a more general technique?

The LAM team used our callback notifications to shut down all TCP (or
other network) connections, so that when our checkpoint code ran, there
was no network state that needed to be saved.  They also arranged to
save the information they needed to reconnect all the processes at
restart.  Finally, they arranged it so that running our checkpoint
program on their 'mpirun' (i.e., the program the user runs to start the
parallel MPI job) caused mpirun to have all other processes in the MPI
job checkpointed before mpirun itself returned from the callback and
was checkpointed.  In sum, our code just 'sees' that a single 'mpirun'
process is to be checkpointed.  The callback in mpirun contains all the
logic that ensures each process in the parallel job is checkpointed
before mpirun itself is.  Restart works the same way: mpirun's restart
callback handles restarting the entire parallel job.
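A minimal simulation of the ordering described above may help.  It is
not LAM's or BLCR's code: `struct rank`, `checkpoint_rank()`,
`mpirun_checkpoint_callback()`, `mpirun_restart_callback()` and
`job_demo()` are hypothetical names, and "checkpointing" a rank here
only sets flags and logs the order.  What it shows is that every rank
records its reconnection info, drops its links and is saved inside
mpirun's callback, so mpirun's own image is taken last, and that the
restart callback brings every rank back from the saved info.

```c
#include <assert.h>

#define NRANKS 4
#define MAX_EVENTS 32

/* Hypothetical bookkeeping for one MPI process (rank) in the job. */
struct rank {
    int id;
    int connected;    /* 1 while its network links are open */
    int saved_port;   /* reconnection info recorded before shutdown */
    int checkpointed; /* 1 once its image has been saved */
};

/* Order in which processes were saved; -1 stands for mpirun itself. */
int events[MAX_EVENTS];
int nevents;

static void log_event(int who)
{
    if (nevents < MAX_EVENTS)
        events[nevents++] = who;
}

/* Per-rank step: record how to reconnect, drop the links, then let the
 * single-process checkpointer save the (now network-free) process. */
static void checkpoint_rank(struct rank *r)
{
    r->saved_port = 5000 + r->id; /* stand-in for real address info */
    r->connected = 0;
    r->checkpointed = 1;
    log_event(r->id);
}

/* mpirun's checkpoint-time callback: every rank is saved before the
 * callback returns. */
static void mpirun_checkpoint_callback(struct rank *ranks, int n)
{
    for (int i = 0; i < n; i++)
        checkpoint_rank(&ranks[i]);
}

/* What the checkpointer 'sees': one process, mpirun.  It runs the
 * callback first, then saves mpirun itself. */
static void checkpoint_mpirun(struct rank *ranks, int n)
{
    mpirun_checkpoint_callback(ranks, n);
    log_event(-1);
}

/* mpirun's restart-time callback: bring every rank back and reconnect
 * it using the info saved at checkpoint time. */
static void mpirun_restart_callback(struct rank *ranks, int n)
{
    for (int i = 0; i < n; i++) {
        assert(ranks[i].checkpointed); /* only saved ranks can restart */
        ranks[i].checkpointed = 0;
        ranks[i].connected = (ranks[i].saved_port != 0);
    }
}

/* Returns 0 if mpirun was saved last and all ranks reconnected. */
int job_demo(void)
{
    struct rank ranks[NRANKS];
    for (int i = 0; i < NRANKS; i++)
        ranks[i] = (struct rank){ .id = i, .connected = 1 };

    checkpoint_mpirun(ranks, NRANKS);
    if (nevents != NRANKS + 1 || events[NRANKS] != -1)
        return -1; /* mpirun must be saved after every rank */
    for (int i = 0; i < NRANKS; i++)
        if (ranks[i].connected)
            return -1; /* no live network state at save time */

    mpirun_restart_callback(ranks, NRANKS);
    for (int i = 0; i < NRANKS; i++)
        if (!ranks[i].connected)
            return -1;
    return 0;
}
```

The point of routing everything through mpirun's callback is that the
single-process checkpointer never needs to know the job is parallel.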

Needless to say, this wasn't transparent to the MPI library: the LAM
team did a lot of work to handle the parallel aspects.

It sounds like your MPI library could be made to work with BLCR if you
can write a callback that shuts down any TCP/IP connections (and does
whatever other work you normally do for a checkpoint) right before
checkpoint time, and then restores them at restart.  In principle this
is just a matter of writing two functions: a checkpoint-time callback
and a restart-time callback.  How easy that is depends on how easily
you can close and reopen the network state.
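A sketch of those two functions, assuming the library keeps one TCP
socket per peer.  `struct peer_conn`, `conn_checkpoint()`,
`conn_restart()` and `reconnect_demo()` are hypothetical names, and a
loopback listener stands in for a remote process.  In BLCR's libcr these
two steps would typically sit on either side of a `cr_checkpoint()` call
inside one callback registered with `cr_register_callback()`; check the
BLCR documentation for the exact signatures and flags before relying on
that.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-peer record kept by the MPI layer. */
struct peer_conn {
    int fd;                  /* live socket, or -1 while checkpointed */
    struct sockaddr_in addr; /* peer address, saved for restart */
};

/* Checkpoint-time step: remember the peer's address, then close the
 * socket so the process holds no live network state when it is saved. */
int conn_checkpoint(struct peer_conn *c)
{
    socklen_t len = sizeof c->addr;
    if (getpeername(c->fd, (struct sockaddr *)&c->addr, &len) != 0)
        return -1;
    close(c->fd);
    c->fd = -1;
    return 0;
}

/* Restart-time step: open a fresh socket to the saved address. */
int conn_restart(struct peer_conn *c)
{
    assert(c->fd == -1); /* only valid on a checkpointed record */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&c->addr, sizeof c->addr) != 0) {
        close(fd);
        return -1;
    }
    c->fd = fd;
    return 0;
}

/* Demo: a loopback listener stands in for a remote MPI process.
 * Returns 0 if the connection was dropped and successfully rebuilt. */
int reconnect_demo(void)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0; /* let the kernel pick a free port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&a, sizeof a) != 0 ||
        listen(lfd, 4) != 0 ||
        getsockname(lfd, (struct sockaddr *)&a, &len) != 0)
        return -1;

    struct peer_conn c = { .fd = socket(AF_INET, SOCK_STREAM, 0) };
    int ok = c.fd >= 0 &&
             connect(c.fd, (struct sockaddr *)&a, sizeof a) == 0 &&
             conn_checkpoint(&c) == 0 && c.fd == -1 &&
             /* the single-process checkpointer would save the image here */
             conn_restart(&c) == 0 &&
             send(c.fd, "x", 1, 0) == 1;

    if (c.fd >= 0)
        close(c.fd);
    close(lfd);
    return ok ? 0 : -1;
}
```

On a host with a working loopback interface, reconnect_demo() should
return 0.  A real library would also have to deal with messages in
flight and redo any handshake with the peer, which this sketch omits.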

Does that make sense?

-- 
Jason Duell             Future Technologies Group
<jcduell_at_lbl_dot_gov>       Computational Research Division
Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
