From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Feb 22 2005 - 11:14:21 PST
Michael,

Ulisses is correct: our design uses a single callback for "checkpoint", "continue after taking checkpoint" and "restart from checkpoint". The latter two cases are distinguished by the value returned from cr_checkpoint(). As Ulisses stated, this is described briefly in libcr.h. I am sorry to say that this is currently the only API documentation we have. Manpages or texinfo documentation for libcr is one of our current to-do items.

We initially considered the 3-callback model but quickly realized that allocating temporary memory would be a difficult problem, because it is not safe to call malloc() in a signal-context callback. The problem of small temporaries goes away with a single callback, which can use stack variables and/or alloca() to store any information that must be preserved across the checkpoint and restart. For large temporary spaces we recommend calling mmap(), because of the limits on stack size in pthreads programs.

-Paul

Ulisses wrote:
>On Tue, 2005-02-22 at 16:16 +0100, Michael Klemm wrote:
>
>>Playing around with BLCR I found that one can only register a callback
>>which will be called whenever a checkpoint is requested. Thus, an
>>application is supposed to be able to shutdown sockets, close files, etc.
>>
>>I was wondering why there is no "restart" callback which is invoked
>>whenever a checkpoint returns. A closed socket has to be reopened
>>somehow. The easiest way: register a restart callback that is able to
>>re-establish a particular set of sockets, re-open some files, etc.
>>
>>The only idea (for now) to solve this problem is some kind of global
>>variable that is checked each time a socket connection (or file, ...) is
>>used to be able to re-open the descriptor right before the access
>>occurs... I'm I right?
>>
>
>Michael,
>
>  I think you didn't quite understand what happens when you call
>cr_checkpoint() inside of a callback. You can read about callbacks and
>the cr_checkpoint() function in the file libcr/libcr.h (line 289). What
>you have missed is that on restart-time the callback continues its
>execution after the call to cr_checkpoint(), and you can know your
>program is being restarted by checking the return value of
>cr_checkpoint(). :-)
>
>Best regards,
>
>-- Ulisses
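
For illustration, the single-callback pattern discussed in this thread might look roughly like the sketch below. It does not link against BLCR: sim_cr_checkpoint() is a hypothetical stand-in for cr_checkpoint(), and struct conn, sim_result and checkpoint_callback() are invented names. The return-value convention (negative on failure, zero when continuing, positive when restarting) is an assumption that should be checked against libcr.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL stand-in for BLCR's cr_checkpoint(); this is not the
 * library call. It returns whatever sim_result holds, so the three
 * outcomes can be exercised without BLCR installed.
 * Assumed convention (verify in libcr.h):
 *   < 0 failure, 0 continue, > 0 restart. */
int sim_result = 0;
int sim_cr_checkpoint(void) { return sim_result; }

/* Hypothetical application state: one "connection" that has to be shut
 * down before the checkpoint and re-established afterwards. */
struct conn {
    int  is_open;
    int  restarts;
    char peer[64];
};

/* One callback covers all three cases. */
int checkpoint_callback(void *arg)
{
    struct conn *c = arg;
    char saved_peer[sizeof c->peer];  /* stack temporary, no malloc() */
    int rc;

    assert(c != NULL);

    /* Before the checkpoint: remember what is needed, then shut down. */
    snprintf(saved_peer, sizeof saved_peer, "%s", c->peer);
    c->is_open = 0;
    c->peer[0] = '\0';

    rc = sim_cr_checkpoint();
    if (rc < 0)
        return rc;  /* checkpoint failed; this sketch leaves it closed */

    /* Execution resumes here both when continuing and when restarting;
     * saved_peer is still valid because it lives in this stack frame. */
    snprintf(c->peer, sizeof c->peer, "%s", saved_peer);
    c->is_open = 1;
    if (rc > 0)
        c->restarts++;  /* restart-only work would go here */
    return 0;
}
```

In a real program the callback would be registered through libcr and the stand-in replaced by cr_checkpoint(); the exact signatures and flags are in libcr.h. A large temporary buffer would be obtained with mmap() rather than placed on the stack, as recommended above.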