From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 23 2005 - 10:36:35 PST
This is related to the other question you asked.  While a call to
cr_checkpoint() may indicate a failure to *save* some resource, a
failure to *restore* a resource (or to continue if, for instance, one
must reconnect sockets that were closed before calling cr_checkpoint())
can be indicated by a non-zero return value from the callback.

As the comment indicates, we have not chosen a firm definition of the
behavior in this case because we are not sure what behaviors may be
useful.  If you have an opinion we'd like to hear it.

The current behavior is that a non-zero return from a callback will
result in an error message and a call to abort(), regardless of the
actual value returned.

-Paul

Michael Klemm wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Ulisses wrote:
> | On Wed, 2005-02-23 at 16:26 +0100, Michael Klemm wrote:
> |>Paul H. Hargrove wrote:
> |>| Michael,
> |>| Ulisses is correct, our design uses a single callback for
> |>| "checkpoint", "continue after taking checkpoint" and "restart from
> |>| checkpoint".  The latter two cases are to be distinguished by the value
> |>| returned from cr_checkpoint().
> |>
> |>Ah... OK. After re-reading the headers and getting an example to work I
> |>understand that fact. After all, I can assume that each callback routine
> |>has to call cr_checkpoint to allow BLCR to proceed to the next phase
> |>during checkpoint. If just one callback denies, the whole process is
> |>terminated. Right?
> |
> | Hmm.. I don't think so. I found in include/libcr.h (I've just realized
> | that I gave you the wrong location of libcr.h in my other e-mail, sorry)
> | the following explanation:
> |
> | // Note that if a callback returns without invoking cr_checkpoint(),
> | // then it will be invoked immediately after the return.
>
> There's a comment in libcr.h that states:
>
> Callbacks are expected to return zero on success.  A callback may return
> nonzero to indicate a failure...
>
> In this case I assume that the checkpoint procedure gets terminated and
> no checkpoint is taken at all.
>
> -michael
>
> - --
> Computer Science Department 2, University of Erlangen-Nuremberg
> Martensstrasse 3, D-91058 Erlangen, Germany
> phone: ++49 (0)9131 85-28995, fax: ++49 (0)9131 85-28809
> web: http://www2.informatik.uni-erlangen.de/~klemm
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFCHK6mWEu1syWqdn0RAu3yAJ475s0DkifEtikzDA/zF7oRDw9mvACghLYN
> LtKpwNuDEX8wOm38VnoYo2o=
> =mVO7
> -----END PGP SIGNATURE-----

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
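
For readers of the archive, the control flow discussed in this thread can be sketched roughly as follows. This is only an illustration and does not link against BLCR: sim_cr_checkpoint(), sim_result, socket_open, reconnect_ok, my_callback() and dispatch() are all hypothetical names invented here. The sign convention assumed for the checkpoint call (negative for failure, zero for "continue", positive for "restart") is from memory of the BLCR documentation and should be checked against libcr.h. Where the thread says BLCR prints an error and calls abort(), the sketch returns a status value instead so the behavior can be exercised safely.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cr_checkpoint(): the result is injected by
 * the caller so the sketch runs without BLCR installed.
 * Assumed convention: <0 failure to save, 0 continue, >0 restart. */
int sim_result = 0;
int sim_cr_checkpoint(void) { return sim_result; }

/* Hypothetical resource state: a socket the callback must close before
 * the checkpoint and reopen afterwards. */
int socket_open = 1;
int reconnect_ok = 1;   /* set to 0 to simulate a failed reconnect */

/* One callback covers "checkpoint", "continue" and "restart", as
 * described in the thread. It returns zero on success and nonzero if
 * the resource could not be restored. */
int my_callback(void *arg)
{
    (void)arg;
    socket_open = 0;                       /* close before checkpointing */

    int rc = sim_cr_checkpoint();
    if (rc < 0)
        fprintf(stderr, "checkpoint reported a failure to save (rc=%d)\n", rc);
    else if (rc > 0)
        fprintf(stderr, "restarting from checkpoint\n");
    else
        fprintf(stderr, "continuing after checkpoint\n");

    /* Assumption: in all three cases the process keeps running, so the
     * socket has to be reopened before returning. */
    if (!reconnect_ok)
        return 1;                          /* failure to restore */
    socket_open = 1;
    return 0;
}

enum outcome { OUTCOME_OK, OUTCOME_WOULD_ABORT };

/* Mimics the behavior described above: any nonzero return from the
 * callback is an error, whatever its value. BLCR itself is said to
 * call abort() at this point; the sketch only reports it. */
enum outcome dispatch(int (*cb)(void *), void *arg)
{
    assert(cb != NULL);
    int rc = cb(arg);
    if (rc != 0) {
        fprintf(stderr, "callback failed (returned %d)\n", rc);
        return OUTCOME_WOULD_ABORT;
    }
    return OUTCOME_OK;
}
```

Setting sim_result to 0 or a positive value and calling dispatch(my_callback, NULL) walks the "continue" and "restart" paths; clearing reconnect_ok shows the case in which the real library would abort.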