Re: Callbacks

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 23 2005 - 10:36:35 PST

  • Next message: Michael Klemm: "Bug Report: Stale process after abortion of restart process"
    This is related to the other question you asked.  While a call to 
    cr_checkpoint() may indicate a failure to *save* some resource, a 
    failure to *restore* a resource (or to continue if, for instance, one 
    must reconnect sockets that were closed before calling cr_checkpoint()) 
    can be indicated by a non-zero return value from the callback.

    As the comment indicates, we have not chosen a firm definition of the 
    behavior in this case because we are not sure what behaviors may be 
    useful.  If you have an opinion we'd like to hear it.  The current 
    behavior is that a non-zero return from a callback will result in an 
    error message and a call to abort(), regardless of the actual value.

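
    To make the save/restore distinction concrete, here is a rough sketch of
    a single callback that handles all three outcomes.  It does not link
    against libcr: mock_cr_checkpoint(), reconnect_sockets(), run_callback()
    and the phase enum are hypothetical stand-ins, and the sign convention
    for the return value (negative = checkpoint failed, zero = continue,
    positive = restart) is an assumption not spelled out in this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Outcomes a single callback can observe (names are illustrative only). */
enum phase { PHASE_NONE, PHASE_CONTINUE, PHASE_RESTART, PHASE_SAVE_FAILED };

/* Knobs for the mocks below; these are not part of BLCR. */
int mock_checkpoint_result = 0;   /* value mock_cr_checkpoint() returns */
int mock_reconnect_ok = 1;        /* whether the fake reconnect succeeds */
enum phase last_phase = PHASE_NONE;

/* Hypothetical stand-in for cr_checkpoint() from libcr.h.
 * Assumed convention: <0 checkpoint failed, 0 continue, >0 restart. */
int mock_cr_checkpoint(void) { return mock_checkpoint_result; }

/* Hypothetical helper: reopen sockets closed before the checkpoint. */
int reconnect_sockets(void) { return mock_reconnect_ok ? 0 : -1; }

/* One callback covers "checkpoint", "continue" and "restart". */
int my_callback(void *arg)
{
    (void)arg;

    /* Checkpoint phase: quiesce here, e.g. close sockets (omitted). */
    int rc = mock_cr_checkpoint();

    if (rc < 0) {
        /* Failure to *save* is reported by cr_checkpoint() itself. */
        last_phase = PHASE_SAVE_FAILED;
    } else if (rc > 0) {
        last_phase = PHASE_RESTART;
    } else {
        last_phase = PHASE_CONTINUE;
    }

    /* In this sketch the process keeps running in every case, so the
     * resources are reopened.  A failure to *restore* is reported by a
     * non-zero return value from the callback. */
    return reconnect_sockets() == 0 ? 0 : 1;
}

/* Mimics the current behavior described above: any non-zero return
 * produces an error message and abort(), whatever the actual value. */
void run_callback(int (*cb)(void *), void *arg)
{
    int rc = cb(arg);
    if (rc != 0) {
        fprintf(stderr, "callback failed (returned %d)\n", rc);
        abort();
    }
}
```

    In real code the callback would call cr_checkpoint() from libcr.h
    instead of the mock, and the library would play the role of
    run_callback().
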
    Michael Klemm wrote:
    > Ulisses wrote:
    > | On Wed, 2005-02-23 at 16:26 +0100, Michael Klemm wrote:
    > |>Paul H. Hargrove wrote:
    > |>| Michael,
    > |>|   Ulisses is correct, our design uses a single callback for
    > |>| "checkpoint", "continue after taking checkpoint" and "restart from
    > |>| checkpoint".  The latter two cases are to be distinguished by the value
    > |>| returned from cr_checkpoint().
    > |>
    > |>Ah... OK. After re-reading the headers and getting an example to work I
    > |>understand that fact.  After all, I can assume that each callback routine
    > |>has to call cr_checkpoint to allow BLCR to proceed to the next phase
    > |>during checkpoint. If just one callback denies, the whole process is
    > |>terminated. Right?
    > |
    > |     Hmm.. I don't think so. I found in include/libcr.h (I've just realized
    > | that I gave you the wrong location of libcr.h in my other e-mail, sorry)
    > | the following explanation:
    > |
    > | // Note that if a callback returns without invoking cr_checkpoint(),
    > | // then it will be invoked immediately after the return.
    > There's a comment in libcr.h that states:
    > Callbacks are expected to return zero on success. A callback may return
    > nonzero to indicate a failure...
    > In this case I assume that the checkpoint procedure gets terminated and
    > no checkpoint is taken at all.
    >     -michael
    > - --
    > Computer Science Department 2, University of Erlangen-Nuremberg
    > Martensstrasse 3, D-91058 Erlangen, Germany
    > phone: ++49 (0)9131 85-28995, fax: ++49 (0)9131 85-28809
    > web:
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
