From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Jan 29 2009 - 11:35:31 PST
Neal Becker wrote:
> I'd like to update my blcr python module. I believe blcr API has changed a
> bit since I wrote that.
>
> Where can I find an example and/or some doc to follow? This is intended for
> very simple checkpointing - single thread, no MPI etc.
>

Neal,

The version you emailed almost exactly 1 year ago uses current BLCR interfaces. You could update to replace cr_poll_checkpoint() with the recently added cr_poll_checkpoint_msg() if you want to get a buffer with the kernel messages on failure and/or warning. You can now pass CR_CHKPT_ASYNC_ERR in the cr_flags to delay error reporting of most request-time errors until the poll step. Other than that, I don't think anything has changed that would affect what you already have.

In 0.8.0 we added library interfaces for requesting restarts, for which you may add support to your python module.

To answer the original question:

tests/crut.c:crut_checkpoint_to_file() is a good example of requesting a checkpoint of oneself

util/cr_checkpoint/cr_checkpoint.c:real_main() is the canonical example for arguments to cr_request_checkpoint() and meanings of errno values

-Paul

--
Paul H. Hargrove                       PHHargrove_at_lbl_dot_gov
Future Technologies Group              Tel: +1-510-495-2352
HPC Research Department                Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory