Re: programming example?

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Jan 29 2009 - 11:35:31 PST

  • Next message: Karthik Gopalakrishnan: "Re: Hang in cr_restart"
    Neal Becker wrote:
    > I'd like to update my blcr python module.  I believe blcr API has changed a 
    > bit since I wrote that.
    > Where can I find an example and/or some doc to follow?  This is intended for 
    > very simple checkpointing - single thread, no MPI etc.
      The version you emailed almost exactly 1 year ago uses current BLCR 
    interfaces.  You could replace cr_poll_checkpoint() with the recently 
    added cr_poll_checkpoint_msg() if you want to get a buffer with the 
    kernel messages on failure or warning.  You can also now pass 
    CR_CHKPT_ASYNC_ERR in cr_flags to delay reporting of most request-time 
    errors until the poll step.  Other than that, I don't think anything 
    has changed that would affect what you already have.
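As a rough illustration of the request-then-poll flow described above, here is a minimal Python sketch. It does not call BLCR itself: `request` and `poll_msg` are hypothetical callables standing in for whatever bindings a Python module might expose around cr_request_checkpoint() and cr_poll_checkpoint_msg(), and the return convention (a tuple of result code, errno value, and message text) is an assumption made for this example, not the library's actual signature.

```python
import os


class CheckpointError(OSError):
    """Raised when the (hypothetical) poll step reports a failure."""


def checkpoint_self(request, poll_msg, fd, flags=0):
    """Request a checkpoint, then poll until it completes.

    `request(fd, flags)` and `poll_msg(handle)` are hypothetical wrappers,
    not part of any real module. Assumed contract for this sketch:
      - request() returns an opaque handle
      - poll_msg() returns (rc, err, msg): rc > 0 when the checkpoint is
        done, rc == 0 when it should be polled again, rc < 0 on failure,
        with err holding an errno value and msg any message text.
    """
    handle = request(fd, flags)
    while True:
        rc, err, msg = poll_msg(handle)
        if rc > 0:
            # Finished; msg may still carry warnings worth logging.
            return msg
        if rc < 0:
            # With errors deferred to the poll step, failures surface here.
            raise CheckpointError(err, msg or os.strerror(err))
```

With errors deferred to the poll step, a wrapper shaped like this needs only one place to turn a failure into a Python exception, and the message buffer can be attached to that exception instead of being lost.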
      In 0.8.0 we added library interfaces for requesting restarts, which 
    you may want to support in your Python module.
      To answer the original question:
         tests/crut.c:crut_checkpoint_to_file() is a good example of 
    requesting a checkpoint of oneself
         util/cr_checkpoint/cr_checkpoint.c:real_main() is the canonical 
    example for arguments to cr_request_checkpoint() and meanings of errno.

    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
