Re: Checkpointing to a socket

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Dec 01 2004 - 14:38:14 PST

    I think something like the following might work:
    len = snprintf(cmd_buffer, cmd_buffer_len,
                   "cr_checkpoint --clobber --file /proc/self/fd/%d --pid %d",
                   socket_fd, target_pid);
    if (len >= cmd_buffer_len) {
       /* cmd_buffer too small, deal with it */
    }
    rc = system(cmd_buffer);
    The use of /proc/self/fd/%d will cause the checkpoint to go to an 
    existing open file descriptor.  The --clobber is needed to ensure it 
    goes straight there rather than going to a temporary file that is then 
    renamed to the destination (which would fail).  Note that this will fail 
    if the socket descriptor is close-on-exec.
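    To guard against the close-on-exec case, the flag can be cleared with
    fcntl() before the command is run.  The following is only a rough
    sketch: the helper names clear_cloexec, build_checkpoint_cmd and
    checkpoint_to_fd are hypothetical, and the cr_checkpoint options are
    just the ones shown above.  Only fcntl(), snprintf() and system() are
    standard calls.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: clear FD_CLOEXEC so that fd stays open across
 * the fork/exec performed by system().  Returns 0 on success, -1 on error. */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Hypothetical helper: format the command line into buf.
 * Returns 0 on success, -1 if snprintf failed or buf was too small. */
int build_checkpoint_cmd(char *buf, size_t buflen, int fd, pid_t pid)
{
    int len = snprintf(buf, buflen,
                       "cr_checkpoint --clobber --file /proc/self/fd/%d --pid %d",
                       fd, (int)pid);
    if (len < 0 || (size_t)len >= buflen)
        return -1;
    return 0;
}

/* Hypothetical wrapper: returns -1 on a setup failure, otherwise the
 * value returned by system(). */
int checkpoint_to_fd(int fd, pid_t pid)
{
    char cmd[128];

    if (clear_cloexec(fd) == -1)
        return -1;
    if (build_checkpoint_cmd(cmd, sizeof cmd, fd, pid) == -1)
        return -1;
    return system(cmd);
}
```

    A caller would then use checkpoint_to_fd(socket_fd, target_pid) in
    place of the bare system() call.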
    I think it would be easy to implement "--fd <N>" to get the same 
    behavior as "--clobber --file /proc/self/fd/<N>".  In fact, I've added a 
    request for this feature to our bug database 
    as bug #882.
    JCDuell_at_lbl_dot_gov wrote:
    > Zoltan:
    > We have not tested checkpointing to a socket, but it should work.  We
    > are planning to move most of the logic currently in cr_checkpoint into a
    > set of library routines, which will allow a socket file descriptor to be
    > passed in instead of a regular file.  In the meantime, a relatively
    > small amount of hacking in cr_checkpoint.c could allow you to do the
    > same thing--just add a '--socket <address>' flag, and have it open up a
    > TCP socket instead of a file.
    > Cheers,
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
