Re: PIDs in Checkpoint/Restart

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 09 2005 - 10:17:59 PST

    Before restoring any processes, BLCR checks that the required PIDs are 
    currently unused.  If there is any conflict, then the restart attempt 
    will fail and cr_restart will return an exit code of EBUSY (Resource 
    busy).  In the future we may be able to provide information about the 
    specific conflicting PIDs.
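
To illustrate the kind of check described above, here is a minimal user-space sketch in Python. It is not BLCR's implementation; BLCR does its checking itself during restart. The helper names (pid_in_use, conflicting_pids, is_pid_conflict) are hypothetical, and the sketch assumes a POSIX system where sending signal 0 probes for a process without delivering anything. The probe is inherently racy, since a PID can be taken or freed right after the check.

```python
import errno
import os


def pid_in_use(pid: int) -> bool:
    """Return True if some process currently owns `pid` (hypothetical helper)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False  # ESRCH: no process has this PID
    except PermissionError:
        return True  # EPERM: the PID exists but belongs to another user
    return True


def conflicting_pids(required):
    """Return the PIDs from `required` that are already taken, sorted."""
    return sorted(p for p in set(required) if pid_in_use(p))


def is_pid_conflict(exit_code: int) -> bool:
    """True if an exit code equals EBUSY, the value the message above
    says cr_restart returns when a required PID is in use."""
    return exit_code == errno.EBUSY
```

For the situation in the question below, conflicting_pids([42]) would return [42] if another process currently owns PID 42, and an empty list otherwise.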
    By the way, the PPID is restored for every process except the "eldest",
    which ends up as a child of the cr_restart command.
    Heiko Bauke wrote:
    > Dear BLCR developer,
    > the BLCR web page claims: BLCR restores the process ID (PID), thread
    > group ID (TGID), parent process ID (PPID), and process tree to old
    > state.
    > How does this work? Is this feature reliable?
    > Under Linux a PID is less than 2^31 and PIDs may be recycled. So let's
    > consider the following situation. A program with a PID of 42 was
    > checkpointed and suspended. Because the process is stopped, its PID may
    > be recycled. So what will happen, when BLCR restarts the suspended
    > process but its former PID 42 is now owned by another process?
    > 	with regards
    > 	Heiko
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
