Re: Can we directly call system ?

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Dec 28 2007 - 16:29:44 PST

    王磊 wrote:
    > Dear Sir,
    >
    > Yesterday I wrote to you for help; today I have another question.
    >
    > Can I call "cr_restart" directly through the system() function? My
    > program has a problem when I try this.
    >
    > My program has the following form:
    >
    >     ................
    >     checkpoint_func();  // place1: this succeeds, and I now have a
    >                         // checkpoint file named "filename"
    >     ................
    >     ................
    >     system("cr_restart filename");  // I want to imitate the
    >                         // command line so that it goes back to place1
    >     .............
    >
    > But the error tells me: "open (filename, O_RDONLY): No Such file
    > or directory"
    >
    > So I wrote another program that only calls system() separately, and
    > it does what I want.
    >
    > I cannot understand this. Could you help me?
    >
    > Thanks !
    >
    > Daniel.
    
    Daniel,
      Sorry for the slow response - this is a vacation week for most of us.
      Your use of system() is the correct approach - there is NOT yet any 
    cr_request_restart() type of function in our library because the 
    interface is still changing.  For now, use of system() is the only 
    recommended way.
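
    Since system() is the route for now, it may help to look at what it
    returns instead of ignoring the result.  The following is only a sketch
    using standard C/POSIX calls; run_command() is a hypothetical helper
    name, not part of BLCR, and nothing here assumes any particular exit
    code from cr_restart.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper (not a BLCR function): run a shell command via
 * system() and return its exit status, or -1 if the command could not
 * be run or did not exit normally. */
int run_command(const char *command) {
    int status = system(command);

    if (status == -1) {
        /* system() itself failed (e.g. fork() failed) */
        perror("system() call failed");
        return -1;
    }
    if (WIFEXITED(status)) {
        /* Normal termination: report the command's own exit status */
        return WEXITSTATUS(status);
    }
    fprintf(stderr, "command terminated abnormally: %s\n", command);
    return -1;
}
```

    A call such as run_command("cr_restart filename") would then at least
    tell you whether the shell was able to run the command to completion.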
      As to why you get "No such file or directory" I can only guess that 
    the file is not where you expect it, or that your program's cwd is not 
    the same as when the file was created.  You might want to check if 
    system("ls -l filename") works or not when called exactly where you have 
    system("cr_restart filename") now. 
      Aside from the "No such file or directory" error, you are probably not 
    going to be able to do what you are trying (a program requesting its own 
    restart) because BLCR can't restart a process when its PID is already in 
    use, and doesn't currently have a mechanism to say "replace *this* 
    process with the restarted one".  If the program is to restart itself, 
    it will need to fork a child to request the restart, and will itself 
    need to exit before the restart can succeed.  Something like the 
    following (compiled but not tested for an actual restart) is probably 
    what you want.  This function will return only if pipe() or fork() 
    fails.
    
    -Paul
    
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int restart_self(const char *filename) {
        pid_t pid;
        int fds[2];
        int rc;
    
        /* Create pipe to allow child to wait for parent's death */
        rc = pipe(fds);
        if (rc < 0) {
            perror("pipe() call failed");
            return -1;
        }
    
        /* Fork a child */
        pid = fork();
        if (pid < 0) {
            perror("fork() call failed");
            /* Don't leak the pipe when returning to the caller */
            close(fds[0]);
            close(fds[1]);
            return -1;
        } else if (pid > 0) {
            /* In parent - just exit */
            exit(0);
        }
    
        /* In child - wait for parent's death (EOF on pipe) */
        close(fds[1]);
        do {
            char dummy;
            rc = read(fds[0], &dummy, 1);
        } while ((rc < 0) && (errno == EINTR));
        if (rc != 0) {
            perror("read() call failed in an unexpected way");
            exit(1);
        }
    
        /* Restart - execlp() returns only on failure */
        execlp("cr_restart", "cr_restart", filename, (char *)NULL);
        perror("execlp() call failed");
        exit(1);
    }
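
    The "wait for parent's death" step in restart_self() can also be
    pulled out into a small helper, which makes it easier to try on its
    own.  This is only a sketch: wait_for_eof() is a hypothetical name, and
    unlike the loop above it discards any stray bytes and keeps reading
    until it sees EOF.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: block until every write end of the pipe has been
 * closed, i.e. until read() reports EOF.
 * Returns 0 on EOF, -1 on a read error (errno is left set by read()). */
int wait_for_eof(int read_fd) {
    char dummy;
    ssize_t rc;

    for (;;) {
        rc = read(read_fd, &dummy, 1);
        if (rc == 0) {
            return 0;               /* EOF: all writers are gone */
        }
        if (rc < 0 && errno != EINTR) {
            return -1;              /* real error */
        }
        /* rc > 0: unexpected data, discard it; EINTR: just retry */
    }
}
```

    In restart_self() the child would close(fds[1]) first and then call
    wait_for_eof(fds[0]) before the exec.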
    
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
