Re: Problems with --enable-restore-ids

From: Eric Roman (ESRoman_at_berkeley_dot_edu)
Date: Thu Feb 19 2009 - 11:31:34 PST

    This might not be related, but I had a few cases where MOM was leaking file
    descriptors down to a job when the job was created.  BLCR would save those
    descriptors, and then a subsequent restart would fail.  At restart time,
    BLCR could not reopen these descriptors (dead pipes), so it tried to attach
    them to the standard input of the job.  That failed because BLCR didn't
    have permission to open the restarted job's standard input.
    
    I'm not sure if this is the same bug or not, but I added some code to MOM
    that would try to close all of the fd's before a new job was spawned.  (I
    thought that this was in the released version.)  To check whether
    descriptors are leaking, submit a job that lists its own open descriptors:
    
    echo ls -l /proc/self/fd | qsub
    
    Then check the open descriptor numbers against MOM's and see if they match
    (substitute MOM's PID for MOM_PID):
    
    sudo ls -l /proc/MOM_PID/fd
    
    Another easy thing you could try is redirecting the job's input from /dev/null
    during the cr_restart in the blcr_restart_script that Torque provides.  That's
    part of what I did to work through this.
    
    Or you could try closing all of your file descriptors there with brute force:
    
    for (i=3; i<maxfd; ++i) close(i);
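
    A slightly fuller sketch of that brute-force loop follows.  The one-liner
    doesn't say where maxfd comes from; taking it from sysconf(_SC_OPEN_MAX) is
    an assumption here, and close_fd_range is a hypothetical name, not a
    function from MOM or BLCR.

```c
#include <unistd.h>

/* Close every descriptor in [first_fd, maxfd). Slots that are not open
 * make close() fail with EBADF, which is expected and ignored.
 * Returns the number of descriptors that were actually open and closed. */
int close_fd_range(int first_fd, long maxfd)
{
    int closed = 0;

    for (long fd = first_fd; fd < maxfd; ++fd) {
        if (close((int)fd) == 0)
            ++closed;
    }
    return closed;
}
```

    A caller would typically run close_fd_range(3, sysconf(_SC_OPEN_MAX)) in
    the child just before exec(), leaving stdin, stdout, and stderr alone.  If
    sysconf() returns -1 the loop does nothing, and on systems with a very
    large descriptor limit the loop can be slow.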
    
    I added some lines to MOM before exec()'ing the blcr_restart_script that read:
    
    
    +
    +        /* 
    +         * close all file descriptors so BLCR doesn't recover them
    +         */
    +
    +        /* MOM's log */
    +        log_close(0);
    +
    +        /* Now the lock file */
    +        if (lockfds >= 0)
    +          {
    +          close(lockfds);
    +    
    +          lockfds = -1;
    +          }
    +
    +        /* open sockets */
    +        net_close(-1);
    +
    +        /* replace stdin, stdout, and stderr with /dev/null */
    +        fdreopen("/dev/null", O_RDONLY, 0);
    +        for (i=1; i<=2; ++i) {
    +          fdreopen("/dev/null", O_WRONLY, i);
    +        }
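
    The patch calls fdreopen() without showing its definition.  As a rough
    idea of what a helper like that would need to do, here is a standalone
    sketch using open() and dup2().  reopen_fd and detach_stdio are
    hypothetical names and this is not Torque's implementation.

```c
#include <fcntl.h>
#include <unistd.h>

/* Make descriptor fd refer to path, opened with the given flags.
 * Returns fd on success, -1 on failure (errno set by open or dup2). */
int reopen_fd(const char *path, int flags, int fd)
{
    int tmp = open(path, flags);
    if (tmp < 0)
        return -1;
    if (tmp == fd)
        return fd;              /* open() already landed on the target slot */
    if (dup2(tmp, fd) < 0) {    /* dup2 closes the old fd, if it was open */
        close(tmp);
        return -1;
    }
    close(tmp);
    return fd;
}

/* Point stdin, stdout, and stderr at /dev/null. Returns 0 on success. */
int detach_stdio(void)
{
    if (reopen_fd("/dev/null", O_RDONLY, 0) < 0)
        return -1;
    for (int fd = 1; fd <= 2; ++fd) {
        if (reopen_fd("/dev/null", O_WRONLY, fd) < 0)
            return -1;
    }
    return 0;
}
```

    Because dup2() replaces whatever the target descriptor referred to, any
    dead pipe inherited on fd 0, 1, or 2 is dropped in the same step.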
    
    Maybe there isn't a corresponding net_close(), log_close(), or close(lockfds)
    when the job is started?
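
    A different way to keep a descriptor from reaching the job, which is not
    what the patch above does, is to mark it close-on-exec when it is created
    so that exec() drops it automatically.  A minimal sketch, with set_cloexec
    as a hypothetical helper name:

```c
#include <fcntl.h>

/* Set FD_CLOEXEC on fd so the kernel closes it across exec().
 * Other descriptor flags are preserved. Returns 0 on success, -1 on failure. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

    This only helps for programs that are exec()'d.  A child that is forked
    without an exec() still inherits the descriptor and has to close it
    explicitly.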
    
    Anyway, just some info that might help.
    
    Eric
    
    On Thu, Feb 19, 2009 at 10:47:33AM -0800, Ted Cabeen wrote:
    > In this case, I am running BLCR with torque, so I don't have direct 
    > access to exactly what filehandles torque has open.  Looking in the 
    > /proc filesystem when the job is running (not checkpointed), there are 
    > three processes, all of which have a fd 16 pointing at the same pipe:
    > 29301/fd/16 -> pipe:[249689]
    > 29302/fd/16 -> pipe:[249689]
    > 29304/fd/16 -> pipe:[249689]
    > 
    > Those three processes map to the user's shell, the copy of sh running 
    > the user's job script, and the active process of the job (in this case, 
    > just a sleep for testing).  Is that helpful?
    > 
    > --Ted
    > 
    > 
    > Paul H. Hargrove wrote:
    > >Ted,
    > > Thanks for your patience.  The restore-ids code itself seems to be 
    > >doing just what it should: dropping the root privilege before performing 
    > >any fs permission checks, preventing use of a maliciously modified 
    > >checkpoint context file as a way to access otherwise inaccessible 
    > >files.  However, there seem to be problems with files (nscd is just one 
    > >example) that were originally opened *with* some privilege.  Anything 
    > >setup by the batch system is a candidate for such problems.
    > >
    > > For the new problem, I can see what is going wrong, but don't know 
    > >enough to suggest a solution.
    > > The term "external pipe" means that at the time the checkpoint was 
    > >taken there was a pipe that had only one endpoint within the scope of 
    > >the checkpoint, while the other was not.  In a batch scheduled 
    > >environment this is often the case for the std{in,out,err} descriptors, 
    > >but in your case the error says fd=16, so it must be something else.
    > > When an "external pipe" is encountered in the context file at restart 
    > >time, BLCR's behavior is to try to connect this fd to the same file as the 
    > >stdin or stdout (depending on which end of the pipe is external) of the 
    > >cr_restart process.  In your case the user has insufficient permission 
    > >to do so, most likely because the cr_restart was launched as root and 
    > >root owns the file (or device) that is used for std{in,out}.
    > >
    > > Since I don't know what fd 16 was being used for,  I can't be certain 
    > >that connecting it to stdin or stdout is even the right thing to do.  If 
    > >it is the right thing, then I will need to go back to looking at the 
    > >BLCR source code and determine if the permission checks being performed 
    > >(the ones that yield the first error) are required for 
    > >correctness/security.  My initial thought is that if the stdin/out of 
    > >cr_restart have not been marked close-on-exec, then any child it creates 
    > >potentially has access to those fds as its own stdin/out and bypassing 
    > >fs permissions sounds like the right thing to do.
    > >
    > > So, it is possible that BLCR needs to be doing something different with 
    > >respect to the permissions when reopening external pipes.  I will look 
    > >into that and get back to you.  However, I'd appreciate it if you could 
    > >also be looking into figuring out what fd 16 is being used for.  It is 
    > >entirely possible that it is something that will require a different 
    > >approach.
    > >
    > >-Paul
    > >
    > >Ted Cabeen wrote:
    > >>All right.  I've disabled nscd, and we're on to the next problem. 
    > >>(Sorry I missed that note in the FAQ)  When I restart a job with 
    > >>enable-restore-ids, I get the following error:
    > >>- Error -13 from cr_filp_reopen() while restoring external pipe
    > >>- cr_restore_all_files [3766]:  Unable to restore fd 16 (type=4,err=-13)
    > >>- cr_rstrt_child [3766]:  Unable to restore files!  (err=-13)
    > >>Restart failed: Permission denied
    > >>
    > >>My suspended jobs start fine when compiled without enable-restore-ids.
    > >>
    > >>Thoughts?
    > >>
    > >>--Ted
    > >>
    > >>Paul H. Hargrove wrote:
    > >>>Ted,
    > >>>
    > >>> You are right about the nscd cache file being opened as root (or 
    > >>>other "system" id).  The program acquires the file descriptor via fd 
    > >>>passing from a privileged daemon process. Since we can't safely 
    > >>>reopen this file as the user and we are equally unable to reproduce 
    > >>>the descriptor passing from the daemon, BLCR is incompatible with 
    > >>>nscd (see FAQ:  http://mantis.lbl.gov/blcr/doc/html/FAQ.html#nscd ).  
    > >>>If you were to perform the restart as the original user you would 
    > >>>encounter this problem regardless of --enable-restore-ids or not.  I 
    > >>>am afraid the only known solution is to disable nscd.
    > >>>
    > >>>-Paul
    > >>>
    > >>>Ted Cabeen wrote:
    > >>>>I'm having problems with 0.8.0 with --enable-restore-ids.  When I 
    > >>>>try to restart a checkpointed job, I get the following error:
    > >>>>- open('/var/cache/nscd/passwd', 0x0) failed: -13
    > >>>>- mmap failed: /var/cache/nscd/passwd
    > >>>>- thaw_threads returned error, aborting. -13
    > >>>>Restart failed: Permission denied
    > >>>>
    > >>>>If I recompile 0.8.0 without restore-ids, it doesn't have this 
    > >>>>error.  I think that the problem may be that the nscd cache is 
    > >>>>opened on behalf of the program by libc as root, but when BLCR tries 
    > >>>>to restart the checkpointed program as the original user, it can't 
    > >>>>open the nscd cache.  Is there a way to fix this?
    > >>>>
    > >>>>--Ted
    > >>>
    > >>>
    > >
    > >
    
