Re: Restart Failed: permission denied

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 28 2007 - 16:11:54 PST

  • Next message: Tom Spyrou: "RE: Restart Failed: permission denied"
      Thanks for your interest in BLCR.
      The first thing I note is that if you are talking about the normal
    xclock found as /usr/X11R6/bin/xclock or /usr/bin/X11/xclock on most
    systems, then it *does* open a socket used to talk to the X server.
    That alone is enough reason for the restart to fail.
      However, the error you are seeing (-13 is EACCES, "permission denied")
    is not related to that socket.  Rather, you are encountering the evil of
    the Name Service Cache Daemon (nscd).  The issue with nscd is that it
    passes a file descriptor (see the sendmsg(2) and recvmsg(2) man pages)
    from the daemon (which has permission to open() the file) to a client
    (which cannot open() the file itself).  The client can then mmap() the
    file.  When BLCR tries to reestablish the mmap()s that a process had
    before the checkpoint, it fails because the nscd-passed file descriptor
    can't be recreated without help from nscd, and nscd cannot provide that
    help during the restart.
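
For readers unfamiliar with descriptor passing, the mechanism described above can be sketched roughly as follows. This is only an illustration, not nscd's or BLCR's actual code: the function name pass_fd_and_mmap is hypothetical, a socketpair stands in for the daemon/client connection, and an anonymous temporary file stands in for the nscd cache file. It assumes Linux (or another Unix) and Python 3.9 or later for socket.send_fds() and socket.recv_fds().

```python
import mmap
import os
import socket
import tempfile


def pass_fd_and_mmap(payload: bytes) -> bytes:
    """Pass an open file descriptor over a Unix socket and mmap() it on the
    receiving end. Hypothetical helper for illustration only."""
    # One end plays the "daemon", the other the "client".
    daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as cache:
            cache.write(payload)
            cache.flush()
            # Daemon side: the descriptor travels as SCM_RIGHTS ancillary
            # data alongside at least one byte of ordinary data.
            socket.send_fds(daemon_end, [b"x"], [cache.fileno()])
            # Client side: receive a new descriptor for the same open file.
            _data, fds, _flags, _addr = socket.recv_fds(client_end, 1, 1)
        # The daemon's copy is closed now; the client's descriptor still works.
        fd = fds[0]
        try:
            # The client maps the file without ever calling open() on a path.
            with mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ) as view:
                return view[:]
        finally:
            os.close(fd)
    finally:
        daemon_end.close()
        client_end.close()


if __name__ == "__main__":
    print(pass_fd_and_mmap(b"hosts cache"))
```

The point of the sketch is that the client ends up with a mapping of a file it never opened by name. A restart that tries to rebuild the mapping by calling open() on the path as the client would hit the permission check that the original descriptor passing avoided.
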
      Since nscd is mostly useful only to programs that do name service
    lookups, which in turn usually implies sockets, it doesn't often create
    additional problems for BLCR.  However, since it can also handle/cache
    the getpw*() family of functions for glibc, one can see problems even
    when not using sockets.  You might consider removing nscd from your
    system to help avoid some future BLCR problems.
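
One way to check whether a running process has picked up an nscd mapping before you checkpoint it is to look at its memory map. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are hypothetical, and matching on the substring "nscd" is an assumption based on the /var/db/nscd/hosts path in the dmesg output quoted below (the cache directory may differ on other systems).

```python
def nscd_mappings(maps_text):
    """Return the pathnames in /proc/<pid>/maps text that mention nscd."""
    found = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and "nscd" in fields[5]:
            found.append(fields[5].strip())
    return found


def nscd_mappings_for_pid(pid="self"):
    """Read the memory map of a process (default: this one) and filter it."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return nscd_mappings(maps_file.read())


if __name__ == "__main__":
    # Prints an empty list if this process has no nscd cache file mapped.
    print(nscd_mappings_for_pid())
```

A non-empty result for the process you intend to checkpoint suggests the restart may fail in the way described above.
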
      If you "make examples" in your BLCR build directory, you'll get some
    very simple/silly programs you can try checkpointing and restarting.

    Tom Spyrou wrote:
    > Hi,
    > I am a new user and have installed and successfully created a checkpoint
    > of an xclock application run as a trial.
    > When I try to restart the application, I get the error in the subject
    > and when I type dmesg I see the following errors.
    > I don't think xclock opens a socket or uses shared memory.
    > I was wondering if anyone had an idea or a better sample application
    > that would be supported.
    > Thanks,
    > Tom
    > Skipping a socket.
    > vmadump: mmap failed: /var/db/nscd/hosts
    > thaw_threads returned error, aborting. -13
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
