From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 18 2009 - 13:07:06 PST
Ted,

You are right about the nscd cache file being opened as root (or another "system" id). The program acquires the file descriptor via fd passing from a privileged daemon process. Since we can't safely reopen this file as the user, and we are equally unable to reproduce the descriptor passing from the daemon, BLCR is incompatible with nscd (see the FAQ: http://mantis.lbl.gov/blcr/doc/html/FAQ.html#nscd).

If you were to perform the restart as the original user, you would encounter this problem regardless of whether --enable-restore-ids is used or not. I am afraid the only known solution is to disable nscd.

-Paul

Ted Cabeen wrote:
> I'm having problems with 0.8.0 with --enable-restore-ids. When I try
> to restart a checkpointed job, I get the following error:
> - open('/var/cache/nscd/passwd', 0x0) failed: -13
> - mmap failed: /var/cache/nscd/passwd
> - thaw_threads returned error, aborting. -13
> Restart failed: Permission denied
>
> If I recompile 0.8.0 without restore-ids, it doesn't have this error.
> I think that the problem may be that the nscd cache is opened on
> behalf of the program by libc as root, but when BLCR tries to restart
> the checkpointed program as the original user, it can't open the nscd
> cache. Is there a way to fix this?
>
> --Ted

--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
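For readers unfamiliar with the "fd passing from a privileged daemon" mentioned in the reply: on Linux, a process can hand an already-open descriptor to another process over a UNIX-domain socket using SCM_RIGHTS ancillary data. The receiver gets a working descriptor without ever calling open() on the path, so the file's permission checks never apply to it. A restarted process has no such daemon exchange to replay, and reopening the path by name as the user fails with -13 (EACCES), matching the log above. The sketch below is a generic SCM_RIGHTS example, not nscd's actual wire protocol; the helper names send_fd() and recv_fd() are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send the open descriptor `fd` over the
 * UNIX-domain socket `sock` as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of ordinary data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* guarantees correct alignment of buf */
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * The kernel installs a new descriptor in this process that refers to
 * the same open file; no open() or permission check happens here.
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In use, the privileged side would open the file and call send_fd() on a connected socket, while the unprivileged side calls recv_fd() and then reads or mmap()s the result. A checkpoint can record that such a descriptor was open, but restoring it later requires either reopening the path (which the user may not be permitted to do) or repeating the exchange with the daemon, which is the limitation the reply describes.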