From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Mar 10 2008 - 11:07:05 PST
Yuan,

The most likely cause is that the restart failed to open one of the files that was open()ed or mmap()ed at the time the checkpoint was taken. Based on the fact that you see this with a shell script, but not with C code, my best guess is that you are encountering a problem with the file that the Name Service Cache Daemon (NSCD) uses. Please see the following FAQ entry for more detail (including what to look for in the system logs):

http://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#nscd

The only known work-around is to remove NSCD from your system.

-Paul

Yuan Wan wrote:
>
> Hi all,
>
> I'm trying to restart my shell script jobs (bash and R) with BLCR but
> failed with the following error:
>
> "Restart failed: Permission denied"
>
> I can checkpoint the job and get context file. The restart will be
> successful if executed by root but fail if run by normal users. The
> context file does belong to me, so I'm wondering where the permission
> is required. I can also restart a C code as a regular user without problem.
>
> Anyone know the possible reason? Thanks
>
> --Yuan
>
> Yuan Wan

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
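One way to narrow down which file is involved is to list what the job has open or mapped while it is still running, and check which of those paths the restarting user cannot read. The following is only a rough sketch and is not part of BLCR: it assumes a Linux /proc filesystem, the helper names (mapped_files, open_files, not_readable) are made up for illustration, and it tests read access only, so a file that was mapped writable could still fail at restart even if it is not reported here.

```python
import os
import sys


def mapped_files(maps_text):
    """Return the file-backed paths named in /proc/<pid>/maps content."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        # Skip pseudo-entries such as [heap] and [stack].
        if path.startswith("/") and path not in paths:
            paths.append(path)
    return paths


def open_files(pid):
    """Return the absolute-path targets of the links in /proc/<pid>/fd."""
    fd_dir = "/proc/%s/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        if target.startswith("/"):
            targets.append(target)
    return targets


def not_readable(paths, readable=lambda p: os.access(p, os.R_OK)):
    """Return the paths the calling user cannot read (heuristic only)."""
    return [p for p in paths if not readable(p)]


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]
    with open("/proc/%s/maps" % pid) as f:
        candidates = mapped_files(f.read()) + open_files(pid)
    for path in not_readable(candidates):
        print("cannot read:", path)
```

Saved under a name of your choosing and run with the job's PID as its only argument, as the same regular user who will later attempt the restart, it prints any open or mapped path that user cannot read. An empty result does not guarantee that the restart will succeed; the system logs mentioned in the FAQ entry remain the authoritative place to look.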