From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Mar 14 2008 - 16:57:59 PST
Yuan,

What do you get if you run the following two commands?
  $ ls -l /usr/lib64/gconv/gconv-modules.cache
  $ tcsh -c 'cat /proc/$$/maps' | grep gconv

What I see is a world-readable file and a shared read-only mmap in tcsh:
  $ ls -l /usr/lib64/gconv/gconv-modules.cache
  -rw-r--r-- 1 root root 21514 Jun 3 2005 /usr/lib64/gconv/gconv-modules.cache
  $ tcsh -c 'cat /proc/$$/maps' | grep gconv
  2b8e36967000-2b8e3696d000 r--s 00000000 00:0f 9486631 /usr/lib64/gconv/gconv-modules.cache

So, there shouldn't be a problem unless there is something different
about your system.

-Paul

Paul H. Hargrove wrote:
> Yuan,
>
> I've not seen that particular failure before, but some quick research
> indicates that gconv-modules.cache is a part of glibc and I suspect that
> it is getting mapped in much the same way as the NSCD file is. I will
> continue to look into the problem to see what BLCR might be able to do
> differently.
>
> -Paul
>
> Yuan Wan wrote:
>
>> Hi Paul,
>>
>> Thanks for replying.
>> The error message I got from /var/log/messages is as follows:
>>
>> vmadump: mmap failed: /usr/lib64/gconv/gconv-modules.cache
>> thaw_threads returned error, aborting. -13
>>
>> The failure does not seem to be caused by NSCD. What do you think?
>>
>> --Yuan
>>
>>
>> On Mon, 10 Mar 2008, Paul H. Hargrove wrote:
>>
>>
>>> Yuan,
>>>
>>> The most likely cause is that the restart failed to open one of the
>>> files that was open() or mmap()ed at the time the checkpoint was taken.
>>> Based on the fact that you see this w/ a shell script, but not C code,
>>> my best guess is that you are encountering a problem with the file that
>>> the Name Service Cache Daemon (NSCD) uses. Please see the following FAQ
>>> entry for more detail (including what to look for in the system logs)
>>> http://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#nscd
>>> The only known work-around is to remove NSCD from your system.
>>>
>>> -Paul
>>>
>>> Yuan Wan wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to restart my shell script jobs (bash and R) with BLCR but
>>>> failed with the following error:
>>>>
>>>> "Restart failed: Permission denied"
>>>>
>>>> I can checkpoint the job and get the context file. The restart will be
>>>> successful if executed by root but fail if run by normal users. The
>>>> context file does belong to me, so I'm wondering where the permission
>>>> is required. I can also restart a C code as a regular user without
>>>> problem.
>>>>
>>>> Anyone know the possible reason? Thanks
>>>>
>>>> --Yuan
>>>>
>>>> Yuan Wan
>>>>
>>>
>>>
>
>
>

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
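
The two checks requested at the top of this message (is the cache file
world readable, and how is it mapped) can also be scripted. The following
is only a minimal Python sketch and is not part of BLCR. The function
names is_world_readable and find_mappings are hypothetical, and when run
directly it inspects the Python process's own /proc/self/maps rather than
that of tcsh, so a gconv mapping may simply not be present.

```python
import os
import stat


def is_world_readable(path):
    """Return True if 'other' users have read permission on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def find_mappings(maps_text, needle="gconv"):
    """Return (address_range, perms, pathname) for each mapping whose
    pathname contains needle. maps_text is the content of a
    /proc/<pid>/maps file."""
    found = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            found.append((fields[0], fields[1], fields[5].strip()))
    return found


if __name__ == "__main__":
    # Path taken from the thread; it may differ on other systems.
    cache = "/usr/lib64/gconv/gconv-modules.cache"
    if os.path.exists(cache):
        print(cache, "world readable:", is_world_readable(cache))
    # Assumption: a Linux /proc filesystem is mounted.
    if os.path.exists("/proc/self/maps"):
        with open("/proc/self/maps") as maps:
            for addr, perms, path in find_mappings(maps.read()):
                print(addr, perms, path)
```

A perms field of "r--s", as in the tcsh output above, indicates a shared
read-only mapping.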