From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Jan 02 2008 - 13:13:50 PST

Jerry,

Let's try to deal with one problem at a time. First I'd like to address
the "relocation error" and see if resolving it still leaves the second
error.

The purpose of cr_run is to set LD_PRELOAD just as you have done
manually. If you could, please tell me if the following two commands
(executed via SGE) each produce the same relocation error:

    ${BLCR_HOME}/bin/cr_run matlab -nojvm -nodisplay -nosplash < $H/test.m
    env LD_PRELOAD=libcr.so.0:libpthread.so.0 matlab -nojvm -nodisplay -nosplash < $H/test.m

If you could, also send the output of
"env LD_PRELOAD=libpthread.so.0 ldd /bin/cat" executed both from the
command line and via SGE.

-Paul

Jerry Mersel wrote:
> I manage to checkpoint matlab processes from the command line.
> But when I want to use SGE I get the error:
> /lib64/libc.so.6: relocation error: /lib64/tls/libpthread.so.0: symbol
> errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link
> time reference
> Restart failed: No such device or address
>
> The relocation error I get on the start using cr_run.
> The Restart failed I get when trying to restart.
>
> I start matlab thus:
> ${BLCR_HOME}/bin/cr_run env LD_PRELOAD=libcr.so.0:libpthread.so.0
> matlab -nojvm -nodisplay -nosplash < $H/test.m
>
> and try to restart thus:
> ${BLCR_HOME}/bin/cr_restart $ckptfile
>
> my log file says this:
> Jan 2 14:24:36 kam02 kernel: Skipping a socket.
> Jan 2 14:24:36 kam02 kernel: Skipping a socket.
> Jan 2 14:26:03 kam02 kernel: Failed to open chrdev major=5 minor=0
> path='/dev/tty')
> Jan 2 14:26:03 kam02 kernel: cr_restore_all_files [28703]: Unable to
> restore fd 3 (type=6,err=-6)
> Jan 2 14:26:03 kam02 kernel: cr_rstrt_child [28703]: Unable to
> restore files! (err=-6)
>
> Perhaps something to do with the socket.
> What do you think?
>
> Regards,
> Jerry
>
> P.S. I have prelinking turned off.
>
>
> cat
>
> Paul H. Hargrove wrote:
>
>> Jerry Mersel wrote:
>>
>>> Hi:
>>>
>>> I am trying to migrate jobs on a grid after checkpointing.
>>> Does the "prelinking" fix as mentioned in the faq must it be done
>>> on the checkpointed node and the migrated to node?
>>>
>>> Regards,
>>> Jerry
>>
>> Yes, the prelinking of libraries should be disabled on both the
>> "checkpointed on" and "migrated to" nodes.
>> I will clarify this in the next FAQ version.
>>
>> -Paul
>>
>

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
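
For the ldd comparison requested in the message, one way to collect the two outputs so they can be diffed is sketched below. This is only an illustration: the function name capture_ldd, the labels, and the output file naming are hypothetical and not part of BLCR or SGE. The only command taken from the message is "env LD_PRELOAD=libpthread.so.0 ldd /bin/cat".

```shell
#!/bin/sh
# Hypothetical helper (not part of BLCR or SGE): save the ldd output
# requested in the message to a labelled file for later comparison.
capture_ldd() {
    label=$1              # illustrative tag, e.g. "cmdline" or "sge"
    dir=${2:-$HOME}       # assumed output location; defaults to $HOME
    out="$dir/ldd-$label.txt"
    # Capture stderr as well, since a relocation or preload error is
    # reported there. "|| true" keeps going if ldd exits non-zero, so
    # the error text is still recorded in the file.
    env LD_PRELOAD=libpthread.so.0 ldd /bin/cat > "$out" 2>&1 || true
    printf '%s\n' "$out"  # print the path of the file that was written
}
```

Running "capture_ldd cmdline" at an interactive prompt and "capture_ldd sge" from within the SGE job script would leave two files in $HOME. Comparing them with diff should show whether the job environment resolves libpthread or libc from a different path (for example /lib64/tls versus /lib64).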