From: Jerry Mersel (jerry.mersel_at_weizmann.ac.il)
Date: Wed Jan 02 2008 - 23:50:45 PST

Hi Paul,

Both of those commands produce the same relocation error. (I ran one of
them without cr_run, correct?)

The ldd output is the same from SGE and from the command line:

        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000002a9566c000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a95782000)
        /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)

Thanks,
Jerry

Paul H. Hargrove wrote:
> Jerry,
>
> Let's try to deal with one problem at a time. First I'd like to
> address the "relocation error" and see if resolving it still leaves
> the second error.
> The purpose of cr_run is to set LD_PRELOAD just as you have done
> manually. If you could, please tell me if the following two commands
> (executed via SGE) each produce the same relocation error:
>
> ${BLCR_HOME}/bin/cr_run matlab -nojvm -nodisplay -nosplash < $H/test.m
> env LD_PRELOAD=libcr.so.0:libpthread.so.0 matlab -nojvm -nodisplay -nosplash < $H/test.m
>
> If you could, also send the output of "env LD_PRELOAD=libpthread.so.0
> ldd /bin/cat" executed both from the command line and via SGE.
>
> -Paul
>
> Jerry Mersel wrote:
>
>> I can checkpoint matlab processes from the command line,
>> but when I use SGE I get these errors:
>> /lib64/libc.so.6: relocation error: /lib64/tls/libpthread.so.0:
>> symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6
>> with link time reference
>> Restart failed: No such device or address
>>
>> I get the relocation error when starting with cr_run, and the
>> "Restart failed" error when trying to restart.
>>
>> I start matlab thus:
>> ${BLCR_HOME}/bin/cr_run env LD_PRELOAD=libcr.so.0:libpthread.so.0 matlab -nojvm -nodisplay -nosplash < $H/test.m
>>
>> and try to restart thus:
>> ${BLCR_HOME}/bin/cr_restart $ckptfile
>>
>> My log file says this:
>> Jan 2 14:24:36 kam02 kernel: Skipping a socket.
>> Jan 2 14:24:36 kam02 kernel: Skipping a socket.
>> Jan 2 14:26:03 kam02 kernel: Failed to open chrdev major=5 minor=0
>> path='/dev/tty')
>> Jan 2 14:26:03 kam02 kernel: cr_restore_all_files [28703]: Unable
>> to restore fd 3 (type=6,err=-6)
>> Jan 2 14:26:03 kam02 kernel: cr_rstrt_child [28703]: Unable to
>> restore files! (err=-6)
>>
>> Perhaps it has something to do with the socket.
>> What do you think?
>>
>> Regards,
>> Jerry
>>
>> P.S. I have prelinking turned off.
>>
>>
>> Paul H. Hargrove wrote:
>>
>>> Jerry Mersel wrote:
>>>
>>>> Hi:
>>>>
>>>> I am trying to migrate jobs on a grid after checkpointing.
>>>> Does the "prelinking" fix mentioned in the FAQ need to be applied
>>>> on both the checkpointed node and the migrated-to node?
>>>>
>>>> Regards,
>>>> Jerry
>>>
>>>
>>> Yes, the prelinking of libraries should be disabled on both the
>>> "checkpointed on" and "migrated to" nodes.
>>> I will clarify this in the next FAQ version.
>>>
>>> -Paul
>>>
>>
>