Re: berkeley checkpointing

From: Jerry Mersel (jerry.mersel_at_weizmann.ac.il)
Date: Wed Jan 02 2008 - 04:32:26 PST

  • Next message: Paul H. Hargrove: "Re: berkeley checkpointing"
    I can checkpoint MATLAB processes from the command line.
    But when I run them under SGE I get these errors:
    /lib64/libc.so.6: relocation error: /lib64/tls/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
    Restart failed: No such device or address
    
    The relocation error appears at startup, when I launch with cr_run.
    The "Restart failed" error appears when I try to restart.
    
    I start MATLAB thus:
    ${BLCR_HOME}/bin/cr_run env LD_PRELOAD=libcr.so.0:libpthread.so.0 matlab -nojvm -nodisplay -nosplash < $H/test.m
    
    and try to restart thus:
    ${BLCR_HOME}/bin/cr_restart $ckptfile
    
    My kernel log shows this:
    Jan  2 14:24:36 kam02 kernel: Skipping a socket.
    Jan  2 14:24:36 kam02 kernel: Skipping a socket.
    Jan  2 14:26:03 kam02 kernel: Failed to open chrdev major=5 minor=0 path='/dev/tty')
    Jan  2 14:26:03 kam02 kernel: cr_restore_all_files [28703]:  Unable to restore fd 3 (type=6,err=-6)
    Jan  2 14:26:03 kam02 kernel: cr_rstrt_child [28703]:  Unable to restore files!  (err=-6)
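
The log above shows the restart failing while reopening /dev/tty, and err=-6 corresponds to the "No such device or address" message printed by cr_restart. One possible explanation, offered here as an assumption and not confirmed anywhere in this thread, is that the job has no controlling terminal when it runs under SGE. A minimal sketch for checking that from inside a job script follows; the function name tty_status and its output strings are hypothetical.

```shell
#!/bin/sh
# Report whether this process can open its controlling terminal.
# Assumption: a batch job started by SGE may have none, in which
# case opening /dev/tty fails.
tty_status() {
    # The subshell keeps a failed redirection from aborting the script.
    if ( : < /dev/tty ) 2>/dev/null; then
        echo "tty-available"
    else
        echo "tty-missing"
    fi
}

tty_status
```

Running this once from an interactive shell and once from a submitted job would show whether the two environments differ in this respect.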
    
    Perhaps it has something to do with the skipped sockets.
    What do you think?
    
                                    Regards,
                                       Jerry
    
    P.S. I have prelinking turned off.
    
    Paul H. Hargrove wrote:
    
    > Jerry Mersel wrote:
    >
    >> Hi:
    >>
    >>  I am trying to migrate jobs on a grid after checkpointing.
    >> Must the "prelinking" fix mentioned in the FAQ be applied on both
    >> the checkpointed node and the migrated-to node?
    >>
    >>                                     Regards,
    >>                                        Jerry
    >
    > Yes, the prelinking of libraries should be disabled on both the 
    > "checkpointed on" and "migrated to" nodes.
    > I will clarify this in the next FAQ version.
    >
    > -Paul
    >
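
Since prelinking has to be disabled on both the "checkpointed on" and the "migrated to" nodes, it can help to check each node the same way. The sketch below assumes a Red Hat style configuration file at /etc/sysconfig/prelink containing a PRELINKING=yes or PRELINKING=no line; that path and variable are assumptions about the nodes' distribution, and the function name prelink_state is hypothetical.

```shell
#!/bin/sh
# prelink_state [FILE]: print "disabled", "enabled", or "unknown".
# Assumption: FILE follows the Red Hat /etc/sysconfig/prelink layout.
prelink_state() {
    conf=${1:-/etc/sysconfig/prelink}
    [ -r "$conf" ] || { echo "unknown"; return 0; }
    # Use the last uncommented PRELINKING= assignment and strip quotes.
    value=$(sed -n 's/^[[:space:]]*PRELINKING=//p' "$conf" | tail -n 1 | tr -d "\"'")
    case $value in
        no)  echo "disabled" ;;
        yes) echo "enabled" ;;
        *)   echo "unknown" ;;
    esac
}

prelink_state "$@"
```

Running the script on every node that may checkpoint or restart a job should print "disabled" each time. Note that it only reads the setting; it does not tell whether libraries that were already prelinked have been restored.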
    
