Re: using blcr on program with fork

From: Andrea Autiero S143785 (andrea.autiero_at_studenti.polito.it)
Date: Mon Mar 09 2009 - 03:18:54 PST

  • Next message: Wang Yi: "Some general questions about checkpoint/restart"
    Hello, it's me again.
    Now I have the following problem:
    
    andrea@chisone:~/Desktop/materiale_tesi> source ../programmi_per_tesi/eldk/eldk_init 4xx
    ARCH=ppc
    CROSS_COMPILE=ppc_4xx-
    DEPMOD=/home/andrea/Desktop/programmi_per_tesi/eldk/usr/bin/depmod.pl
    PATH=/home/andrea/Desktop/programmi_per_tesi/eldk/usr/bin:/home/andrea/Desktop/programmi_per_tesi/eldk/bin:/home/andrea/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/opt/cross/bin:/usr/lib/jvm/jre/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/local/bin:/usr/local/bin
    andrea@chisone:~/Desktop/materiale_tesi> ${CROSS_COMPILE}gcc -static -o ppc_controller controller2.c -Wall -L/ppc_blcr/builddir/ppc_blcr/builddir/lib/ -lcr_run -u cr_run_link_me -ldl -lpthread
    /ppc_blcr/builddir/ppc_blcr/builddir/lib//libcr_run.a(libcr_run_la-cr_run.o): In function `cri_init':
    /home/andrea/Desktop/blcr-0.7.3/builddir/libcr/../../libcr/cr_libinit.c:148: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
    
    What could be the matter?
    I'm trying to create an application that will be checkpointed and
    restarted by another application
    (via system("cr_checkpoint pid")).
    I think the "restarter" doesn't need to be linked with BLCR.
    The program to be checkpointed cannot be checkpointed and gives me this error:
    
    blcr: retry request on -CR_ENOSUPPORT
    Checkpoint failed: support missing from application
    
    Thanks for any suggestions.
    Andrea Autiero
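A minimal sketch of the "restarter" side, assuming (as Andrea does) that it only runs the `cr_checkpoint` utility and therefore needs no BLCR library of its own. Instead of `system()`, it forks and `execlp()`s the tool with the target PID as its only argument, which avoids the shell and hands the tool's exit status straight back to the caller. The helper name `run_tool_on_pid` is hypothetical, and treating a non-zero exit status from `cr_checkpoint` as a failed checkpoint is an assumption to check against the BLCR version in use.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "tool <pid>" without a shell and wait for it.
 * Returns the tool's exit status, 127 if it could not be executed
 * (e.g. not found in PATH), or -1 if fork/wait failed or the tool
 * was killed by a signal. */
static int run_tool_on_pid(const char *tool, pid_t pid)
{
    char pidbuf[32];
    int status;
    pid_t child;

    assert(tool != NULL);
    snprintf(pidbuf, sizeof pidbuf, "%ld", (long)pid);

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Child: replace ourselves with the tool, e.g. cr_checkpoint. */
        execlp(tool, tool, pidbuf, (char *)NULL);
        _exit(127);             /* only reached if execlp() failed */
    }

    /* Parent: wait for the tool and report how it finished. */
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_tool_on_pid("cr_checkpoint", pid)` would request a checkpoint of `pid`; taking the tool name as a parameter also makes the helper easy to exercise with harmless commands such as `true` and `false`.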
    
    
    On Wed, 25 Feb 2009 12:22:39 -0800, "Paul H. Hargrove" <PHHargrove_at_lbl_dot_gov> wrote:
    > Andrea Autiero S143785 wrote:
    >> I'm using shared memory in my program.
    >> Removing every line referring to it lets BLCR checkpoint my
    >> applications.
    >> Could this be the problem?
    >>   
    > Yes, that is almost certainly the problem.  In the dmesg output you sent 
    > I found
    >     blcr: vfs_read returned -22
    >     blcr: write returned -22 on copy-out of mmap()ed data
    >     blcr: vfs_read returned -22
    >     blcr: write returned -22 on copy-out of mmap()ed data
    > which is consistent with use of SysV or POSIX shared memory.
    > 
    > Unfortunately, BLCR does not yet have support for SysV or POSIX shared 
    > memory.  However, if you can change your program to instead use an 
    > anonymous mmap() to obtain shared memory, that *is* supported by BLCR.
    > 
    > Additionally, it is possible to construct a program with BLCR callbacks 
    > that would disconnect from the shared memory when a checkpoint request 
    > is received, allowing the checkpoint to be taken, and then reconnect 
    > afterwards.  However, that opens up the messy issue of adding a 
    > mechanism for preserving the shared memory values.
    > 
    > -Paul
    > 
    > 
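As a sketch of the alternative Paul suggests, the code below obtains memory shared between a parent and the children it forks from an anonymous `MAP_SHARED` mapping instead of `shmget()` or `shm_open()`. The mapping has to be created before `fork()` so that the children inherit it. It assumes Linux/glibc (`MAP_ANONYMOUS`), and both helper names are hypothetical.

```c
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS even under -std=c99 */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: memory shared with any children forked afterwards,
 * taken from an anonymous MAP_SHARED mapping (no SysV/POSIX shm object).
 * Returns NULL on failure. */
static void *alloc_shared(size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Demonstration: a forked child stores `value` in the shared int and the
 * parent reads it back after the child exits. Returns -1 on error. */
static int child_writes_parent_reads(int value)
{
    int *shared = alloc_shared(sizeof *shared);
    int seen = -1;
    pid_t child;

    if (shared == NULL)
        return -1;
    *shared = 0;

    child = fork();
    if (child == 0) {
        *shared = value;        /* lands in the page shared with the parent */
        _exit(0);
    }
    if (child > 0 && waitpid(child, NULL, 0) == child)
        seen = *shared;         /* parent sees the child's write */

    munmap(shared, sizeof *shared);
    return seen;
}
```

In a program like the one in this thread, `alloc_shared()` would be called once in the parent before the two `fork()` calls, and every process would then use the returned pointer in place of the `shmat()` address.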
    >> On Mon, 23 Feb 2009 13:50:39 -0800, "Paul H. Hargrove" <PHHargrove_at_lbl_dot_gov> wrote:
    >>   
    >>> Andrea,
    >>>
    >>>   I cannot tell from the information you have provided what the problem
    >>> might be.  If I construct a simple example program that behaves as you 
    >>> describe, and I compile it as you describe, then I am able to checkpoint
    >>> it and restart it just fine.
    >>>   Could you please check the output of the "dmesg" command and/or your 
    >>> system logs to see if there are any kernel messages that might help 
    >>> explain the failure.
    >>>
    >>> -Paul
    >>>
    >>> Andrea Autiero S143785 wrote:
    >>>     
    >>>> Hi!
    >>>> It's me again.
    >>>> After making a statically linked executable with BLCR, I've got another problem.
    >>>> I'm trying to checkpoint a program after it forks twice.
    >>>> From another shell (in the future this will be done by the program
    >>>> itself) I try to checkpoint it, and the answer is:
    >>>>  >ps -a
    >>>>    PID TTY          TIME CMD
    >>>>    5878 pts/0    00:00:00 controller
    >>>>    5879 pts/0    00:00:02 controller
    >>>>    5880 pts/0    00:00:02 controller
    >>>>    5881 pts/1    00:00:00 ps
    >>>>  >cr_checkpoint 5878
    >>>> Checkpoint failed: Invalid argument
    >>>>
    >>>> 5878 is the parent.
    >>>> I compiled it with
    >>>>     >gcc -o controller controller.c -L/usr/local/lib/ -lcr_run -u cr_run_link_me -ldl -lpthread
    >>>>     >nm controller | grep _link_me
    >>>>          U cr_run_link_me
    >>>>
    >>>> (Right now it is not statically linked because I'm trying it on a PC and
    >>>> not on an embedded system, but the embedded system is where it must work.)
    >>>> Why does it do this? Could you help me make it work?
    >>>> Thanks.
    >>>> Have a good day,
    >>>> Andrea Autiero
    >>>>
    >>>>
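Paul's other option in the 25 Feb reply, disconnecting from SysV shared memory around a checkpoint and reconnecting afterwards, needs somewhere to keep the segment's contents in the meantime. One possible sketch, assuming a single process owns the segment: copy it into ordinary heap memory (private to the process, so it is saved with the checkpoint), detach and remove the segment, and later create a fresh segment and copy the data back. Both helpers and the `struct shm_snapshot` type are hypothetical; they would be called from a BLCR checkpoint callback, whose registration is left out here because its exact form should be taken from the `libcr.h` of the BLCR version in use. Coordinating several processes that share one segment (the messy part Paul mentions) is not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical: private copy of a SysV segment's contents. Heap memory
 * is private to the process, so it is saved with the checkpoint. */
struct shm_snapshot {
    void  *copy;
    size_t size;
};

/* Before the checkpoint: copy the segment out, detach, and remove it so
 * no SysV shared memory is attached when the checkpoint is taken.
 * Returns 0 on success, -1 on failure. */
static int shm_save_and_detach(struct shm_snapshot *s, int shmid,
                               void *addr, size_t size)
{
    assert(s != NULL && addr != NULL && size > 0);
    s->copy = malloc(size);
    if (s->copy == NULL)
        return -1;
    memcpy(s->copy, addr, size);
    s->size = size;
    if (shmdt(addr) == -1)
        return -1;
    /* IPC_RMID destroys the segment once no process is attached. */
    return shmctl(shmid, IPC_RMID, NULL);
}

/* After continuing or restarting: create a segment for `key`, attach it,
 * and copy the saved contents back. Returns the new address (and stores
 * the new id in *shmid_out), or NULL on failure. */
static void *shm_reattach_and_restore(struct shm_snapshot *s, key_t key,
                                      int *shmid_out)
{
    int id;
    void *addr;

    assert(s != NULL && s->copy != NULL && shmid_out != NULL);
    id = shmget(key, s->size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;
    addr = shmat(id, NULL, 0);
    if (addr == (void *)-1)
        return NULL;
    memcpy(addr, s->copy, s->size);
    free(s->copy);
    s->copy = NULL;
    *shmid_out = id;
    return addr;
}
```

The design choice here is to keep the preserved values in plain private memory, which BLCR already checkpoints, rather than inventing a separate file format for the segment.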
    
