Re: using blcr on program with fork

From: Andrea Autiero S143785 (andrea.autiero_at_studenti.polito.it)
Date: Wed Mar 11 2009 - 00:01:14 PDT

  • Next message: Weizhongwei: "some problems"
    Well, I can't explain why, but it works now.
    Repeating the same steps, it now works.
    BLCR is working correctly; the problem was in the application.
    Thanks for the support,
    Andrea
    
    On Tue, 10 Mar 2009 11:41:40 -0700, "Paul H. Hargrove" <PHHargrove_at_lbl_dot_gov>
    wrote:
    > Andrea,
    > 
    >   You are correct that the "restarter" should not need to be linked to 
    > any BLCR libraries if it uses system() to request the checkpoint and the 
    > restart.  If later you wanted to use the C equivalent of the 
    > cr_checkpoint and cr_restart utilities (for instance to have more 
    > control) you would need to link the "full" libcr.a.
    > 
    >   I cannot be certain what the problem is with your CR_ENOSUPPORT error, 
    > but I do have a couple things you could try.
    > 
    > 1)  The warning about dlopen in statically linked applications is just a 
    > warning, not an error, and BLCR should know what to do when dlopen() 
    > fails.  However, I don't typically test on a system w/o shared libraries 
    > and so if BLCR is getting this wrong that could be one possible reason 
    > for your failure.  Looking quickly at the code, the following one-line 
    > change might fix things, but I am not very confident about that:
    > 
    > --- libcr/cr_libinit.c  14 Feb 2009 02:31:36 -0000      1.14.6.1
    > +++ libcr/cr_libinit.c  10 Mar 2009 18:15:44 -0000
    > @@ -143,7 +143,7 @@
    >      //
    >      if (CR_SIGNUM != __libc_current_sigrtmax()) {
    >         // Signal is already allocated.  Should we keep or replace?
    > -       void *full_handler = NULL;
    > +       void *full_handler = (void*)&cri_init; /* Cannot match */
    >         void *dlhandle = dlopen(NULL, RTLD_LAZY);
    >         if (dlhandle) {
    >             // Note that the preloaded one has been name-shifted
    > 
    > 2)  If the one-line patch above doesn't fix the problem, then I must ask 
    > whether you have been able to run the BLCR testsuite successfully on your 
    > embedded platform.  You can find instructions for this in 
    > config/cross_helper.c.  If you get failures running the testsuite, we 
    > should focus our attention there rather than on your specific
    > application.
    > 
    > -Paul
    > 
    > 
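The system()-based approach discussed above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names build_checkpoint_cmd and request_checkpoint are hypothetical, and the only BLCR-specific part is the "cr_checkpoint <pid>" command line quoted in this exchange. Because the controller merely spawns the utility, the sketch uses nothing from libcr.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: format "cr_checkpoint <pid>" into buf.
 * Returns the command length, or -1 if pid is invalid or buf is too small. */
int build_checkpoint_cmd(char *buf, size_t len, pid_t pid)
{
    assert(buf != NULL);
    if (pid <= 0)
        return -1;
    int n = snprintf(buf, len, "cr_checkpoint %ld", (long)pid);
    if (n < 0 || (size_t)n >= len)
        return -1;              /* output was truncated */
    return n;
}

/* Hypothetical helper: ask BLCR to checkpoint pid by running the
 * cr_checkpoint utility through the shell.  Returns the raw status
 * from system(), or -1 if the command could not be built. */
int request_checkpoint(pid_t pid)
{
    char cmd[64];
    if (build_checkpoint_cmd(cmd, sizeof cmd, pid) < 0)
        return -1;
    return system(cmd);
}
```

A controller would call request_checkpoint() with the target's PID and decode the returned status with WIFEXITED() and WEXITSTATUS() from <sys/wait.h>. A restart could presumably be requested the same way by running cr_restart on the resulting context file.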
    > Andrea Autiero S143785 wrote:
    >> Hello, it's me again.
    >> Now I have the following problem:
    >>
    >> andrea@chisone:~/Desktop/materiale_tesi> source
    >> ../programmi_per_tesi/eldk/eldk_init 4xxARCH=ppc
    >> CROSS_COMPILE=ppc_4xx-
    >> DEPMOD=/home/andrea/Desktop/programmi_per_tesi/eldk/usr/bin/depmod.pl
    >> PATH=/home/andrea/Desktop/programmi_per_tesi/eldk/usr/bin:/home/andrea/Desktop/programmi_per_tesi/eldk/bin:/home/andrea/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/opt/cross/bin:/usr/lib/jvm/jre/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/local/bin:/usr/local/bin
    >> andrea@chisone:~/Desktop/materiale_tesi> ${CROSS_COMPILE}gcc -static -o
    >> ppc_controller controller2.c -Wall
    >> -L/ppc_blcr/builddir/ppc_blcr/builddir/lib/ -lcr_run -u cr_run_link_me
    >> -ldl
    >> -lpthread
    >> /ppc_blcr/builddir/ppc_blcr/builddir/lib//libcr_run.a(libcr_run_la-cr_run.o):
    >> In function `cri_init':
    >> /home/andrea/Desktop/blcr-0.7.3/builddir/libcr/../../libcr/cr_libinit.c:148:
    >> warning: Using 'dlopen' in statically linked applications requires at
    >> runtime the shared libraries from the glibc version used for linking
    >>
    >> What could be the matter?
    >> I'm trying to create an application that will be checkpointed and
    >> restarted by another application
    >> (via system("cr_checkpoint pid")).
    >> I think that the "restarter" doesn't need to be linked with BLCR.
    >> The program to be checkpointed doesn't work and gives me an error:
    >>
    >> blcr: retry request on -CR_ENOSUPPORT
    >> Checkpoint failed: support missing from application
    >>
    >> Thanks for any suggestions.
    >> Andrea Autiero
    >>
    >>
    >> On Wed, 25 Feb 2009 12:22:39 -0800, "Paul H. Hargrove"
    >> <PHHargrove_at_lbl_dot_gov>
    >> wrote:
    >>   
    >>> Andrea Autiero S143785 wrote:
    >>>     
    >>>> I'm using shared memory in my program.
    >>>> Removing every line referring to it lets BLCR checkpoint my
    >>>> application.
    >>>> Could this be the problem?
    >>>>   
    >>>>       
    >>> Yes, that is almost certainly the problem.  In the dmesg output you
    >>> sent I found
    >>>     blcr: vfs_read returned -22
    >>>     blcr: write returned -22 on copy-out of mmap()ed data
    >>>     blcr: vfs_read returned -22
    >>>     blcr: write returned -22 on copy-out of mmap()ed data
    >>> which is consistent with use of SysV or POSIX shared memory.
    >>>
    >>> Unfortunately, BLCR does not yet have support for SysV or POSIX shared 
    >>> memory.  However, if you can change your program to instead use an 
    >>> anonymous mmap() to obtain shared memory, that *is* supported by BLCR.
    >>>
    >>> Additionally, it is possible to construct a program with BLCR callbacks
    >>> that would disconnect from the shared memory when a checkpoint request 
    >>> is received, allowing the checkpoint to be taken, and then reconnect 
    >>> afterwards.  However, that opens up the messy issue of adding a 
    >>> mechanism for preserving the shared memory values.
    >>>
    >>> -Paul
    >>>
    >>>
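For reference, the anonymous mmap() alternative mentioned above might look like the following minimal sketch. It assumes Linux/glibc, where MAP_ANONYMOUS is available, and the function names shared_alloc and child_writes_parent_reads are hypothetical. The code only demonstrates that a MAP_SHARED | MAP_ANONYMOUS region is visible across fork(); it does not itself exercise BLCR.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: allocate a region shared between this process
 * and any children forked afterwards.  Returns NULL on failure. */
void *shared_alloc(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: fork a child that stores value in the shared
 * region, then return what the parent reads back (-1 on error). */
int child_writes_parent_reads(int value)
{
    int *slot = shared_alloc(sizeof *slot);
    if (slot == NULL)
        return -1;
    *slot = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(slot, sizeof *slot);
        return -1;
    }
    if (pid == 0) {             /* child: write and exit immediately */
        *slot = value;
        _exit(0);
    }

    int result = -1;            /* parent: wait, then read the child's write */
    if (waitpid(pid, NULL, 0) == pid)
        result = *slot;
    munmap(slot, sizeof *slot);
    return result;
}
```

Note that the mapping has to be created before fork(), since only processes forked afterwards inherit it. Unrelated processes cannot attach to it the way they can with SysV or POSIX shared memory.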
    >>>     
    >>>> On Mon, 23 Feb 2009 13:50:39 -0800, "Paul H. Hargrove"
    >>>> <PHHargrove_at_lbl_dot_gov>
    >>>> wrote:
    >>>>   
    >>>>       
    >>>>> Andrea,
    >>>>>
    >>>>>   I cannot tell from the information you have provided what the
    >>>>> problem might be.  If I construct a simple example program that
    >>>>> behaves as you describe, and I compile it as you describe, then I am
    >>>>> able to checkpoint it and restart it just fine.
    >>>>>   Could you please check the output of the "dmesg" command and/or your
    >>>>> system logs to see if there are any kernel messages that might help 
    >>>>> explain the failure.
    >>>>>
    >>>>> -Paul
    >>>>>
    >>>>> Andrea Autiero S143785 wrote:
    >>>>>     
    >>>>>         
    >>>>>> Hi!
    >>>>>> It's me again.
    >>>>>> After building a statically linked file with BLCR, I've got another
    >>>>>> problem.
    >>>>>> I'm trying to checkpoint a program after it forks twice.
    >>>>>> Then, from another shell (in the future this will be done by the
    >>>>>> program itself),
    >>>>>> I try to checkpoint it and the answer is:
    >>>>>>  >ps -a
    >>>>>>    PID TTY          TIME CMD
    >>>>>>    5878 pts/0    00:00:00 controller
    >>>>>>    5879 pts/0    00:00:02 controller
    >>>>>>    5880 pts/0    00:00:02 controller
    >>>>>>    5881 pts/1    00:00:00 ps
    >>>>>>  >cr_checkpoint 5878
    >>>>>> Checkpoint failed: Invalid argument
    >>>>>>
    >>>>>> 5878 is the parent.
    >>>>>> I compiled it with
    >>>>>>     >gcc -o controller controller.c -L/usr/local/lib/ -lcr_run -u
    >>>>>> cr_run_link_me -ldl -lpthread
    >>>>>>     >nm controller | grep _link_me
    >>>>>>          U cr_run_link_me
    >>>>>>
    >>>>>> (It is not statically linked for now because I'm testing on a PC
    >>>>>> rather than on an embedded system, but it must ultimately work on
    >>>>>> the latter.)
    >>>>>> Why does it do this? Could you help me make it work?
    >>>>>> Thanks.
    >>>>>> Have a good day,
    >>>>>> Andrea Autiero
    >>>>>>
    >>>>>>
    >>>>>>
    
