Re: Questions on BLCR..

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Jan 13 2005 - 09:59:56 PST

    Ladislav,
    
    As I described to Tarun, the present version is unstable on IA32 and has 
    not been tried at all on Opteron.  When things progress a little more, 
    I'd be happy to send you something that is stable on IA32.  I'd be very 
    pleased if you could then help by testing on an Opteron, as I don't yet 
    have root access on one, which is needed to load the blcr kernel modules.
    
    -Paul
    
    Ladislav Subr wrote:
    
    > Dear Paul,
    > 
    > I'm interested in the 2.6 + Opteron support for BLCR. My aim is to use it as a 
    > migration & backup tool on an Opteron cluster. Is it possible to get your 
    > current version? I'm just about to start testing my wrappers and it would be 
    > helpful if I could do that directly on the target architecture.
    > 
    > Best regards
    > 
    > 	Ladislav
    > 
    > 
    >>Tarun,
    >>   I am still working on Linux 2.6 and Opteron support.  I had hoped to
    >>be done w/ 2.6 by Jan 1, but am running behind.  At this point blcr
    >>passes the single threaded tests on an Athlon running SuSE Linux 9.2 (a
    >>2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests.  I
    >>believe that there is an uninitialized pointer or a similar problem in
    >>the kernel module, which is proving difficult to track down.
    >>
    >>   I am afraid I don't have a very accurate estimate on session or
    >>process group support at this time.  I'd certainly like to see this
    >>support done in time for an April release.
    >>
    >>   I am also sorry to tell you that there is no way to checkpoint a
    >>process tree with the current BLCR.  The problem is that at
    >>restart time there is presently no "resource naming" that would allow
    >>identification of the shared file descriptors (such as the common
    >>connection to stdin and stdout, or the pipes between processes).
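
To make the "resource naming" problem concrete, here is a minimal sketch of one way shared descriptors could be identified. It is not how BLCR works or plans to work; the function names and the labels in the table are hypothetical. It names each descriptor by the (device, inode) pair reported by `fstat`, so descriptors that point at the same pipe or terminal end up with the same name. Note the limitation: this pair identifies the underlying object, not the open file description, so two independent opens of the same file would also be grouped together.

```python
import os


def resource_name(fd):
    """Return a (device, inode) pair naming the object behind fd."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


def shared_descriptors(fd_table):
    """Group descriptor labels that refer to the same underlying object.

    fd_table maps a hypothetical label such as "shell:stdout" to an open
    descriptor. Only groups with more than one member are returned, sorted
    so the result is stable.
    """
    groups = {}
    for label, fd in fd_table.items():
        groups.setdefault(resource_name(fd), []).append(label)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    _, out_pipe = os.pipe()
    _, other_pipe = os.pipe()
    # os.dup() stands in for a descriptor inherited across fork():
    # both give a second descriptor for the same open pipe.
    inherited = os.dup(out_pipe)
    table = {
        "shell:stdout": out_pipe,
        "child:stdout": inherited,
        "unrelated:stdout": other_pipe,
    }
    print(shared_descriptors(table))
```

A restart would need a table like this recorded at checkpoint time, so that the shell and its child can be reconnected to one pipe rather than to two separate ones.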
    >>
    >>-Paul
    >>
    >>Tarun Agarwal wrote:
    >>
    >>>Hi Paul,
    >>>
    >>>I met you at SC2004. As I said then, I am working on integrating
    >>>checkpointing support using BLCR in a batch system here at UIUC. Saving
    >>>sessions seems critical to using BLCR for checkpointing. You had listed
    >>>that as ongoing work at the time. I'd appreciate it if you could tell me
    >>>when this support can be expected. Alternatively, is there some way of
    >>>checkpointing a process subtree (say a shell script and its forks) in the
    >>>current version?
    >>>
    >>>Thanks
    >>>Tarun
    >>>
    >>>On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
    >>>
    >>>>I am hoping to have the 2.6 port for ia32 done by Jan 1.  I expect that
    >>>>the Opteron-specific support will be finished at about the same time, or
    >>>>soon after that.  The speed with which we can get Opteron support
    >>>>implemented will depend in part on availability of test platforms.
    >>>>
    >>>>-Paul
    >>>>
    >>>>Tarun Agarwal wrote:
    >>>>
    >>>>>Thanks for the quick response. Is there some time frame that you have in
    >>>>>mind for the 2.6 kernel compatible release of BLCR?
    >>>>>
    >>>>>Thanks
    >>>>>Tarun
    >>>>>
    >>>>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
    >>>>>
    >>>>>>BLCR does not support the Opteron at all at this time.
    >>>>>>Support for Opteron will be for the 2.6 kernel only, and that work is
    >>>>>>still in progress.
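
A build on an unsupported machine fails only once the compiler reaches the architecture check in the headers, so a quick check before running make can save time. The sketch below is an illustration, not part of BLCR: the function name and the list of machine strings are assumptions, with the list covering only 32-bit x86 because that is the only architecture this thread describes as working. An Opteron running a 64-bit kernel reports `x86_64`.

```python
import platform

# Assumption: only 32-bit x86 (IA32) counts as supported, as this thread
# implies. This list does not come from the BLCR sources.
IA32_MACHINES = frozenset({"i386", "i486", "i586", "i686"})


def arch_supported(machine=None, supported=IA32_MACHINES):
    """Return True if the machine string is in the supported set.

    With no argument, the check uses the machine type of the running
    system as reported by platform.machine().
    """
    if machine is None:
        machine = platform.machine()
    return machine.lower() in supported


if __name__ == "__main__":
    machine = platform.machine()
    if arch_supported(machine):
        print(f"{machine}: assumed supported, proceed with the build")
    else:
        print(f"{machine}: not in the assumed supported list, expect a build error")
```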
    >>>>>>
    >>>>>>-Paul
    >>>>>>
    >>>>>>Tarun Agarwal wrote:
    >>>>>>
    >>>>>>>Hi
    >>>>>>>
    >>>>>>>I am trying to use BLCR on Linux 2.4 running on an Opteron machine.
    >>>>>>>Does BLCR work on the AMD Opteron architecture running a 2.4 kernel?
    >>>>>>>I got the following error upon running make:
    >>>>>>>
    >>>>>>># make
    >>>>>>>make  all-recursive
    >>>>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    >>>>>>>Making all in man
    >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    >>>>>>>make[2]: Nothing to be done for `all'.
    >>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    >>>>>>>Making all in include
    >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    >>>>>>>make[2]: Nothing to be done for `all'.
    >>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    >>>>>>>Making all in cr_module
    >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    >>>>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -I../include -I../vmadump -I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE   -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer  -g -O2 -MT cr_dump_self.o -MD -MP -MF ".deps/cr_dump_self.Tpo" \
    >>>>>>>-c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo './'`cr_dump_self.c; \
    >>>>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
    >>>>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
    >>>>>>>fi
    >>>>>>>In file included from cr_dump_self.c:35:
    >>>>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this architecture
    >>>>>>>cr_dump_self.c: In function `cr_do_coredump':
    >>>>>>>cr_dump_self.c:70: warning: implicit declaration of function `get_pt_regs'
    >>>>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function makes pointer from integer without a cast
    >>>>>>>cr_dump_self.c: In function `cr_do_vmadump':
    >>>>>>>cr_dump_self.c:1103: warning: passing arg 2 of `vmadump_freeze_threads' makes pointer from integer without a cast
    >>>>>>>make[2]: *** [cr_dump_self.o] Error 1
    >>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    >>>>>>>make[1]: *** [all-recursive] Error 1
    >>>>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    >>>>>>>make: *** [all] Error 2
    >>>>>>>#
    >>>>>>>
    >>>>>>>Thanks
    >>>>>>>Tarun Agarwal
    >>>>>>>Graduate Student, CS, UIUC.
    >>>>>>
    >>>>>>--
    >>>>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >>>>>>Future Technologies Group
    >>>>>>HPC Research Department                   Tel: +1-510-495-2352
    >>>>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    >>>>
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
