From: Ladislav Subr (subr-blcr_at_sirrah.troja.mff.cuni.cz)
Date: Thu Jan 13 2005 - 00:41:06 PST

Dear Paul,

I'm interested in the 2.6 + Opteron support for BLCR. My aim is to use it
as a migration & backup tool on an Opteron cluster. Is it possible to get
your current version? I'm just about to start testing my wrappers and it
would be helpful if I could do that directly on the target architecture.

Best regards
Ladislav

> Tarun,
> I am still working on Linux 2.6 and Opteron support. I had hoped to
> be done w/ 2.6 by Jan 1, but am running behind. At this point blcr
> passes the single-threaded tests on an Athlon running SuSE Linux 9.2 (a
> 2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests. I
> believe that there is an uninitialized pointer or a similar problem in
> the kernel module, which is proving difficult to track down.
>
> I am afraid I don't have a very accurate estimate on session or
> process group support at this time. I'd certainly like to see this
> support done in time for an April release.
>
> I am also sorry to tell you that there is no way to checkpoint a
> process tree with the current BLCR. The problem is that at
> restart time there is presently no "resource naming" that would allow
> identification of the shared file descriptors (such as the common
> connection to stdin and stdout, or the pipes between processes).
>
> -Paul
>
> Tarun Agarwal wrote:
> > Hi Paul,
> >
> > I met you at SC2004. As I said, I am working on integrating
> > checkpointing support using BLCR in a batch system here at UIUC. Saving
> > sessions seems critical to using BLCR for checkpointing. You had put that
> > in ongoing work at that time. I'd appreciate it if you could tell me when
> > this support can be expected. Alternatively, is there some way of
> > checkpointing a process subtree (say a shell script and its forks) in the
> > current version?
> >
> > Thanks
> > Tarun
> >
> > On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
> >>I am hoping to have the 2.6 port for ia32 done by Jan 1.
> >>I expect that
> >> the Opteron-specific support will be finished at about the same time, or
> >> soon after that. The speed with which we can get Opteron support
> >> implemented will depend in part on availability of test platforms.
> >>
> >>-Paul
> >>
> >>Tarun Agarwal wrote:
> >>>Thanks for the quick response. Is there some time frame that you have in
> >>>mind for the 2.6 kernel compatible release of BLCR?
> >>>
> >>>Thanks
> >>>Tarun
> >>>
> >>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
> >>>>BLCR does not support the Opteron at all at this time.
> >>>>Support for Opteron will be for the 2.6 kernel only, and that work is
> >>>>still in progress.
> >>>>
> >>>>-Paul
> >>>>
> >>>>Tarun Agarwal wrote:
> >>>>>Hi
> >>>>>
> >>>>>I am trying to use BLCR on Linux 2.4 running on an Opteron machine.
> >>>>>Does BLCR work on the AMD Opteron architecture running a 2.4 kernel?
> >>>>>I got the following error upon running make:
> >>>>>
> >>>>># make
> >>>>>make all-recursive
> >>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
> >>>>>Making all in man
> >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
> >>>>>make[2]: Nothing to be done for `all'.
> >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
> >>>>>Making all in include
> >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
> >>>>>make[2]: Nothing to be done for `all'.
> >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
> >>>>>Making all in cr_module
> >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
> >>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I..
> >>>>>-I../include -I../include -I../vmadump
> >>>>>-I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE -Wall
> >>>>>-Wstrict-prototypes -O2 -fomit-frame-pointer -g -O2 -MT cr_dump_self.o -MD
> >>>>>-MP -MF ".deps/cr_dump_self.Tpo" \
> >>>>> -c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo './'`cr_dump_self.c; \
> >>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
> >>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
> >>>>>fi
> >>>>>In file included from cr_dump_self.c:35:
> >>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this architecture
> >>>>>cr_dump_self.c: In function `cr_do_coredump':
> >>>>>cr_dump_self.c:70: warning: implicit declaration of function `get_pt_regs'
> >>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function makes pointer from integer without a cast
> >>>>>cr_dump_self.c: In function `cr_do_vmadump':
> >>>>>cr_dump_self.c:1103: warning: passing arg 2 of `vmadump_freeze_threads' makes pointer from integer without a cast
> >>>>>make[2]: *** [cr_dump_self.o] Error 1
> >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
> >>>>>make[1]: *** [all-recursive] Error 1
> >>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
> >>>>>make: *** [all] Error 2
> >>>>>#
> >>>>>
> >>>>>Thanks
> >>>>>Tarun Agarwal
> >>>>>Graduate Student, CS, UIUC.
> >>>>
> >>>>--
> >>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> >>>>Future Technologies Group
> >>>>HPC Research Department                   Tel: +1-510-495-2352
> >>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >>
> >>--
> >>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> >>Future Technologies Group
> >>HPC Research Department                   Tel: +1-510-495-2352
> >>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900