From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Jan 13 2005 - 09:59:56 PST
Ladislav,
   As I described to Tarun, the present version is unstable on IA32 and
has not been tried at all on Opteron. When things progress a little
more, I'd be happy to send you something that is stable on IA32. I'd be
very pleased if you could then help by testing on an Opteron, as I don't
yet have access to one where I have root access to load the blcr kernel
modules.

-Paul

Ladislav Subr wrote:

> Dear Paul,
>
> I'm interested in the 2.6 + Opteron support for BLCR. My aim is to use it as a
> migration & backup tool on an Opteron cluster. Is it possible to get your
> current version? I'm just about to start testing my wrappers and it would be
> helpful if I could do that directly on the target architecture.
>
> Best regards
>
> Ladislav
>
>
>>Tarun,
>>   I am still working on Linux 2.6 and Opteron support. I had hoped to
>>be done w/ 2.6 by Jan 1, but am running behind. At this point blcr
>>passes the single threaded tests on an Athlon running SuSE Linux 9.2 (a
>>2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests. I
>>believe that there is an uninitialized pointer or a similar problem in
>>the kernel module, which is proving difficult to track down.
>>
>>   I am afraid I don't have a very accurate estimate on session or
>>process group support at this time. I'd certainly like to see this
>>support done in time for an April release.
>>
>>   I am also sorry to tell you that currently there is no way to
>>checkpoint a process tree with the current BLCR. The problem is that at
>>restart time there is presently no "resource naming" that would allow
>>identification of the shared file descriptors (such as the common
>>connection to stdin and stdout, or the pipes between processes).
>>
>>-Paul
>>
>>Tarun Agarwal wrote:
>>
>>>Hi Paul,
>>>
>>>I had met you at SC2004. As I had said I am working on integrating
>>>checkpointing support using BLCR in a batch system here at UIUC. Saving
>>>sessions seems critical to using BLCR for checkpointing. You had put that
>>>in ongoing work at that time. I'd appreciate if you could tell me when
>>>can this support be expected? Alternatively is there some way of
>>>checkpointing a process subtree (say a shell script and its forks) in the
>>>current version?
>>>
>>>Thanks
>>>Tarun
>>>
>>>On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
>>>
>>>>I am hoping to have the 2.6 port for ia32 done by Jan 1. I expect that
>>>>the Opteron-specific support will be finished at about the same time, or
>>>>soon after that. The speed with which we can get Opteron support
>>>>implemented will depend in part on availability of test platforms.
>>>>
>>>>-Paul
>>>>
>>>>Tarun Agarwal wrote:
>>>>
>>>>>Thanks for the quick response. Is there some time frame that you have in
>>>>>mind for the 2.6 kernel compatible release of BLCR?
>>>>>
>>>>>Thanks
>>>>>Tarun
>>>>>
>>>>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
>>>>>
>>>>>>BLCR does not support the Opteron at all at this time.
>>>>>>Support for Opteron will be for the 2.6 kernel only, and that work is
>>>>>>still in
>>>>>>progress.
>>>>>>
>>>>>>-Paul
>>>>>>
>>>>>>Tarun Agarwal wrote:
>>>>>>
>>>>>>>Hi
>>>>>>>
>>>>>>>I am trying to use BLCR on Linux 2.4 running on Opteron machine. Does
>>>>>>>BLCR
>>>>>>>work on the AMD Opteron architecture running 2.4 kernel? I got the
>>>>>>>following error upon running make :
>>>>>>>
>>>>>>># make
>>>>>>>make all-recursive
>>>>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
>>>>>>>Making all in man
>>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
>>>>>>>make[2]: Nothing to be done for `all'.
>>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
>>>>>>>Making all in include
>>>>>>>make[2]: Entering directory
>>>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/include'
>>>>>>>make[2]: Nothing to be done for `all'.
>>>>>>>make[2]: Leaving directory
>>>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/include'
>>>>>>>Making all in cr_module
>>>>>>>make[2]: Entering directory
>>>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
>>>>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -I../include
>>>>>>>-I../vmadump
>>>>>>>-I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE -Wall
>>>>>>>-Wstrict-prototypes -O2 -fomit-frame-pointer -g -O2 -MT
>>>>>>>cr_dump_self.o -MD
>>>>>>>-MP -MF ".deps/cr_dump_self.Tpo" \
>>>>>>>-c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo
>>>>>>>'./'`cr_dump_self.c; \
>>>>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
>>>>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
>>>>>>>fi
>>>>>>>In file included from cr_dump_self.c:35:
>>>>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this
>>>>>>>architecture
>>>>>>>cr_dump_self.c: In function `cr_do_coredump':
>>>>>>>cr_dump_self.c:70: warning: implicit declaration of function
>>>>>>>`get_pt_regs'
>>>>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function makes
>>>>>>>pointer from integer without a cast
>>>>>>>cr_dump_self.c: In function `cr_do_vmadump':
>>>>>>>cr_dump_self.c:1103: warning: passing arg 2 of
>>>>>>>`vmadump_freeze_threads' makes pointer from integer without a cast
>>>>>>>make[2]: *** [cr_dump_self.o] Error 1
>>>>>>>make[2]: Leaving directory
>>>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
>>>>>>>make[1]: *** [all-recursive] Error 1
>>>>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
>>>>>>>make: *** [all] Error 2
>>>>>>>#
>>>>>>>
>>>>>>>Thanks
>>>>>>>Tarun Agarwal
>>>>>>>Graduate Student, CS, UIUC.
>>>>>>
>>>>>>--
>>>>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
>>>>>>Future Technologies Group
>>>>>>HPC Research Department                   Tel: +1-510-495-2352
>>>>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>>--
>>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
>>>>Future Technologies Group
>>>>HPC Research Department                   Tel: +1-510-495-2352
>>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900