From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jan 11 2005 - 10:53:34 PST
Tarun,

I am still working on Linux 2.6 and Opteron support. I had hoped to be done with 2.6 by Jan 1, but am running behind. At this point BLCR passes the single-threaded tests on an Athlon running SuSE Linux 9.2 (a 2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests. I believe that there is an uninitialized pointer or a similar problem in the kernel module, which is proving difficult to track down.

I am afraid I don't have a very accurate estimate on session or process group support at this time. I'd certainly like to see this support done in time for an April release.

I am also sorry to tell you that there is currently no way to checkpoint a process tree with BLCR. The problem is that at restart time there is presently no "resource naming" that would allow identification of the shared file descriptors (such as the common connection to stdin and stdout, or the pipes between processes).

-Paul

Tarun Agarwal wrote:
> Hi Paul,
>
> I had met you at SC2004. As I had said I am working on integrating
> checkpointing support using BLCR in a batch system here at UIUC. Saving
> sessions seems critical to using BLCR for checkpointing. You had put that
> in ongoing work at that time. I'd appreciate if you could tell me when can
> this support be expected? Alternatively is there some way of
> checkpointing a process subtree (say a shell script and its forks) in the
> current version?
>
> Thanks
> Tarun
>
> On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
>
>
>>I am hoping to have the 2.6 port for ia32 done by Jan 1. I expect that the
>>Opteron-specific support will be finished at about the same time, or soon after
>>that. The speed with which we can get Opteron support implemented will depend
>>in part on availability of test platforms.
>>
>>-Paul
>>
>>Tarun Agarwal wrote:
>>
>>>Thanks for the quick response. Is there some time frame that you have in
>>>mind for the 2.6 kernel compatible release of BLCR?
>>>
>>>Thanks
>>>Tarun
>>>
>>>
>>>
>>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
>>>
>>>
>>>
>>>>BLCR does not support the Opteron at all at this time.
>>>>Support for Opteron will be for the 2.6 kernel only, and that work is
>>>>still in
>>>>progress.
>>>>
>>>>-Paul
>>>>
>>>>Tarun Agarwal wrote:
>>>>
>>>>
>>>>>Hi
>>>>>
>>>>>I am trying to use BLCR on Linux 2.4 running on Opteron machine. Does
>>>>>BLCR
>>>>>work on the AMD Opteron architecture running 2.4 kernel? I got the
>>>>>following error upon running make :
>>>>>
>>>>># make
>>>>>make all-recursive
>>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
>>>>>Making all in man
>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
>>>>>make[2]: Nothing to be done for `all'.
>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
>>>>>Making all in include
>>>>>make[2]: Entering directory
>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/include'
>>>>>make[2]: Nothing to be done for `all'.
>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
>>>>>Making all in cr_module
>>>>>make[2]: Entering directory
>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
>>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -I../include
>>>>>-I../vmadump
>>>>>-I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE -Wall
>>>>>-Wstrict-prototypes -O2 -fomit-frame-pointer -g -O2 -MT cr_dump_self.o
>>>>>-MD
>>>>>-MP -MF ".deps/cr_dump_self.Tpo" \
>>>>> -c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo
>>>>>'./'`cr_dump_self.c; \
>>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
>>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
>>>>>fi
>>>>>In file included from cr_dump_self.c:35:
>>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this
>>>>>architecture
>>>>>cr_dump_self.c: In function `cr_do_coredump':
>>>>>cr_dump_self.c:70: warning: implicit declaration of function
>>>>>`get_pt_regs'
>>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function makes
>>>>>pointer from integer without a cast
>>>>>cr_dump_self.c: In function `cr_do_vmadump':
>>>>>cr_dump_self.c:1103: warning: passing arg 2 of `vmadump_freeze_threads'
>>>>>makes pointer from integer without a cast
>>>>>make[2]: *** [cr_dump_self.o] Error 1
>>>>>make[2]: Leaving directory
>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
>>>>>make[1]: *** [all-recursive] Error 1
>>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
>>>>>make: *** [all] Error 2
>>>>>#
>>>>>
>>>>>Thanks
>>>>>Tarun Agarwal
>>>>>Graduate Student, CS, UIUC.
>>>>
>>>>
>>>>--
>>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
>>>>Future Technologies Group
>>>>HPC Research Department                   Tel: +1-510-495-2352
>>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>>
>>
>>
>>--
>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
>>Future Technologies Group
>>HPC Research Department                   Tel: +1-510-495-2352
>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>>

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
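
Illustrative note on the "resource naming" problem mentioned in the reply: when processes in a tree share a pipe or terminal, a checkpointer needs some name under which it can recognize that two descriptors in different processes refer to the same object, so the sharing can be re-created at restart. The Python sketch below is only an illustration of that idea and is not how BLCR works or plans to work. It assumes Linux, and the helper names (resource_name, group_shared, demo) and the "parent"/"child" labels are hypothetical. It uses the (device, inode, access mode) triple reported by fstat() and fcntl() as a stand-in name, and dup() as a stand-in for a descriptor inherited across fork().

```python
import fcntl
import os
from collections import defaultdict


def resource_name(fd):
    """Return a (device, inode, access mode) triple naming the object behind fd.

    Assumption: on Linux each pipe has its own inode, so the triple
    distinguishes one pipe from another and a read end from a write end.
    It names the underlying object, not the open file description itself.
    """
    st = os.fstat(fd)
    mode = fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    return (st.st_dev, st.st_ino, mode)


def group_shared(fd_tables):
    """Group descriptors by resource name.

    fd_tables maps a hypothetical owner label to a list of descriptors.
    Only names referenced by more than one descriptor are returned.
    """
    groups = defaultdict(list)
    for owner, fds in fd_tables.items():
        for fd in fds:
            groups[resource_name(fd)].append((owner, fd))
    return {name: refs for name, refs in groups.items() if len(refs) > 1}


def demo():
    """Build two pipes, share one read end, and report what is shared."""
    r, w = os.pipe()
    r2 = os.dup(r)  # stands in for a descriptor inherited across fork()
    other_r, other_w = os.pipe()
    try:
        shared = group_shared({"parent": [r, w, other_r], "child": [r2]})
        return shared, (r, r2)
    finally:
        for fd in (r, w, r2, other_r, other_w):
            os.close(fd)


if __name__ == "__main__":
    shared, _ = demo()
    for name, refs in shared.items():
        print(name, refs)
```

A real checkpointer would have to record such names at checkpoint time and, at restart, create each shared object once and hand it to every process that referenced it; that bookkeeping is the missing piece the reply describes.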