From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Aug 07 2008 - 12:50:58 PDT
Manish,

There is no stated/known minimum memory requirement for BLCR, but it is still possible that we are too aggressive with memory. I run an emulated ARM environment in QEMU and have not yet tried running with so little memory (though I plan to try today).

The default level of tracing detail didn't produce much output for your case because the failure appears to come relatively early. By requesting more detailed tracing, we should be able to narrow down where in BLCR we've failed to allocate memory. Please reload the kernel modules with "make insmod cr_ktrace_mask=0xffffffff", which will enable the most detailed tracing. Then rerun your failed checkpoint and, again, send the output. Hopefully this time there will be enough for me to move forward on diagnosing your problem.

Thanks for your patience,
Paul

Manish Dwivedi wrote:
> Hi Paul,
>
> Thanks for the information. We tried compiling it with the
> enable-debug option today. But we didn't get much information in the
> log (log file is attached in the e-mail).
>
> In between, we have 64 MB RAM in the system, is there a limitation or
> minimum requirement of the RAM in BLCR?
>
> Regards,
> Manish
>
> PS: We followed exactly the same process for X86 and it is working
> fine for us.
>
> On Wed, Aug 6, 2008 at 10:58 PM, Paul H. Hargrove
> <PHHargrove_at_lbl_dot_gov> wrote:
>
> > Manish,
> >
> > I am sorry to hear that you are having problems. From the
> > information you provide below, it is hard to say what the problem
> > is, other than to guess that your ARM system is low on memory.
> >
> > I am aware of a kernel-side memory leak in blcr-0.7.2, which
> > should be fixed in the 0.7.3 release expected later this week or
> > early next week. So, I'd like to know if the failure you describe
> > happens on the very first use of cr_checkpoint, or does it happen
> > after BLCR has been used several times (for instance by running
> > "make check")? If it works for a while and then begins to fail,
> > I'd suspect the known memory leak and suggest that you wait for
> > blcr-0.7.3.
> >
> > If you are seeing failure on the very first attempt to use blcr,
> > then I suggest that you rebuild blcr with debugging enabled and
> > send me the information dumped to the system logs (run dmesg or
> > see /var/log/messages to find the logs). To do this, you'll need
> > to start at the beginning of the configure/make/install process
> > and pass the "--enable-debug" option to configure, and then
> > proceed with the rest of the build/install process. Be sure to
> > "make insmod" (or manually rmmod the old modules and
> > insmod/modprobe the new ones); otherwise the kernel modules from
> > your previous (non-debug) build may still be running. With the
> > new kernel modules loaded, you should retry your failing command
> > and then look for messages with "blcr: " in them in the system logs.
> >
> > I also should tell you that there is an ARM-specific mailing list
> > (very low volume) for BLCR that may help you reach other ARM
> > users. You can find list info and subscribe (required to post) at
> > https://hpcrdm.lbl.gov/mailman/listinfo/blcr-arm
> >
> > -Paul
> >
> > Manish Dwivedi wrote:
> > > Hi All,
> > >
> > > I am trying to use BLCR for ARM. But when I am trying to use
> > > cr_checkpoint with a hello.c program it is giving me an error
> > > as below:
> > >
> > > cr_checkpoint --term <pid> (command run)
> > > Checkpoint failed: Cannot allocate memory
> > >
> > > I have compiled hello.c in the same kernel as mentioned in the
> > > release notes, I am using blcr-0.7.2.tar.gz for this.
> > >
> > > Could anyone help me out resolving this issue so that I can
> > > test it. It works fine for me on an X86 machine.
> > >
> > > Regards,
> > > Manish
> >
> > --
> > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> > Future Technologies Group
> > HPC Research Department                   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
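
For anyone repeating the steps in this thread, the trace-collection cycle (reload the modules with full tracing, rerun the failing checkpoint, pull the "blcr: " lines out of the kernel log) could be wrapped roughly as below. This is only a sketch: the function names and the BLCR_SRC and OUT variables are made up for illustration, and it assumes you run it as root with BLCR_SRC pointing at the configured BLCR build tree. Only "make insmod cr_ktrace_mask=0xffffffff", "cr_checkpoint --term <pid>", dmesg and the "blcr: " prefix come from the messages above.

```shell
#!/bin/sh
# Sketch only. blcr_log_lines, collect_blcr_trace, BLCR_SRC and OUT are
# hypothetical names chosen for this example, not part of BLCR.

# Keep only BLCR's own messages from a kernel log supplied on stdin.
blcr_log_lines() {
    grep -F 'blcr: '
}

# Reload the modules with the most detailed tracing, retry the
# checkpoint, and save the BLCR lines from the kernel log.
#   $1 = PID of the process to checkpoint
collect_blcr_trace() {
    pid=$1
    # Assumes BLCR_SRC is the BLCR build directory (defaults to ".").
    (cd "${BLCR_SRC:-.}" && make insmod cr_ktrace_mask=0xffffffff) || return 1
    # The checkpoint is expected to fail here; the log is what we want.
    cr_checkpoint --term "$pid"
    dmesg | blcr_log_lines > "${OUT:-blcr-trace.log}"
}
```

After sourcing the file, something like "collect_blcr_trace <pid>" would leave the filtered trace in blcr-trace.log, ready to attach to a reply.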