Re: Error while using cr_checkpoint on ARM

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Aug 07 2008 - 12:50:58 PDT

    Manish,
    
      There is no stated/known minimum memory requirement for BLCR, but it 
    is still possible that we are too aggressive with memory.  I run an 
    emulated ARM environment in QEMU and have not yet tried running with so 
    little memory (though I plan to try today).
      The default level of tracing detail didn't produce much output for 
    your case because the failure appears to come relatively early.  By 
    requesting more detailed tracing, we should be able to narrow down where 
    in BLCR the memory allocation fails.
      Please reload the kernel modules with "make insmod 
    cr_ktrace_mask=0xffffffff", which will enable the most detailed 
    tracing.  Then rerun your failed checkpoint and, again, send the 
    output.  Hopefully this time there will be enough for me to move forward 
    on diagnosing your problem.
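
The steps above can be sketched as a small shell helper. This is only an illustration, not part of BLCR: the function names and the default output file name are hypothetical, and it assumes you run it as root from the top of the BLCR build tree. It also assumes the trace lines carry the same "blcr: " prefix as the other BLCR log messages; if in doubt, send the unfiltered dmesg output instead.

```shell
#!/bin/sh
# Sketch only: run as root from the top of the BLCR build tree.

# Keep only BLCR's messages from a kernel log stream read on stdin.
# Assumption: trace output is tagged with "blcr: " like other BLCR messages.
blcr_log_lines() {
    grep 'blcr: '
}

# Reload the modules with the most detailed tracing, retry the failing
# checkpoint, and save the resulting kernel messages.
#   $1 = PID of the process to checkpoint
#   $2 = output file (hypothetical default name)
trace_checkpoint() {
    pid=$1
    out=${2:-blcr-trace.log}

    # Reload the kernel modules with every trace bit enabled.
    make insmod cr_ktrace_mask=0xffffffff || return 1

    # The checkpoint is expected to fail again; only the trace it
    # leaves in the kernel log is of interest here.
    cr_checkpoint --term "$pid"

    dmesg | blcr_log_lines > "$out"
}
```

After sourcing the file, something like `trace_checkpoint <pid>` would leave the filtered trace in blcr-trace.log, ready to attach to a reply.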
    
    Thanks for your patience,
    Paul
    
    Manish Dwivedi wrote:
    > Hi Paul,
    >
    > Thanks for the information. We tried compiling it with the 
    > --enable-debug option today, but we didn't get much information in the 
    > log (the log file is attached to this e-mail).
    >
    > By the way, the system has 64 MB of RAM. Is there a limit or minimum 
    > RAM requirement for BLCR?
    >
    > Regards,
    > Manish
    >
    > PS: We followed exactly the same process on x86 and it is working 
    > fine for us.
    >
    >
    > On Wed, Aug 6, 2008 at 10:58 PM, Paul H. Hargrove 
    > <PHHargrove_at_lbl_dot_gov> wrote:
    >
    >     Manish,
    >
    >      I am sorry to hear that you are having problems.  From the
    >     information you provide below, it is hard to say what the problem
    >     is, other than to guess that your ARM system is low on memory.
    >      I am aware of a kernel-side memory leak in blcr-0.7.2, which
    >     should be fixed in the 0.7.3 release expected later this week or
    >     early next week.  So, I'd like to know if the failure you describe
    >     happens on the very first use of cr_checkpoint, or does it happen
    >     after BLCR has been used several times (for instance by running
    >     "make check")?  If it works for a while and then begins to fail,
    >     I'd suspect the known memory leak and suggest that you wait for
    >     blcr-0.7.3.
    >      If you are seeing failure on the very first attempt to use blcr,
    >     then I suggest that you rebuild blcr with debugging enabled and
    >     send me the information dumped to the system logs (run dmesg or
    >     see /var/log/messages to find the logs).  To do this, you'll need
    >     to start at the beginning of the configure/make/install process
    >     and pass the "--enable-debug" option to configure, and then
    >     proceed with the rest of the build/install process.  Be sure to
    >     "make insmod" (or manually rmmod the old modules and
    >     insmod/modprobe the new ones); otherwise the kernel modules from
    >     your previous (non-debug) build may still be running.  With the
    >     new kernel modules loaded, you should retry your failing command
    >     and then look for messages with "blcr: " in them in the system logs.
    >
    >      I also should tell you that there is an ARM-specific mailing list
    >     (very low volume) for BLCR that may help you reach other ARM
    >     users.  You can find list info and subscribe (required to post) at
    >     https://hpcrdm.lbl.gov/mailman/listinfo/blcr-arm
    >
    >     -Paul
    >
    >
    >     Manish Dwivedi wrote:
    >
    >         Hi All,
    >
    >         I am trying to use BLCR on ARM, but when I run
    >         cr_checkpoint on a hello.c program it gives me the error
    >         below:
    >
    >         cr_checkpoint --term <pid> (command run)
    >         Checkpoint failed: Cannot allocate memory
    >
    >         I have compiled hello.c in the same kernel as mentioned in the
    >         release notes, and I am using blcr-0.7.2.tar.gz for this.
    >
    >         Could anyone help me resolve this issue so that I can
    >         test it? It works fine for me on an x86 machine.
    >
    >         Regards,
    >         Manish
    >
    >
    >
    >     -- 
    >     Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >     Future Technologies Group
    >     HPC Research Department                   Tel: +1-510-495-2352
    >     Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    >
    >
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
