Re: BLCR 0.7.0 beta1 is now available

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri May 02 2008 - 16:30:31 PDT

  • Next message: Neal Becker: "2.6.25 supported?"
    Gijsbert Wiesenekker wrote:
    > Paul,
    > 
    > I have a question that I could not find easily in the documentation.
    > When I checkpoint a task that uses almost all physical memory (8GB),
    > taking a checkpoint takes about three hours (and the system becomes
    > almost unusable). Obviously this is because taking the checkpoint pushes
    > the system beyond its memory limits.
    > How much memory is needed to make a checkpoint?
    > 
    > Regards,
    > Gijsbert
    [snip]
    
    Gijsbert,
    
      In testing we've done in the past, we've measured checkpoint time as a
    function of application memory size.  What we found is that time was
    roughly linear with memory until the memory exceeded about 5/8 of
    physical memory (so 5GB on your 8GB machine).  Beyond that level of
    memory usage, the time grew faster than linearly (though we never ran
    any 3-hour test cases).  So, I'd recommend keeping the app's usage below
    5/8 of physical memory as a rule of thumb.  However, the actual behavior seemed to vary
    with kernel version, presumably due to changes in memory management policy.
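
As a rough illustration of the 5/8 figure above, the following Python sketch estimates the application memory size up to which checkpoint time stayed roughly linear on a given machine. It is not part of BLCR; the function names and the use of `os.sysconf` to read physical memory on Linux are assumptions made for this example, and the fraction itself reportedly varies with kernel version.

```python
import os

# Fraction of physical memory below which checkpoint time was observed
# to grow roughly linearly (figure taken from the message above).
LINEAR_FRACTION = 5 / 8


def physical_memory_bytes() -> int:
    """Total physical RAM in bytes, via POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def linear_limit_bytes(physical_bytes: int,
                       fraction: float = LINEAR_FRACTION) -> int:
    """Largest application footprint expected to stay in the linear regime."""
    return int(physical_bytes * fraction)


if __name__ == "__main__":
    total = physical_memory_bytes()
    limit = linear_limit_bytes(total)
    print(f"physical memory: {total / 2**30:.1f} GiB")
    print(f"suggested application limit: {limit / 2**30:.1f} GiB")
```

For an 8GB machine, `linear_limit_bytes(8 * 2**30)` gives 5GB, matching the number quoted above.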
      The very simple code we used to run these tests is in the directory
    examples/io_bench of the BLCR sources.  The executable takes 1 argument:
    the memory size in MB, and reports the time to checkpoint.
      We hope in the future to be able to take some advantage of O_DIRECT to
    avoid the buffer management that pushes the system so hard when using a
    large fraction of physical memory.
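
To show what O_DIRECT asks of the caller, here is a minimal user-space sketch in Python for Linux. It is not how BLCR writes checkpoints (BLCR does its I/O inside the kernel); the helper names and the 4096-byte alignment are assumptions for illustration, and the real alignment requirement depends on the device and filesystem. Some filesystems reject O_DIRECT entirely, in which case `os.open` raises `OSError`.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the actual requirement is device-specific


def padded_length(n: int, block: int = BLOCK) -> int:
    """Round n up to the next multiple of block."""
    return -(-n // block) * block


def write_direct(path: str, data: bytes, block: int = BLOCK) -> None:
    """Write data to path, bypassing the page cache with O_DIRECT (Linux)."""
    size = max(block, padded_length(len(data), block))
    # An anonymous mmap is page-aligned and zero-filled, which satisfies
    # the buffer-alignment rule that O_DIRECT imposes.
    buf = mmap.mmap(-1, size)
    try:
        buf[:len(data)] = data
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
        fd = os.open(path, flags, 0o600)
        try:
            with memoryview(buf) as view:
                written = 0
                while written < size:
                    written += os.write(fd, view[written:])
            # Drop the zero padding so the file has the true length.
            os.ftruncate(fd, len(data))
        finally:
            os.close(fd)
    finally:
        buf.close()
```

The padding and truncation step exists only because direct I/O transfers must be a multiple of the block size; ordinary buffered writes have no such restriction.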
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
