Announcing the release of BLCR 0.8.0

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Jan 15 2009 - 10:08:56 PST

    I am pleased to announce the release of BLCR 0.8.0.
    
    The 0.8.0 release is now available from the BLCR Downloads page:
    http://ftg.lbl.gov/CheckpointRestart/CheckpointDownloads.shtml
    
    Relative to the 0.7.x series, this release includes new features,
    stability improvements, and support for newer Linux kernels.  A summary
    of the user-visible changes in BLCR relative to 0.7.3 appears below as
    an excerpt from the NEWS file.
    
    -Paul
    
    PS
    You are receiving this because you are on the checkpoint_at_lbl_dot_gov
    list, because you have recently emailed the list (or me directly) asking
    about BLCR status, or because our Bugzilla shows your interest in a bug
    fixed in this release.
    
    
    NEWS:
    0.8.0
    -----------
    January 12, 2009
    Enhanced functionality and expanded-support release.
      - This release adds support for 2.6.26, .27 and .28 kernels.
      - In this release, support for Xen is no longer considered experimental.
        However, one known Xen-specific bug (2457) remains, in which the FPU
        state may become corrupted with paravirtualized kernels.
      - In this release the majority of checkpoint I/O is performed using
        O_DIRECT when available, significantly reducing the cost of checkpointing
        any process which uses a large fraction of the physical memory.
      - This release includes an unfinished port to SPARC64, contributed by
        Vincentius Robby <vincentius_at_umich_dot_edu> and Andrea Pellegrini
        <apellegr_at_umich_dot_edu>.  Anyone willing/able to help complete this port
        should contact checkpoint_at_lbl_dot_gov.
      - As previously announced, this release removes support for 2.4.x kernels
        that contain backported NPTL support (e.g. RH9 and RHEL kernels).  Support
        for all other 2.4.x kernels was removed in 0.7.0.
      - This release merges the blcr_vmadump kernel module into the blcr module.
      - This release adds preliminary support for the "Fault Tolerance Backplane"
        (FTB).  See README.FTB for more information.
      - This release adds the following features to the cr_checkpoint utility:
         + --kmsg-{none,error,warning} options to control reporting of kernel-level
           error and warning messages when taking a checkpoint.
      - This release adds the following features to the cr_restart utility:
         + --kmsg-{none,error,warning} options to control reporting of kernel-level
           error and warning messages when restarting from a checkpoint.
         + --[no-]restore-{pid,pgid,sid} options to control restore of the
           process id, process group id, and session id.  The default remains as
           in prior releases: restore only pid.
      - This release makes the following libcr API additions/changes:
         + The following functions were announced in May 2008 as scheduled for
           removal in 0.8.0.  They have not been removed, but have been marked
           with gcc's "deprecated" attribute to produce a compiler warning if used.
            * cr_request()
            * cr_request_file()
            * cr_request_fd()
         + These functions have been added for controlling checkpoint requests:
            * cr_wait_checkpoint()
            * cr_reap_checkpoint()
            * cr_log_checkpoint()
            * cr_poll_checkpoint_msg()
           The wait and reap functions expose, as separate calls, the two steps
           performed by the existing cr_poll_checkpoint() function.  The log
           function collects kernel-level error or warning messages if called
           between wait and reap.  cr_poll_checkpoint_msg() is a convenience
           function, documented and implemented in terms of the wait, log and
           reap functions.
           The cr_poll_checkpoint() function will remain in libcr, but is now
           documented and implemented in terms of cr_poll_checkpoint_msg().
         + A new CR_CHKPT_ASYNC_ERR flag to cr_request_checkpoint() defers the
           reporting of almost all errors in a call to cr_request_checkpoint()
           until the call to cr_reap_checkpoint() or cr_poll_checkpoint[_msg]().
         + The following functions have been added for making restart requests
           via library calls, rather than using the cr_restart utility.  These
           are all marked "EXPERIMENTAL" as there might be significant changes
           to these calls in the future.
            * cr_initialize_restart_args_t()
            * cr_request_restart()
            * cr_wait_restart()
            * cr_reap_restart()
            * cr_log_restart()
            * cr_poll_restart_msg()
            * cr_poll_restart()
         + The struct members "old" and "new" in struct cr_rstrt_relocate_pair
           have been renamed to "oldpath" and "newpath".  This change was required
           because "new" is a C++ reserved word.
        See the comments in include/libcr.h for API documentation.
      - This release makes the following additions/changes to the BLCR test suite:
         + Add tests of many of the features new to this release
         + Add new tests, or cases to existing tests, for reproducing several
           of the bugs fixed in this release.
         + Fix command lines used in several tests to function correctly when
           "POSIXLY_CORRECT" is set in the environment
         + Recode crut_wrapper and seq_wrapper in C, rather than perl, to allow
           running the full testsuite in environments without perl (such as
           embedded ARM platforms).
      - This release fixes the following user-visible bugs and "issues":
         + 2021 - Provide extended error reporting mechanism
         + 2056 - Eliminate perl wrappers
         + 2287/2437 - Xen segment selector problems
         + 2292 - --restore-ids does not work correctly for multithreaded
           processes.
         + 2317 - implement "async" request errors
         + 2318 - checkpoint hangs after SEGV
         + 2322/2446 - Failure when stack limit is too big
         + 2344 - bad cr_restart usage causes kernel oops
         + 2453 - loss of sigaltstack across restart
         + 2454 - Oops in FPU restore
         + Address bug 2448 - there may have been a race with cr_close_other()
         + Fix ENOMEM when checkpointing processes with no supplementary group IDs
         + i386 FPU restore code would fail to notice corrupt i387 state
         + Fix a bug in the ARM atomics
         + Fix several issues with restart of 64-bit processes with a 32-bit
           requester, as exposed by the addition of cr_request_restart() to libcr.
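The NEWS entry says most checkpoint I/O now uses O_DIRECT "when available". The C fragment that follows is not BLCR's code; it is only a sketch of the general pattern, under stated assumptions. O_DIRECT generally requires the buffer address, length and file offset to be suitably aligned (4096 bytes is assumed here as IO_ALIGN; the real requirement depends on the filesystem and device). The hypothetical helper write_file_prefer_direct() tries O_DIRECT first and falls back to ordinary buffered I/O when the filesystem rejects it with EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in glibc's <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* platform without O_DIRECT: always buffered */
#endif

/* Assumed alignment for buffer, length and offset (hypothetical value). */
#define IO_ALIGN 4096

/* Hypothetical helper: write len bytes from an IO_ALIGN-aligned buffer to
 * path, preferring O_DIRECT and falling back to buffered I/O on EINVAL.
 * Returns 0 on success and -1 on failure. */
static int write_file_prefer_direct(const char *path, const void *buf, size_t len)
{
    int use_direct = (O_DIRECT != 0);

    assert(len % IO_ALIGN == 0);   /* caller contract for the direct path */
    for (;;) {
        int flags = O_WRONLY | O_CREAT | O_TRUNC | (use_direct ? O_DIRECT : 0);
        int fd = open(path, flags, 0600);
        if (fd < 0) {
            /* Some filesystems refuse O_DIRECT at open time. */
            if (use_direct && errno == EINVAL) { use_direct = 0; continue; }
            return -1;
        }
        ssize_t n = write(fd, buf, len);
        int saved_errno = errno;
        int close_rc = close(fd);
        if (n == (ssize_t)len)
            return close_rc == 0 ? 0 : -1;
        /* Others accept the open but reject the unbuffered write. */
        if (n < 0 && use_direct && saved_errno == EINVAL) { use_direct = 0; continue; }
        return -1;
    }
}
```

A caller would obtain a suitable buffer with posix_memalign(), for example posix_memalign(&buf, IO_ALIGN, IO_ALIGN). One general property of bypassing the page cache is that the data being written is not also kept as a cached copy in memory, which is plausibly most valuable when the process being dumped already occupies much of physical memory.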
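To make the split between the wait, log and reap steps concrete, here is a small self-contained C model of the checkpoint-request lifecycle described in the libcr entries. Every name in it (model_request(), model_wait(), MODEL_ASYNC_ERR, and so on) is a hypothetical stand-in invented for this illustration; none of them is the libcr API, and their signatures should not be read as libcr's (include/libcr.h has the real declarations). The model captures two points from the NEWS text: a poll-style convenience call composed of wait, then log, then reap, with log only valid between the two; and an async-error flag that defers a request-time error until reap.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model only: not libcr types or functions. */
enum req_state { REQ_PENDING, REQ_DONE, REQ_REAPED };

struct model_req {
    enum req_state state;
    int deferred_err;   /* error held back until reap in async mode */
    const char *kmsg;   /* kernel-level messages attached to the request */
};

#define MODEL_ASYNC_ERR 0x1  /* plays the role of CR_CHKPT_ASYNC_ERR */

/* Issue a request; 'err' simulates a failure detected at request time. */
static int model_request(struct model_req *r, unsigned flags, int err, const char *kmsg)
{
    r->state = REQ_PENDING;
    r->deferred_err = 0;
    r->kmsg = kmsg;
    if (err) {
        if (!(flags & MODEL_ASYNC_ERR)) {
            r->state = REQ_REAPED;      /* nothing left to wait for */
            return -err;                /* synchronous: report now */
        }
        r->deferred_err = err;          /* async: report at reap */
    }
    return 0;
}

/* Step 1: wait for completion (the model completes immediately). */
static int model_wait(struct model_req *r)
{
    assert(r->state == REQ_PENDING);
    r->state = REQ_DONE;
    return 0;
}

/* Optional step: copy kernel messages; valid only between wait and reap. */
static int model_log(const struct model_req *r, char *out, size_t outlen)
{
    if (r->state != REQ_DONE)
        return -EINVAL;
    snprintf(out, outlen, "%s", r->kmsg ? r->kmsg : "");
    return 0;
}

/* Step 2: release the request and report its final status. */
static int model_reap(struct model_req *r)
{
    assert(r->state == REQ_DONE);
    r->state = REQ_REAPED;
    return r->deferred_err ? -r->deferred_err : 0;
}

/* Convenience wrapper composed of wait + log + reap. */
static int model_poll_msg(struct model_req *r, char *out, size_t outlen)
{
    int rc = model_wait(r);
    if (rc == 0) {
        (void)model_log(r, out, outlen);
        rc = model_reap(r);
    }
    return rc;
}
```

Separating the steps lets a caller fetch messages only when it wants them, while the composed wrapper keeps the simple one-call usage available.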
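The cr_restart default ("restore only pid") and its override options can be pinned down with a short sketch. The following C fragment is not cr_restart's option parser; struct restore_opts, parse_restore_opt() and parse_restore_args() are hypothetical names for a model of how the documented --[no-]restore-{pid,pgid,sid} options combine with that default. Unrecognized arguments are simply ignored in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option state, not cr_restart's internals. */
struct restore_opts {
    bool pid, pgid, sid;
};

/* Apply one argument of the form --[no-]restore-{pid,pgid,sid}.
 * Returns true if the argument was recognized. */
static bool parse_restore_opt(struct restore_opts *o, const char *arg)
{
    const char *p = arg;
    bool value = true;

    assert(o != NULL && arg != NULL);
    if (strncmp(p, "--", 2) != 0)
        return false;
    p += 2;
    if (strncmp(p, "no-", 3) == 0) {   /* negated form disables the restore */
        value = false;
        p += 3;
    }
    if (strncmp(p, "restore-", 8) != 0)
        return false;
    p += 8;
    if (strcmp(p, "pid") == 0)
        o->pid = value;
    else if (strcmp(p, "pgid") == 0)
        o->pgid = value;
    else if (strcmp(p, "sid") == 0)
        o->sid = value;
    else
        return false;
    return true;
}

/* Start from the documented default (restore only the pid), then apply
 * each argument in order so that later options override earlier ones. */
static struct restore_opts parse_restore_args(int argc, char **argv)
{
    struct restore_opts o = { .pid = true, .pgid = false, .sid = false };
    for (int i = 0; i < argc; i++)
        (void)parse_restore_opt(&o, argv[i]);
    return o;
}
```

With no options the model yields pid restored and pgid/sid not restored, matching the NEWS text; passing --no-restore-pid --restore-pgid --restore-sid inverts all three.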
    
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory
    
