From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu May 08 2008 - 14:39:31 PDT
I am pleased to announce the availability of BLCR 0.7.0_beta2 from
    http://mantis.lbl.gov/blcr-dist/
Both a source tarball and an SRPM are available. The filenames and MD5
checksums are:
    blcr-0.7.0_b2.tar.gz      e0b9a1c9f692482d0e7d6ddf7ca9dd2a
    blcr-0.7.0_b2-1.src.rpm   904b9357b3b49fc48a2e68c15745b0da

Relative to the eventual 0.7.0 release, this Beta is lacking only final
revisions to the HTML documentation, and of course testing by all you
nice folks. Below is an excerpt from the NEWS file listing user-visible
changes relative to the 0.6.x series.

If you have reported a bug recently, especially if you've been told to
expect a fix in 0.7.0, please try out this Beta release and let us know
if your problem is fixed and especially if it is not.

Relative to the 0.6.x release series, BLCR is using a new configure and
build mechanism that should work correctly with Xen-enabled kernels and
other kernels that were problematic in previous releases. If you are
unable to configure or build this beta on any system where 0.6.x built
successfully, PLEASE let us know because the new configure/build
mechanism might be at fault.

-Paul

PS You are receiving this either because you are on the
checkpoint_at_lbl_dot_gov list, because you've recently sent email to
the list (or me directly) asking about BLCR status, or because our
Bugzilla shows your interest in a bug fixed in this beta.

NEWS:

0.7.0_b2
--------
May 8, 2008
Enhanced functionality release.
- This version still supports only kernels through 2.6.24. There is not
  yet full support for 2.6.25, but we hope to have that for 0.7.1.
- As previously announced, this release removes support for 2.4.x
  kernels with the current exception of the RH9 and RHEL kernels (and
  derivatives) that contain backported NPTL support. These too may be
  removed in the not too distant future.
- As previously announced, this release begins the removal of support
  for LinuxThreads.
  If you are using LinuxThreads you may experience random failures in
  the BLCR testsuite and with your own multithreaded apps. New interest
  in using BLCR with non-GNU libc may lead to a return of LinuxThreads
  support. Please contact us if you have interest in this.
- This release adds the following features to the cr_checkpoint utility:
  + --quiet option to suppress output from cr_checkpoint
  + --noclobber option (don't disturb existing files)
  + Options for treatment of ptrace() child and parent processes:
      --ptraced-{error,skip,allow}
      --ptracer-{error,skip}
  + Options to save/restore executables and libraries in context files:
      --save-{exe,private,shared,all,none}
  See the cr_checkpoint manpage for more details on each of these.
- This release adds the following features to the cr_restart utility:
  + --quiet option to suppress output from cr_restart
    Previously there was no way to do so without also losing the output
    of the restarted process(es).
  + Add --run-on-* family of options to provide user-specified error
    handling hooks. Previously there was no way to automatically/safely
    distinguish a failure of cr_restart from a non-zero exit from the
    restarted application. This resolves bug 1974.
  + Add --relocate option to enable restart-time replacement of file
    and directory paths saved in the context file.
  See the cr_restart manpage for more details on each of these.
- This release adds the following feature to the cr_run utility:
  + --omit option to run a process with BLCR support such that the
    process (and its descendants) will be omitted from checkpoints.
- This release makes the following libcr API additions/changes:
  + cr_forward_checkpoint() is fully tested and thus no longer labeled
    as "use at your own risk". Its documentation in libcr.h is now
    complete as well.
  + As anticipated in previous releases, the error code returned from
    cr_poll_checkpoint() has CHANGED for the case of restarting from a
    checkpoint of oneself.
    This may break existing code that is not prepared for the new errno
    value. However, the previous value of EINVAL could have masked
    actual invalid-argument errors. The alternative of returning 0
    (success) was considered, but was discarded because it was deemed
    valuable to be able to reliably distinguish whether one was
    continuing or restarting from a checkpoint of oneself.
  + As alternatives to CR_ETEMPFAIL and CR_PERMFAIL, authors of BLCR
    callbacks can now specify the errno values to be returned to the
    checkpoint requester on a case-by-case basis.
  + cr_tryenter_cs() has been added as a non-blocking alternative to
    the cr_enter_cs() function.
  See the comments in include/libcr.h for API documentation.
- This release introduces two "stub" libraries: libcr_run and
  libcr_omit. They differ from the "full" libcr library in that they
  contain only a BLCR signal handler and the initialization code to
  register it. They do not include any of the entry points declared in
  libcr.h, and the handler code does not run any callbacks. The cr_run
  utility now uses these libraries in the LD_PRELOAD variable, rather
  than the full libcr.so used in previous releases. See the BLCR User's
  Guide for information on using these libs.
- This release makes several additions to the BLCR test suite,
  including tests of most of the features new to this release and some
  motivated by bugs fixed in the release. Many existing tests have been
  expanded to exercise additional corner cases.
- This release makes the following changes to "configure" behavior:
  + The --with-linux= option now accepts a kernel revision (the output
    of "uname -r") as a value, causing configure to search for that
    revision in some standard locations. This is intended to make it
    easier and less error prone to specify for which kernel to build.
    In most modern distributions, this single option will be sufficient
    to configure BLCR for any installed kernel.
  + Previously --with-linux= would be used to specify a kernel source
    directory, and if needed --with-linux-obj= could be given to help
    find the corresponding build directory. With this release the role
    of --with-linux= has changed to be that of a build directory and
    the option --with-linux-src= is available if the sources can't be
    found automatically.
  + The configure-time probes of the kernel headers and configuration
    are now performed using the full CFLAGS/CPPFLAGS from the Linux
    kbuild infrastructure. This ensures proper configuration with
    Xen-enabled kernels that prepend Xen-specific components to the
    include path.
  + At configure time one can set KCC to specify that the kernel
    modules are to be built with a different C compiler than the
    user-space components of BLCR.
- On the ARM platform, the "good enough for LinuxThreads"
  implementation of atomic operations has been replaced with truly
  atomic ops based on the kernel-level support added for NPTL.
- As a temporary work-around for bug 2251, BLCR will currently refuse
  to checkpoint processes with files on hugetlbfs mmap()ed with the
  MAP_PRIVATE flag. This is to avoid potentially serious instability
  that may result if BLCR attempts to checkpoint such a process.
- Fixes the following user-visible bugs and "issues"
  + 1974 - Make it possible to decide whether a restart succeeded
  + 2023 - ARM atomics need update
  + 2214 - close cri_live_count race
  + 2216 - Move post-restart signal delivery to post-callback
  + 2247 - BLCR assumes 64-bit gcc on 64-bit arch
  + 2248 - Separate CC and KCC
  + 2266 - process doing CR_CHECKPOINT_TEMP_FAILURE is killed!
  + 2271 - cr_checkpoint --clobber fails to overwrite file
  + 2272 - cr_get_restart_info returns wrong src path
  + 2274 - Invalid (zero or huge) pids seen at restart time
  + Stronger validation of BLCR against proper kernel version
  + Validation of BLCR's kernel module versions against each other
  + Performance improvements from better memory management and from
    coalescing of background work.
  + Preserve error codes for I/O errors. Previously any error from a
    read/write at kernel level was reported as EIO. Now the original
    errors (such as ENOSPC) are preserved.
  + Several others found by internal testing or reported by email and
    fixed without assigning bug numbers.
- The file contrib/blcr.magic contains the format description needed by
  the "file" utility to identify BLCR's context files.

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
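
For anyone who wants to check a download against the MD5 checksums
listed at the top of this announcement, here is a minimal sketch in
Python. It is not part of BLCR: the helper names md5_of_file and
verify_download are made up for illustration, and it assumes the
downloaded files keep the names given in the announcement.

```python
import hashlib
import sys
from pathlib import Path

# Filenames and MD5 checksums copied from the announcement text.
EXPECTED_MD5 = {
    "blcr-0.7.0_b2.tar.gz": "e0b9a1c9f692482d0e7d6ddf7ca9dd2a",
    "blcr-0.7.0_b2-1.src.rpm": "904b9357b3b49fc48a2e68c15745b0da",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the value listed for its name.

    Hypothetical helper: looks the checksum up by the file's base name
    and raises KeyError if the name is not one of the announced files.
    """
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no announced checksum for {name}")
    return md5_of_file(path) == expected[name]


if __name__ == "__main__":
    # Usage: python verify_blcr.py blcr-0.7.0_b2.tar.gz [more files...]
    for arg in sys.argv[1:]:
        print(arg, "OK" if verify_download(arg) else "MISMATCH")
```

A mismatch most likely means a truncated or corrupted download, so
fetching the file again is the first thing to try.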