From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jan 04 2008 - 08:48:12 PST
A second beta of BLCR 0.6.2 is now available at
  http://mantis.lbl.gov/blcr-dist/

Both source tarball and SRPM are available. The filenames and MD5 checksums are:
  5a4b9bf0aff4e31b2b43767f52ed6c27  blcr-0.6.2_b2.tar.gz
  a24692044da95d669d1602e3cb0cb6ca  blcr-0.6.2_b2-1.src.rpm

This is a beta of a 0.6.2 patch release. The intent of 0.6.2 is to fix a small number of significant bugs found in 0.6.0 and 0.6.1 and to add support for 2.6.23 kernels and some vendor-patched 2.6.22 kernels. A NEWS entry summarizing these changes appears below. Relative to the first beta, this release fixes two kernel Oops bugs that were exposed by the new test cases added in 0.6.2 beta1.

You are receiving this e-mail either because you are subscribed to the checkpoint_at_lbl_dot_gov mailing list or because you have reported one of the bugs or previously unsupported kernel versions addressed by this release. I apologize if you receive multiple copies.

I would greatly appreciate any feedback (positive or negative) indicating if this beta fixes any problems you have reported with BLCR 0.6.0 and/or 0.6.1. Only after I have sufficient positive feedback will I make 0.6.2 available for download from the main BLCR web pages.

-Paul

0.6.2_b2
--------
January 3, 2008
Bug-fix and expanded-support release.
- This release adds support for 2.6.23 kernels.
- This release adds support for SuSE's 2.6.22.x kernels.
- This release fixes a file descriptor leak that occurred on restart from a
  checkpoint-of-self requested via cr_request_checkpoint().
- This release fixes a deadlock (and unkillable process(es)) when a
  multi-threaded process aborts (or omits itself from) a checkpoint under
  certain conditions.
- This release fixes a restart-time failure when a checkpoint includes a
  pipe with one end outside the checkpoint scope, and data is buffered in
  the pipe.
- This release fixes a bug with the cr_request{,_file}() calls in which a
  failed checkpoint would cause failure of the next one if it had the same
  destination file name.
- This release fixes a race condition with the cr_enter_cs() and
  checkpoints in multi-threaded processes.
- This release fixes post-checkpoint signal delivery (--stop and friends)
  to occur after the checkpoint is fully completed. See bug 2201 for a
  full description of the problems addressed by these changes.
- This release documents (and fully implements) signal-delivery options to
  cr_restart (see bug 2200).
- This release fixes two kernel Oopses (bugs 2222 and 2223) due to races
  against processes/threads that are exiting.
- Adds test cases for most of the bugs fixed in this release.
- Minor improvements/changes to documentation
- Other minor bug fixes

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
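
For anyone testing the beta, the MD5 checksums listed in the announcement can be compared against the downloaded files before building. The following is only a minimal sketch, assuming the two files were saved under their announced names in the current directory. The helper names md5_of_file() and verify() are hypothetical and are not part of BLCR; the only library used is Python's standard hashlib.

```python
import hashlib

# Filenames and MD5 checksums copied from the announcement.
EXPECTED_MD5 = {
    "blcr-0.6.2_b2.tar.gz": "5a4b9bf0aff4e31b2b43767f52ed6c27",
    "blcr-0.6.2_b2-1.src.rpm": "a24692044da95d669d1602e3cb0cb6ca",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the MD5 of `path` equals `expected_hex` (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling verify(name, EXPECTED_MD5[name]) for each of the two names should return True for an intact download. A False result most likely means a truncated or corrupted transfer, in which case fetching the file again is the first thing to try.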