From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Sat Mar 01 2008 - 20:14:35 PST
Neal Becker wrote:
> On Friday 29 February 2008, Paul H. Hargrove wrote:
>> I am pleased to announce the release of BLCR 0.6.5 to fix two
>> relatively uncommon kernel-level bugs. All users are encouraged to
>> upgrade to avoid possible kernel crashes.
>
> Tested on x86_64. No tests failed.
>
> I notice these kernel messages, I assume this is normal?
>
> Mar 1 09:34:51 nbecker1 kernel: cr_rstrt_child [17864]: PID conflict found by cr_reserve_ids()
> Mar 1 09:34:51 nbecker1 kernel: cr_rstrt_child [17872]: PID conflict found by cr_reserve_ids()
> Mar 1 09:34:51 nbecker1 kernel: cr_rstrt_child [17877]: PID conflict found by cr_reserve_ids()
> Mar 1 09:34:53 nbecker1 kernel: cr_rstrt_child [17886]: PID conflict found by cr_reserve_ids()
> Mar 1 09:34:53 nbecker1 kernel: cr_rstrt_child [17892]: PID conflict found by cr_reserve_ids()
> Mar 1 09:35:55 nbecker1 kernel: Skipped zombie task 18426 - a post-restart wait() will not find this task

Yes, Neal. Those are "normal". Whenever a checkpoint or restart fails, for whatever reason, we log the fact. A number of the tests trigger failures intentionally, hence these messages.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900