From: Paul H. Hargrove (PHHargrove_at_lbl.gov)
Date: Mon Oct 28 2002 - 17:41:05 PST
Sriram et al,

I started today to mess with checkpointing LAM jobs myself. While I have not had much luck yet, I did identify and fix one key bug. I was finding that the pthread manager thread was abort()ing when a checkpoint was requested. I traced this to a bad assumption in the code that attempts to save the pthread manager's pipe. The fix has been checked in.

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-495-2998