From: Paul H. Hargrove (PHHargrove_at_lbl.gov)
Date: Wed Oct 30 2002 - 12:22:51 PST
I've cleaned up the "pthread hack" code a great deal. It now relies on the kernel to restore the correct PIDs, and thus is responsible only for saving/restoring the manager pipe. It has also been fixed to behave better when we are checkpointed before the first call to pthread_create() and thus don't yet have a manager thread. I also fixed a locking bug that caused the sequence checkpoint-restart-checkpoint-restart to hang in the second restart. (Though I doubt anyone else had tried checkpointing a restarted job.)

I continue to bang on the code as much as possible, and it is getting harder and harder to find any bugs (though there are plenty of undocumented features which remain).

-Paul

-- 
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-495-2998