From: Adolfo J. Banchio (banchio_at_famaf_dot_unc_dot_edu.ar)
Date: Tue Jul 31 2007 - 06:46:05 PDT
We have a cluster here with a mixture of ia32 and EM64T CPUs. As we already know, a job that was started on a 64-bit node (even a 32-bit job) cannot be restarted on a 32-bit node. We define a default architecture in the queue system's default script so that a job started on one architecture cannot continue on the other. Even so, one user ended up restarting a job on the wrong architecture, and this produced a KERNEL PANIC!

So my suggestion is this: if possible, cr_restart should refuse to proceed when it detects that the checkpoint comes from a different architecture, and it should print a corresponding error message.

Just for your information, we are using blcr-0.5.0_b5-1 on the 64-bit nodes and blcr-0.5.0_b1-1 on the 32-bit ones.

Best regards,
adolfo

P.S.: Again, this is just a suggestion, for a minor thing.

--
Adolfo J. Banchio <banchio_at_famaf_dot_unc_dot_edu.ar>
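Until cr_restart performs such a check itself, a site could approximate it with a wrapper around cr_restart. The Python sketch that follows is only an illustration, and it rests on several assumptions. The sidecar file `<context>.arch` and the helpers `record_arch`, `can_restart_here` and `safe_restart` are hypothetical names. They are not part of BLCR. The sketch uses `platform.machine()` as the architecture tag, which typically reports `i686` on an ia32 kernel and `x86_64` on an EM64T one. It also assumes that the job script calls `record_arch` after every `cr_checkpoint`.

```python
import os
import platform


def arch_file(context_path: str) -> str:
    # Hypothetical convention: the node's machine type is stored next to the
    # BLCR context file, as "<context file>.arch".
    return context_path + ".arch"


def record_arch(context_path: str) -> None:
    """Record this node's machine type; meant to run right after cr_checkpoint."""
    with open(arch_file(context_path), "w") as f:
        f.write(platform.machine())


def can_restart_here(context_path: str) -> bool:
    """Return True only if the recorded machine type matches the current node."""
    try:
        with open(arch_file(context_path)) as f:
            recorded = f.read().strip()
    except FileNotFoundError:
        # No record: refuse rather than risk restarting on the wrong architecture.
        return False
    return recorded == platform.machine()


def safe_restart(context_path: str) -> None:
    """Replace this process with cr_restart, or fail with an error message."""
    if not can_restart_here(context_path):
        raise SystemExit(
            f"refusing to restart {context_path}: no matching architecture "
            f"record for this {platform.machine()} node"
        )
    # cr_restart takes the context file as its argument.
    os.execvp("cr_restart", ["cr_restart", context_path])
```

The recorded tag is the node's kernel machine type and not the binary's word size. As a result, a 32-bit job that was checkpointed on an EM64T node is also refused on an ia32 node, which matches the behaviour described in the message.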