From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Apr 29 2010 - 12:44:12 PDT
Jerry,

I don't know an exact or simple answer to your question, but I can list some things that I *know* would prevent migration between machines:
+ No migration between 32- and 64-bit CPUs, even if the process was 32-bit.
+ No migration between kernels that are "too different", for which I have no good definition.
+ Will SEGV if pre-linking places shared libs at different addresses (we have a FAQ entry for this).
+ There are 2 or more different FPU state save/restore instructions available for the kernel to use, depending on the generation of CPU. I don't know for certain, but I strongly suspect that state saved in the checkpoint by one such instruction would not restore with a different one.

The last two items are my best guess since you indicate you are using the same kernel.

Good luck and please let me know if you learn anything more. If we can collect more info, I will update the FAQ entry about migration.

-Paul

[email protected] wrote:
> Hi:
>
> I've checkpointed/restarted jobs on different CPU's before for example:
> I've checkpointed on a AMD processor and restarted on a xeon processor.
>
> It does not seem to work all the time however. I just did a checkpoint
> on a XEON and tried to restart on a AMD and I got a segmentation fault.
> trying to restart the application.
>
> My question is under what circumstances I can restart on a different
> x64 CPU.
> How can I build my code so I won't have problems with this. (Or should
> it be working).
>
> I am using blcr 0.8.0 on 2.6.9-55.ELsmp kernels.
>
>
> With Blessings
> and Best regards,
>
> Jerry
> 2363
>
> You shall do no unrighteousness in judgment; you shall not favor the
> poor, nor favor the mighty; but in righteousness you shall judge your
> neighbor.
> (Torah portion, Kedoshim, Leviticus 19:15)

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
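
A rough way to look into the CPU-related items in the list above is to compare the feature flags that each node reports in /proc/cpuinfo. The following is only a minimal sketch and not part of BLCR. It assumes the Linux x86 flag names `lm` (64-bit long mode), `fxsr` (FXSAVE/FXRSTOR) and `xsave` (XSAVE/XRSTOR) are the relevant indicators, which is a guess, and the function names are hypothetical. Matching flags do not show that a restart will succeed; a mismatch only points at one possible cause.

```python
# Sketch: compare selected CPU feature flags from two /proc/cpuinfo dumps.
# The flag list is an assumption chosen for illustration, not an
# authoritative list of what BLCR or the kernel depends on.
FLAGS_OF_INTEREST = ("lm", "fxsr", "xsave")


def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def flag_differences(text_a, text_b, flags=FLAGS_OF_INTEREST):
    """Return {flag: (present_on_a, present_on_b)} for flags that differ."""
    a, b = cpu_flags(text_a), cpu_flags(text_b)
    return {f: (f in a, f in b) for f in flags if (f in a) != (f in b)}


if __name__ == "__main__":
    import sys

    # Usage: python script.py cpuinfo_from_node_a cpuinfo_from_node_b
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            diffs = flag_differences(fa.read(), fb.read())
        for flag, (on_a, on_b) in sorted(diffs.items()):
            print(f"{flag}: first={on_a} second={on_b}")
        if not diffs:
            print("no differences in the flags checked")
```

To use it, copy /proc/cpuinfo from the checkpoint node and from the restart node into two files and pass both paths on the command line.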