From: Ladislav Subr (subr_at_sirrah.troja.mff.cuni.cz)
Date: Sun Jan 24 2010 - 07:13:04 PST
Hello,

I recently saw BLCR 0.8.2 fail (the system hangs during checkpoint) on vanilla 2.6.27.44, whereas I had been using it successfully on 2.6.27.39 for a couple of months. Could it be due to the same (security?) patch that was applied to the RedHat kernel as well? I can try various 2.6.27.x kernels if that helps to locate the problem.

BTW, a long time ago, on 2.4 kernels, it was quite safe to move jobs between kernels of different versions. On 2.6 my experience was (until yesterday) that a difference only in the least significant version number is safe.

L.

> hi all,
>
> (i'm not on the list so please put me in CC when replying.)
>
> we are using blcr 0.8.2 on sl5.4 x86_64 systems and we are seeing
> strange things with restarting checkpoints taken on kernel version A and
> then restarting it on kernel version B.
> B is supposed to be a 'security/bug fix only' update of A (from
> 2.6.18-164.6.1.el5 to 2.6.18-164.11.1.el5, but who knows what patches
> are in there ;)
>
> checkpoint/restart works fine on both versions (also the testsuite
> RUN_ME passes all tests), but when restarting a checkpointed job from A
> on B, the machine gives a kernel panic (and i can't find the complete
> panic message :(
>
> is there any guideline on the behaviour of restarting on different
> kernels (same BLCR version though, but only the blcr module rpm is
> upgraded, i assume that the other rpms are independent of the kernel).
> is this supposed to work at all times?
>
> many thanks,
>
> stijn