From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Nov 27 2007 - 10:28:32 PST
Mark Calleja wrote:
> Hi,
>
> I'm trying to get checkpointing using the BLCR kernel modules to work
> with Condor (http://www.cs.wisc.edu/condor/), but I've run into a
> hitch. I can get a sample, dynamically linked, x86_64 application to
> run and checkpoint successfully using v0.6.1 of the BLCR modules when
> run directly from the command line. However, when submitted to the
> same machine using Condor via a Parrot shell
> (http://www.cse.nd.edu/~ccl/software/parrot/), then although the job
> starts running successfully with cr_run, attempts to checkpoint the
> job with a separate process using cr_checkpoint fail with the error
> message:
>
> "Requested kernel interface version is not supported"
>
> Is there any reason why this error should occur, especially when
> command-line operation on the same box succeeds? BTW, Parrot is used
> to provide a user-space file system which talks to a chirp server
> (http://www.cse.nd.edu/~ccl/software/chirp/) in order to save the
> checkpointed state off the execute host. The tests were carried out on
> a Debian "etch" box, kernel 2.6.18-5-amd64, and the application was
> built and linked with g++ v 4.1.2.
>
> Regards,
> Mark
>

Mark,

Sorry for the slow response. I am still catching up on e-mail from the
U.S. holiday.

The message you see indicates a version mismatch between the BLCR library
and the BLCR kernel module(s). Since you can checkpoint at the command
line but not under Parrot, I suspect that you may have two versions of
BLCR installed, and that the cr_run and/or cr_checkpoint found in your
PATH differs between the two environments.

To confirm this, you could run "cr_run --version" both at the command
line and via Parrot. I suspect they will report different version
numbers. If that is the case, you will need to fix your PATH and/or
remove the older of the two installations.

If the same version is reported both times, let me know and we can try
something else to isolate the cause of your problems.

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
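As a companion to the "cr_run --version" check suggested in the reply, here is a minimal sketch (not part of BLCR; the helper name find_all_on_path is hypothetical) that lists every copy of the BLCR commands reachable through PATH, in search order. The shell runs only the first match, so seeing two entries, or a different first entry when the script is run from the command line versus inside the Parrot-launched job, would point to the duplicate-installation problem described above. It assumes a POSIX-style PATH and Python 3.

```python
import os


def find_all_on_path(command, path=None):
    """Return every executable file named `command` on the search path, in PATH order.

    shutil.which() stops at the first match; listing all of them shows
    whether more than one installation provides the same command.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    seen = set()  # resolved paths already reported, so symlinked duplicates appear once
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # an empty entry means the current directory in POSIX; skipped here for simplicity
        candidate = os.path.join(directory, command)
        real = os.path.realpath(candidate)
        if real in seen:
            continue
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            seen.add(real)
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # cr_run, cr_checkpoint and cr_restart are the BLCR user-level commands.
    for cmd in ("cr_run", "cr_checkpoint", "cr_restart"):
        hits = find_all_on_path(cmd)
        print(f"{cmd}: {hits if hits else 'not found'}")
```

Running it in both environments and comparing the first entry for each command shows which binaries will actually be used; the version each one reports can then be checked with "cr_run --version" as suggested.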