From: Mark Calleja (M.Calleja_at_damtp.cam.ac.uk)
Date: Fri Nov 23 2007 - 09:15:46 PST
Hi,

I'm trying to get checkpointing with the BLCR kernel modules working under Condor (http://www.cs.wisc.edu/condor/), but I've run into a hitch.

Using v0.6.1 of the BLCR modules, I can run and checkpoint a sample dynamically linked x86_64 application successfully when I launch it directly from the command line. However, when I submit the same job to the same machine through Condor via a Parrot shell (http://www.cse.nd.edu/~ccl/software/parrot/), it starts correctly under cr_run. Any attempt to checkpoint it from a separate process with cr_checkpoint then fails with the error:

"Requested kernel interface version is not supported"

Is there any reason this error should occur, especially when command-line operation on the same box succeeds?

BTW, Parrot is used to provide a user-space file system that talks to a chirp server (http://www.cse.nd.edu/~ccl/software/chirp/), so that the checkpointed state can be saved off the execute host.

The tests were carried out on a Debian "etch" box (kernel 2.6.18-5-amd64), and the application was built and linked with g++ v4.1.2.

Regards,
Mark