From: Dr M. Calleja (mc321_at_cam.ac.uk)
Date: Wed Nov 28 2007 - 01:45:19 PST
On Nov 27 2007, Paul H. Hargrove wrote:

>Mark Calleja wrote:
>> Hi,
>>
>> I'm trying to get checkpointing using the BLCR kernel modules to work
>> with Condor (http://www.cs.wisc.edu/condor/), but I've run into a
>> hitch. I can get a sample, dynamically linked, x86_64 application to
>> run and checkpoint successfully using v0.6.1 of the BLCR modules when
>> run directly from the command line. However, when submitted to the
>> same machine using Condor via a Parrot shell
>> (http://www.cse.nd.edu/~ccl/software/parrot/), then although the job
>> starts running successfully with cr_run, attempts to checkpoint the
>> job with a separate process using cr_checkpoint fail with the error
>> message:
>>
>> "Requested kernel interface version is not supported"
>>
>> Is there any reason why this error should occur, especially when
>> command-line operation on the same box succeeds? BTW, Parrot is used
>> to provide a user-space file system which talks to a chirp server
>> (http://www.cse.nd.edu/~ccl/software/chirp/) in order to save the
>> checkpointed state off the execute host. The tests were carried out on
>> a Debian "etch" box, kernel 2.6.18-5-amd64, and the application was
>> built and linked with g++ v 4.1.2.
>>
>> Regards,
>> Mark
>>
>
>Mark,
>
> Sorry for the slow response. I am still catching up on e-mail from
>the U.S. holiday.
>
> The message you see indicates a version mismatch between the BLCR
>library and the BLCR kernel module(s). Since you can checkpoint at the
>command line but not with Parrot, I suspect that you may have 2 versions
>of BLCR installed and that the cr_run and/or cr_checkpoint command in
>the PATH differs between the two methods. To confirm this, you could
>try "cr_run --version" both at the command line and via Parrot. I
>suspect they will report different version numbers. If that is the case
>you will need to fix your PATH and/or remove the older of the two
>installations. If the same version is reported both times, let me know
>and we can try something else to isolate the cause of your problems.
>
>-Paul

Hi Paul,

The problem appears to be at the Parrot/BLCR interface, and unrelated to
Condor. Running "cr_run --version" from an ordinary shell and one that's
running under Parrot gives the same result:

banani$ cr_run --version
/usr/local/bin/cr_run: version 0.6.1

This is not surprising since this is my desktop and has only one version
of BLCR installed, namely the one I installed. However, the problem
raises its head when I run cr_checkpoint from the Parrot shell: it works
just dandy from an ordinary shell but from within Parrot I get:

banani$ cr_checkpoint 30697
Failed cr_init(): Requested kernel interface version is not supported

The developer of Parrot (Doug Thain, at Notre Dame) is a very amenable
chap and I'm confident he'd be willing to help troubleshoot this.

Thanks for your help and let me know if it would aid your debugging
process if I was to give you an account on my test machine.

Regards,
Mark
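
For anyone repeating the check discussed above, the comparison between the two environments could be scripted roughly as follows. This is only a sketch: the function names tool_path and blcr_report are made up for illustration, and the only BLCR invocation used is the "cr_run --version" call quoted in the thread. It assumes a POSIX shell, and it skips the version query when cr_run is not on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print where a command resolves on PATH,
# or "not found" if it does not resolve at all.
tool_path() {
    command -v "$1" 2>/dev/null || echo "not found"
}

# Hypothetical helper: report the resolved path of each BLCR tool
# named in the thread, then the version that cr_run reports.
blcr_report() {
    for tool in cr_run cr_checkpoint; do
        echo "$tool: $(tool_path "$tool")"
    done
    # Only query the version when cr_run is actually present.
    if command -v cr_run >/dev/null 2>&1; then
        cr_run --version
    fi
}

blcr_report
```

Running it once from an ordinary shell and once from the Parrot shell, then diffing the two outputs, would show whether the PATH or the reported version differs. In the case described above both are identical, which points away from a second BLCR installation.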