From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 20 2008 - 09:07:30 PST
José,

Sorry the error reporting isn't very clear. That is one of the weaker
parts of BLCR right now.

Since the testsuite passes, the most likely reason for the message you
see is an actual I/O failure when trying to write out the checkpoint
context file for your application. The BLCR code will map (nearly) all
failed write() calls to EIO, even if the actual cause was an
out-of-space or over-quota error.

You might find some useful information in /var/log/messages, or via
dmesg, about what BLCR was doing at the time of the error. If you can
send us those messages, we may be able to narrow down what the problem
is.

-Paul

P.S. I will ensure the next release of BLCR produces a less confusing
error message, such as "cr_checkpoint: checkpoint failed: Input/output
error". There really should be no reference to the internal ioctl()
call.

José M. Martín wrote:
> Hello,
>
> first, thanks for this project.
>
> I tried to set up blcr, but I have a problem. When I launch a program and I do
> the checkpoint, I get the following error:
> ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): Input/output error
>
> I have tried with kernels 2.6.20 (vanilla) and 2.6.18.8-0.8 (opensuse 10.2
> default) on a node. On both, I get the same error.
> Nevertheless, on another node with opensuse 10.2 and kernel 2.6.23.1, it runs
> without problem.
>
> I have passed the testsuite:
> ======================
> All 34 tests passed
> (1 tests were not run)
> ======================
>
> No hugetlbfs mount point found (test skipped)
> SKIP: hugetlbfs.ct
>
> I can load the blcr modules without problem, execute binaries, link
> libraries,...
>
> I'm using version 0.6.4
> Nodes are x86 (Pentium 4)
>
> Any help will be appreciated.
>
> Thanks in advance
>
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
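
Because BLCR reports nearly every failed write() as EIO, one way to look for the underlying cause is to attempt an ordinary write in the directory where the checkpoint context file would be created and see which error the kernel actually returns. The following is only a rough sketch, not part of BLCR: the helper names `probe_write` and `free_bytes` and the 1 MiB default probe size are assumptions made for illustration, and a probe this small may succeed even when a full checkpoint would not fit.

```python
import errno
import os
import tempfile


def probe_write(directory, nbytes=1 << 20):
    """Write nbytes to a temporary file in `directory`.

    Returns None on success, or the symbolic errno name (for example
    'ENOSPC' or 'EDQUOT') of the failure. Hypothetical helper, not a
    BLCR interface.
    """
    try:
        # The temporary file is removed automatically when closed.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"\0" * nbytes)
            f.flush()
            os.fsync(f.fileno())  # push the data to the filesystem
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    return None


def free_bytes(directory):
    """Bytes available to unprivileged users on the filesystem (Unix only)."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    target = os.getcwd()  # assumption: the context file is written here
    print("free bytes:", free_bytes(target))
    print("probe result:", probe_write(target) or "ok")
```

Run it from the directory where the checkpoint is taken, as the same user. A result such as ENOSPC or EDQUOT would point to the out-of-space or over-quota case described above; "ok" does not rule out a failure at the larger size of a real checkpoint.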