From: José M. Martín (jmartin_at_onsager.ugr.es)
Date: Fri Feb 22 2008 - 00:27:39 PST
Yes, please. I would like to solve the problem, but I am not a guru in this
area. What kind of test can I do? There are no error messages in the
GlusterFS log.

On Thursday 21 February 2008 18:53:20 Paul H. Hargrove wrote:
> José,
>
> If you only see problems with GlusterFS, then it might be a problem w/
> GlusterFS, but it might still be a problem with BLCR. I know almost
> nothing about GlusterFS, but did see at their wiki that it is a
> user-space filesystem. It is possible that could interact
> poorly/strangely with BLCR, which initiates writes from kernel addresses.
> If you are interested in debugging the problem, I will provide what
> assistance I can by email.
>
> -Paul
>
> José M. Martín wrote:
> > I have done some additional tests.
> >
> > It only fails on a volume mounted with GlusterFS, a distributed FS. On
> > a local drive, it works. So, it must be an issue with this FS.
> >
> > There are no entries in /var/log/messages and dmesg about the error.
> >
> > Thanks,
> >
> > José
> >
> > On Wednesday 20 February 2008 18:07:30 Paul H. Hargrove wrote:
> >> José,
> >>
> >> Sorry the error reporting isn't very clear. That is one of the weaker
> >> parts of BLCR right now.
> >> Since the testsuite passes, the most likely reason for the message you
> >> see is an actual I/O failure when trying to write out the checkpoint
> >> context file for your application. The BLCR code will map (nearly) all
> >> failed write() calls to EIO, even if the actual cause was an
> >> out-of-space or over-quota error.
> >> You might find some useful information in /var/log/messages, or via
> >> dmesg, about what BLCR was doing at the time of the error. If you can
> >> send us those messages, we may be able to narrow down what the problem
> >> is.
> >>
> >> -Paul
> >>
> >> P.S.
> >> I will ensure the next release of BLCR produces a less confusing error
> >> message, such as "cr_checkpoint: checkpoint failed: Input/output
> >> error". There really should be no reference to the internal ioctl()
> >> call.
> >>
> >> José M. Martín wrote:
> >>> Hello,
> >>>
> >>> first, thanks for this project.
> >>>
> >>> I tried to set up blcr, but I have a problem. When I launch a program
> >>> and I do the checkpoint, I get the following error:
> >>> ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): Input/output error
> >>>
> >>> I have tried with kernels 2.6.20 (vanilla) and 2.6.18.8-0.8 (opensuse
> >>> 10.2 default) on a node. On both, I get the same error.
> >>> Nevertheless, on another node with opensuse 10.2 and kernel 2.6.23.1, it
> >>> runs without problem.
> >>>
> >>> I have passed the testsuite:
> >>> ======================
> >>> All 34 tests passed
> >>> (1 tests were not run)
> >>> ======================
> >>>
> >>> No hugetlbfs mount point found (test skipped)
> >>> SKIP: hugetlbfs.ct
> >>>
> >>> I can load the blcr modules without problem, execute binaries, link
> >>> libraries,...
> >>>
> >>> I'm using version 0.6.4
> >>> Nodes are x86 (Pentium 4)
> >>>
> >>> Any help will be appreciated.
> >>>
> >>> Thanks in advance
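
Because BLCR reports nearly all failed write() calls as EIO, one possible user-space check is to write to the same directory directly and look at the errno the filesystem actually returns (for example ENOSPC or EDQUOT). The following is only a minimal sketch of such a check, not part of BLCR: the function name `probe_write` and the default sizes are assumptions made for illustration. It writes from an ordinary user-space buffer, whereas BLCR initiates writes from kernel addresses, so a successful run here would not rule out the interaction with a user-space filesystem that Paul describes.

```python
import errno
import os
import tempfile


def probe_write(directory, total_bytes=64 * 1024 * 1024, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to a temporary file inside directory.

    Hypothetical helper (not a BLCR tool). Returns None on success, or the
    symbolic errno name (e.g. "ENOSPC") of the first call that fails.
    """
    chunk = b"\0" * chunk_bytes
    fd = None
    path = None
    try:
        # Create the scratch file in the directory under test.
        fd, path = tempfile.mkstemp(dir=directory, prefix="probe_")
        written = 0
        while written < total_bytes:
            # os.write may write fewer bytes than requested; count what landed.
            written += os.write(fd, chunk[: total_bytes - written])
        # Force the data out so that deferred write errors surface here.
        os.fsync(fd)
        # Network filesystems can also report errors at close time.
        closing, fd = fd, None
        os.close(closing)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Best-effort cleanup; errors here are not what is being measured.
        if fd is not None:
            try:
                os.close(fd)
            except OSError:
                pass
        if path is not None:
            try:
                os.unlink(path)
            except OSError:
                pass
```

Running `probe_write()` once against the GlusterFS mount point and once against a local directory, with a size close to that of the expected checkpoint context file, would show whether plain writes already fail on the GlusterFS volume and with which error.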