Re: problems with cr_checkpoint: ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): Input/output error

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 20 2008 - 09:07:30 PST

  • Next message: José M. Martín: "Re: problems with cr_checkpoint: ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP):Input/output error"
    José,
    
      Sorry the error reporting isn't very clear.  That is one of the weaker 
    parts of BLCR right now.
      Since the testsuite passes, the most likely reason for the message you 
    see is an actual I/O failure when trying to write out the checkpoint 
    context file for your application.  The BLCR code will map (nearly) all 
    failed write() calls to EIO, even if the actual cause was an 
    out-of-space or over-quota error.
      You might find some useful information in /var/log/messages, or via 
    dmesg, about what BLCR was doing at the time of the error.  If you can 
    send us those messages, we may be able to narrow down what the problem is.
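
Because the real cause can be hidden behind EIO, it may help to test a plain write in the directory where the context file is created. The following is only a sketch and not part of BLCR: `probe_write` is a hypothetical helper name, the 1 MiB default size is arbitrary, and it assumes the context file goes to the current working directory (adjust the path if you pass a different output location to cr_checkpoint). It reports the symbolic errno of the first failure, so an out-of-space or over-quota condition would show up as ENOSPC or EDQUOT rather than EIO.

```python
import errno
import os
import tempfile


def probe_write(directory, nbytes=1 << 20):
    """Write nbytes to a temporary file in `directory` and force it to disk.

    Returns None on success, or the symbolic errno name (e.g. "ENOSPC",
    "EDQUOT", "EACCES") of the first failure. The temporary file is
    removed automatically when it is closed.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"\0" * nbytes)
            f.flush()
            # fsync so that delayed allocation errors surface here
            os.fsync(f.fileno())
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    return None


if __name__ == "__main__":
    # Assumption: the checkpoint is written to the current directory.
    result = probe_write(os.getcwd())
    print("write OK" if result is None else "write failed: " + result)
```

For a meaningful result, choose a size close to the memory footprint of the application being checkpointed, since a small write can succeed on a nearly full filesystem.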
    
    -Paul
    
    P.S.
    I will ensure the next release of BLCR produces a less confusing error 
    message, such as "cr_checkpoint: checkpoint failed: Input/output 
    error".  There really should be no reference to the internal ioctl() call.
    
    José M. Martín wrote:
    > Hello,
    >
    > first, thanks for this project. 
    >
    > I tried to set up blcr, but I have a problem. When I launch a program and 
    > checkpoint it, I get the following error:
    > ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): Input/output error
    >
    > I have tried with kernels 2.6.20 (vanilla) and 2.6.18.8-0.8 (opensuse 10.2 
    > default) on a node. On both, I get the same error.
    > Nevertheless, on another node with opensuse 10.2 and kernel 2.6.23.1, it runs 
    > without problems.
    >
    > I have passed the testsuite:
    > ======================
    > All 34 tests passed
    > (1 tests were not run)
    > ======================
    >
    > No hugetlbfs mount point found (test skipped)
    > SKIP: hugetlbfs.ct
    >
    > I can load the blcr modules without problem, execute binaries, link 
    > libraries,...
    >
    > I'm using version 0.6.4
    > Nodes are x86 (Pentium 4)
    >
    > Any help will be appreciated.
    >
    > Thanks in advance
    >
    >
    >   
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
