Re: BLCR full system lockup during cr_checkpoint

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 13 2008 - 10:48:40 PST

  • Next message: David Kesler: "Re: BLCR full system lockup during cr_checkpoint"
    David Kesler wrote:
    > Hello,
    > I've been trying to get BLCR to work on two different systems, one 
    > Ubuntu 7.1, one Fedora Core 8, both running on an x86 machine as Dom0 
    > of a Xen setup.  In both cases, I can compile, install, and 
    > successfully run BLCR IF I do not load Xen and instead boot into the 
    > generic kernel.  If I am in the Xen version of the kernel though I 
    > have two problems.  One, the installation process fails due to 
    > missing  #defines, include files, or other variables in the linux 
    > source directory's include folder.  I can, however, finagle BLCR into 
    > compiling by selectively modifying certain headers.  (Yes, I am well 
    > aware that this is unsafe and may be a leading cause of my problems 
    > and I'm also wondering if you know why this would be happening.)
    > In both systems however, if I attempt to call cr_checkpoint on a 
    > running process while booted into the Xen kernel, I get a full system 
    > hang where it responds to absolutely nothing, requiring a hard 
    > reboot.  I know that messing around with the headers probably doesn't 
    > help the situation, but because both systems fail in the exact same 
    > way I was wondering whether, assuming that BLCR compiled correctly, 
    > there's some problem with running BLCR from within a kernel loaded by 
    > Xen. 
    > Thank you,
    > David Kesler
      We are able to run within Xen in our current development version of 
    BLCR, having adjusted our autoconf magic to locate the proper headers.  
    We have not tested the released version with a Xen paravirtualized 
    kernel, mainly because of the same header problems you have encountered.
      I suspect that the lockup occurs as a result of using a 
    non-paravirtualized instruction to access one of the CPU's special 
    registers (a possible result of getting "generic" headers).
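
Before rebuilding, it may help to confirm which kind of kernel is actually running. The following is only a minimal sketch, not part of BLCR: the function name `xen_status` is hypothetical, and it assumes the kernel exposes the usual Xen marker files `/sys/hypervisor/type` and `/proc/xen/capabilities`, which may not be present on every Xen-patched kernel.

```python
from pathlib import Path


def xen_status(sys_type="/sys/hypervisor/type",
               caps="/proc/xen/capabilities"):
    """Return 'dom0', 'xen', or 'none' based on Xen marker files.

    Assumption: a Dom0 kernel lists 'control_d' in its capabilities
    file, and any Xen-aware kernel reports 'xen' as hypervisor type.
    """
    def read(path):
        # A missing or unreadable file is treated as "no information".
        try:
            return Path(path).read_text().strip()
        except OSError:
            return ""

    if "control_d" in read(caps):
        return "dom0"
    if read(sys_type) == "xen" or Path(caps).exists():
        return "xen"
    return "none"


if __name__ == "__main__":
    print(xen_status())
```

If this reports a Xen kernel while the BLCR build was configured against generic headers, that mismatch would be consistent with the suspicion above, though it does not prove it.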
      If you are willing to play guinea pig, I can create a snapshot of the 
    current development version and send you a URL.  Let me know.
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
