BLCR Error

From: Jin Zhang (jin_at_vpac_dot_org)
Date: Mon Sep 22 2008 - 23:29:45 PDT

  • Next message: Neal Becker: "kernel based checkpoint"
    Dear BLCR,
    
    I have run into a problem using BLCR.
    I installed BLCR on a cluster and tried to run a serial job under Torque.
    I configured Torque with --enable-blcr, loaded BLCR into the kernel with insmod, and created the script that mom_priv needs.
    
    However, when I ran qhold, the following error messages were logged:
    
    Sep 23 15:43:00 wayland003 pbs_mom: mach_checkpoint, checkpoint args: /usr/spool/PBS/mom_priv/blcr_checkpoint_script 28676 155.wayland.in.vpac.org wl /usr/spool/PBS/checkpoint ckpt.155.wayland.in.vpac.org.1222148580 15
    Sep 23 15:43:00 wayland003 checkpoint_script: Invoked: /usr/spool/PBS/mom_priv/blcr_checkpoint_script 28676 155.wayland.in.vpac.org wl /usr/spool/PBS/checkpoint ckpt.155.wayland.in.vpac.org.1222148580 15 
    Sep 23 15:43:00 wayland003 checkpoint_script: Subcommand (cr_checkpoint --signal 15 --tree 28676 --file ckpt.155.wayland.in.vpac.org.1222148580) failed with rc=16777215: 
    
    I then checked qstat -f 155: Job_state = R, so the job is still running.
    
    When I ran:
    cr_checkpoint --signal 15 --tree 28676 --file ckpt.155.wayland.in.vpac.org.1222148580
    I got another error:
    Checkpoint failed: support missing from application
    
    Can you please tell me what the problem is?
    
    Thanks
    
    -- 
    Jin Zhang
    
    Systems Administrator
    Victorian Partnership for Advanced Computing
    110 Victoria St. Carlton South, VIC, 3053 AU
    E: jin_at_vpac_dot_org    P: +61 (03) 9925 4942 
    
