From: Jin Zhang (jin_at_vpac_dot_org)
Date: Mon Sep 22 2008 - 23:29:45 PDT
Dear BLCR,

I have a problem using BLCR. I installed BLCR on our cluster and tried to run a serial job with it under Torque. I've configured Torque with --enable-blcr, loaded BLCR into the kernel with insmod, and created the scripts that mom_priv needs.

However, when I ran qhold, I got the following error messages:

Sep 23 15:43:00 wayland003 pbs_mom: mach_checkpoint, checkpoint args: /usr/spool/PBS/mom_priv/blcr_checkpoint_script 28676 155.wayland.in.vpac.org wl /usr/spool/PBS/checkpoint ckpt.155.wayland.in.vpac.org.1222148580 15
Sep 23 15:43:00 wayland003 checkpoint_script: Invoked: /usr/spool/PBS/mom_priv/blcr_checkpoint_script 28676 155.wayland.in.vpac.org wl /usr/spool/PBS/checkpoint ckpt.155.wayland.in.vpac.org.1222148580 15
Sep 23 15:43:00 wayland003 checkpoint_script: Subcommand (cr_checkpoint --signal 15 --tree 28676 --file ckpt.155.wayland.in.vpac.org.1222148580) failed with rc=16777215:

Then I checked qstat -f 155. It shows Job_state = R, so the job is still running.

When I ran:

cr_checkpoint --signal 15 --tree 28676 --file ckpt.155.wayland.in.vpac.org.1222148580

there was another error:

Checkpoint failed: support missing from application

Can you please tell me what the problem is?

Thanks

--
Jin Zhang
Systems Administrator
Victorian Partnership for Advanced Computing
110 Victoria St. Carlton South, VIC, 3053 AU
E: jin_at_vpac_dot_org
P: +61 (03) 9925 4942