cr_restart: ->cri_syscall(CR_OP_RSTRT_REAP): Invalid argument

From: Christian Iwainsky (sichiwai_at_informatik.stud.uni-erlangen.de)
Date: Tue Oct 11 2005 - 05:52:18 PDT

  • Next message: Paul H. Hargrove: "Re: BLCR 0.4.1 Beta5 now available"
    Hello,
    I have a problem with BLCR.
    I have written a distributed program, which is successfully checkpointed.
    But once I try to restart the second instance of the program on one
    machine, cr_restart aborts with:
    cri_syscall(CR_OP_RSTRT_REAP): Invalid argument
    
    in /var/log/messages:
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
    Oct 11 14:15:40 faui21l kernel: vmadump: invalid signature
    Oct 11 14:15:40 faui21l kernel: thaw_threads returned error, aborting. -22
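
For reading the log above: Linux kernel code conventionally returns negated errno values, so the -22 reported by thaw_threads is presumably the same error that cr_restart prints as "Invalid argument" (errno 22 is EINVAL on Linux). A minimal sketch of that mapping follows; the helper name kernel_rc_to_message is made up for illustration and is not part of BLCR.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not BLCR API): turn a kernel-style negative
 * return code, such as the -22 in the log, into the message that a
 * user-space tool would print for the corresponding errno. */
static const char *kernel_rc_to_message(int rc)
{
    /* Kernel code returns -errno; user space sees the positive value. */
    return strerror(rc < 0 ? -rc : rc);
}
```

Calling kernel_rc_to_message(-22) on a Linux/glibc system yields the "Invalid argument" text seen in the cr_restart output, which suggests both messages describe one failure rather than two separate ones.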
    
    What is the problem? (The PID is free.)
    
    
    I am also seeing some odd behaviour.
    I use the following code for the checkpoint callback
    (dsm_checkpoint_ready is initialized to 0):
    
    /***********************************************************/
    int chkpt_callback(void * aptr){
     fprintf(stderr,"chkpt_callback\n");
     if (!dsm_checkpoint_ready){
       // the checkpoint thread function is asleep ... don't checkpoint yet,
       // but awaken the checkpoint thread
       dsm_checkpoint_sleep=0;
       // postpone the checkpoint till jackal has a consistent state
       fprintf(stderr,"Postponing checkpoint ..\n");
       //cr_checkpoint(CR_CHECKPOINT_READY);
       cr_checkpoint(CR_CHECKPOINT_TEMP_FAILURE);
       return 0;
     }
     fprintf(stderr,"checkpoint callback: taking checkpoint\n");
     int chkptResult=cr_checkpoint(CR_CHECKPOINT_READY);
     if (chkptResult>0){
       fprintf(stderr,"Restarting ...\n");
       dsm_checkpoint_wakeup=1;
     } else if (chkptResult==0){
       fprintf(stderr,"checkpointing ........\n");
     }else {
       fprintf(stderr,"Checkpoint Failure\n");
       cr_checkpoint(CR_CHECKPOINT_PERM_FAILURE);
       return -1;
     }
     return 0;
    }
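
The callback above follows a two-phase pattern: decline the first request while the application state is not yet consistent, wake a worker, and accept a later request once the state is ready. The sketch below models only that control flow with C11 atomics so it can be compiled without BLCR. Every name in it (cb_verdict, on_checkpoint_request, worker_step, the two flags) is a hypothetical stand-in, and the comments mark where the real BLCR calls from the code above would go. It assumes the flags may be touched from more than one thread, which is why they are atomic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: none of these names are BLCR API. */
enum cb_verdict { CB_DEFER, CB_PROCEED };

/* Plays the role of dsm_checkpoint_ready in the callback above. */
static atomic_bool state_consistent = false;
/* Plays the role of clearing dsm_checkpoint_sleep to wake the worker. */
static atomic_bool wake_worker = false;

/* Decision made inside the checkpoint callback. */
static enum cb_verdict on_checkpoint_request(void)
{
    if (!atomic_load(&state_consistent)) {
        atomic_store(&wake_worker, true);  /* ask the worker to quiesce */
        /* real code: cr_checkpoint(CR_CHECKPOINT_TEMP_FAILURE) */
        return CB_DEFER;
    }
    /* real code: cr_checkpoint(CR_CHECKPOINT_READY) */
    return CB_PROCEED;
}

/* Worker side: if woken, bring the application to a consistent state.
 * Returns true when a fresh checkpoint request should be issued
 * (real code: the cr_request_file call). */
static bool worker_step(void)
{
    if (!atomic_exchange(&wake_worker, false))
        return false;                       /* nothing was requested */
    /* application-specific quiescing would happen here */
    atomic_store(&state_consistent, true);
    return true;
}
```

In this model the first on_checkpoint_request() defers, worker_step() then reports that a new request is due, and the second on_checkpoint_request() proceeds. The model does not reset state_consistent after a checkpoint completes; a real program would likely need to.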
    
    Once the callback has postponed the checkpoint, the program is brought
    to a checkpointable state, and then cr_request_file is called to do the
    real checkpoint.
    The program crashes on the call to cr_request_file:
    
    
    Regards,
    Christian
    
