Re: Checkpoint failed: support missing from application

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Sep 23 2005 - 10:08:38 PDT

      Checkpointing with BLCR requires that a small stub library be linked 
    into an application.  The message you are seeing is the one generated 
    when a checkpoint request is issued for an application that does not 
    include this support.
    
      A LAM/MPI built with BLCR support will automatically link this 
    library into the applications it compiles.  Other applications may link 
    it in explicitly when they are built or, more typically, have it loaded 
    via an LD_PRELOAD performed by the "cr_run" utility we provide.  For 
    instance, "cr_run ./a.out" would run a.out with the BLCR library loaded.
    
      It is also possible that the application is correctly linked with the 
    library, but is somehow disabling the BLCR hook.  One can look for 
    "libcr.so" in /proc/<pid>/maps to determine if the process with the 
    given pid has the BLCR library loaded.  If it is loaded and you still 
    get the "support missing from application" messages, then we can discuss 
    how to determine the cause of the interference.
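    As a rough illustration of the /proc/<pid>/maps check, here is a 
    minimal Python sketch.  It is not part of BLCR, and the function names 
    are hypothetical.  It assumes the standard Linux /proc/<pid>/maps 
    layout, in which the sixth whitespace-separated field (when present) 
    is the path of the mapped file, and that you have permission to read 
    the target process's maps.  It reports whether any mapped file's name 
    begins with "libcr.so", which also matches versioned names such as 
    libcr.so.0.

```python
import os


def maps_contain_library(lines, libname="libcr.so"):
    """Return True if any maps line names a file whose basename starts
    with `libcr.so` (hypothetical helper; also matches libcr.so.0 etc.)."""
    for line in lines:
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname field
        path = fields[5].strip()
        if os.path.basename(path).startswith(libname):
            return True
    return False


def blcr_library_loaded(pid, libname="libcr.so"):
    """Read /proc/<pid>/maps (Linux only) and check for the BLCR library."""
    with open(f"/proc/{pid}/maps") as maps:
        return maps_contain_library(maps, libname)
```

    For example, blcr_library_loaded(12345) would return True if process 
    12345 (a placeholder pid) has libcr.so mapped.  Matching on the file's 
    basename rather than on a plain substring avoids false positives from 
    similarly named libraries such as libcrypto.so.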
    
    -Paul
    
    Adolfo J. Banchio wrote:
    
    >Hello,
    >
    >first of all, my apologies if this question has already been answered
    >(if so, just point me to that answer); I cannot access the
    >search page of the archive.
    >
    >Now, the problem,
    >
    >I have a process (started with cr_run) which gives this error
    >message when checkpointed:
    >
    >    "Checkpoint failed: support missing from application"
    >
    >and the exit status of cr_checkpoint is 52.
    >
    >What could be the reason for this?
    >
    >By the way, I have BLCR working with SGE, and apart from this
    >user, it is working very well for process migration.
    >
    >best regards,
    >
    >adolfo
    
