Re: checkpoint and hsperfdata

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jan 03 2006 - 10:58:44 PST

  • Next message: Michael Brown: "Re: checkpoint and hsperfdata"
    Whatever /tmp/hsperfdata_user is, it is not something internal to BLCR.
    So I can only assume it is created or used by the Matlab code, perhaps
    indirectly through some library linked into the application. Not knowing
    specifically what the files are, I can't guarantee that they can safely
    be copied between hosts. You could see problems, for instance, if the
    files contain information like the IP address of the original host, or
    license keys tied to the MAC address of the original host.

    In the present version of BLCR, open files are dealt with only "by
    reference", and we must blindly assume that the files and containing
    directories still exist, with unmodified contents, at restart. In the
    future we will have the option to capture the content of files as well.
    We are looking at having some configuration or heuristic to distinguish
    file systems that are local (such as /tmp) from ones that are shared
    (such as an NFS-mounted /home) to decide when to capture the file
    content. I have no estimated date for such a feature.
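    For readers who want to experiment with the local-versus-shared distinction described above, here is a minimal sketch in Python. It is not BLCR code, and BLCR offers no such API; the function names (parse_mounts, fs_type_for, should_capture_content) and the SHARED_FS_TYPES list are hypothetical. It assumes a Linux host where /proc/mounts lists the device, mount point, and file system type of each mount, and it treats a file as shared when the longest mount point containing it has a network file system type.

```python
import os

# Hypothetical list of file system types treated as shared across hosts.
# A real tool would make this configurable rather than hard-coded.
SHARED_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gpfs"}


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts escapes spaces in mount points as \040
            mount_point = fields[1].replace("\\040", " ")
            mounts.append((mount_point, fields[2]))
    return mounts


def read_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only)."""
    with open(path) as f:
        return parse_mounts(f.read())


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best, best_type = "", None
    for mount_point, fs_type in mounts:
        mp = os.path.normpath(mount_point)
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            # ">=" lets a later mount on the same point shadow an earlier one
            if len(mp) >= len(best):
                best, best_type = mp, fs_type
    return best_type


def should_capture_content(path, mounts):
    """Heuristic: capture file content unless it is on a known shared fs.

    An unidentified file system (None) is treated as local, so its
    content would be captured rather than assumed to exist at restart.
    """
    return fs_type_for(path, mounts) not in SHARED_FS_TYPES


if __name__ == "__main__":
    # Hypothetical mount table: local root and /tmp, NFS-mounted /home.
    sample = (
        "/dev/sda1 / ext3 rw 0 0\n"
        "tmpfs /tmp tmpfs rw 0 0\n"
        "server:/export/home /home nfs rw 0 0\n"
    )
    table = parse_mounts(sample)
    for p in ["/tmp/hsperfdata_user/1234", "/home/user/run.m"]:
        print(p, fs_type_for(p, table), should_capture_content(p, table))
```

    The prefix test appends a "/" before comparing, so a path such as /homework is not mistaken for something under the /home mount.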

    Michael Brown wrote:
    >I'm testing checkpoints for the 32-bit Linux 2.4
    >kernel.  I'm using two hosts with identical hardware
    >and images.  I'm trying to make sure that I can
    >restore checkpoints between hosts.
    >I noticed that although the basic counting example can
    >be checkpointed on host0 and restarted on host1, my
    >custom Matlab code cannot.
    >System messages suggested the restart failed because
    >the /tmp/hsperfdata_user directory didn't exist on
    >host1.  After copying this directory between hosts,
    >the restart worked properly.
    >I'm wondering if it is safe to do this before I put it
    >in widespread use.  Are there any other system files
    >that should be copied also?  What is the hsperfdata
    >directory?  Can this information be stored in the
    >checkpoint itself?
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
