Re: Open Files


From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Sep 06 2005 - 11:22:25 PDT

Next message: Adolfo J. Banchio: "Checkpoint failed: support missing from application"

Previous message: jcduell_at_lbl_dot_gov: "Re: Open Files"
In reply to: jcduell_at_lbl_dot_gov: "Re: Open Files"

To add a little to Jason's response:

1) We don't do anything with file locks at the moment.
2) We don't "recover" the file contents in general.  If the file was
being written append-only (either due to flags at open, or just by usage
pattern), then when restarting we truncate the file back to the
length it had at the time the checkpoint was taken, effectively
restoring the file to the state it had at the checkpoint.  However, if the
program seeks between writes, or if another program modifies the file,
then we don't (yet) try to roll back the writes that took place between
the checkpoint and the restart.
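
As a rough illustration of the append-only case in point 2, here is a
minimal user-space sketch in Python. It is not BLCR's code; the helper
names record_length and restore_append_only are hypothetical. It only
shows why truncating to the saved length undoes appended data, and why
it cannot undo overwrites made inside the original length.

```python
import os
import tempfile


def record_length(path):
    """Return the file's size in bytes, as one might note at checkpoint time."""
    return os.stat(path).st_size


def restore_append_only(path, checkpoint_length):
    """Truncate the file back to the length recorded at checkpoint time.

    This discards data appended after the checkpoint. It does nothing
    about bytes overwritten within the first checkpoint_length bytes.
    """
    if os.stat(path).st_size > checkpoint_length:
        os.truncate(path, checkpoint_length)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        with open(path, "ab") as f:
            f.write(b"before checkpoint\n")
        saved = record_length(path)          # "checkpoint"
        with open(path, "ab") as f:
            f.write(b"after checkpoint\n")   # appended after the checkpoint
        restore_append_only(path, saved)     # "restart"
        with open(path, "rb") as f:
            return f.read()
```

If the program had instead seeked back and overwritten earlier bytes,
the file length would be unchanged and this approach would leave the
modified contents in place.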

-Paul

JCDuell_at_lbl_dot_gov wrote:
>On Tue, Sep 06, 2005 at 01:35:16PM +0300, Emmanuel Grumbach wrote:
>  
>>Hello,
>>
>>I have read the pages on Checkpoint. It seems very interesting but there
>>is an info I could not get. Does BLCR support open files ? In other words,
>>if my application has opened a file for reading/writing (writing with lock
>>seems more fun) and I checkpoint it, supposing the file still exists
>>(logically (path) or on the same inodes), will BLCR be able to open it
>>again ?
>>    
>
>Yes, we handle the general case of an application with open files.  If
>the file exists with the same *logical* pathname (the inode number does
>not need to be the same), the file will be reopened, and seeked to the
>same position as it was at checkpoint time.  This means that if you have
>a global filesystem, you will be able to restart a program on a
>different node in a cluster, so long as all the files the program needs
>to restart (including shared libraries and the executable's program
>text, etc.) are in the same logical place in the file system.
>
>Note that we do not handle certain types of files (TCP or Unix domain
>sockets, for instance).
>
>  
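
The reopen-and-seek behaviour described in the quoted text above can be
sketched in user space as follows. This is only an illustration under
assumed names (checkpoint_fd and restart_fd are hypothetical, and the
dictionary layout is invented for the example); it is not how BLCR
itself stores or restores file state.

```python
import os
import tempfile


def checkpoint_fd(fd, path):
    """Record the logical pathname and the current file offset."""
    return {"path": path, "offset": os.lseek(fd, 0, os.SEEK_CUR)}


def restart_fd(state, flags=os.O_RDWR):
    """Reopen the file by pathname and seek to the recorded offset."""
    fd = os.open(state["path"], flags)
    os.lseek(fd, state["offset"], os.SEEK_SET)
    return fd


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"0123456789")

        fd = os.open(path, os.O_RDWR)
        os.read(fd, 4)                       # advance the offset to 4
        state = checkpoint_fd(fd, path)
        os.close(fd)

        # Replace the file with a copy: same logical path, new inode.
        copy = path + ".new"
        with open(path, "rb") as src, open(copy, "wb") as dst:
            dst.write(src.read())
        os.replace(copy, path)

        fd2 = restart_fd(state)
        try:
            return os.read(fd2, 3)           # continues from offset 4
        finally:
            os.close(fd2)
```

Because only the pathname and offset are recorded, the reopen succeeds
even though the inode changed, which matches the point about restarting
on a different node with a global filesystem.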

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
