From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Sep 06 2005 - 11:22:25 PDT
To add a little to Jason's response.

1) We don't do anything w/ file locks at the moment.

2) We don't "recover" the file contents in general. If the file were
being written append only (either due to flags at open, or just by usage
pattern), then when restarting we will truncate the file back to the
length it had at the time the checkpoint was taken - effectively
restoring the file to the same state it had previously. However, if the
program seeks between writes or if another program modifies the file,
then we don't (yet) try to roll-back the writes that took place between
the checkpoint and the restart.

-Paul

JCDuell_at_lbl_dot_gov wrote:

>On Tue, Sep 06, 2005 at 01:35:16PM +0300, Emmanuel Grumbach wrote:
>
>>Hello,
>>
>>I have read the pages on Checkpoint. It seems very interesting but there
>>is an info I could not get. Does BLCR support open files ? In other words,
>>if my application has opened a file for reading/writing (writing with lock
>>seems more fun) and I checkpoint it, supposing the file still exists
>>(logically (path) or on the same inodes), will BLCR be able to open it
>>again ?
>>
>
>Yes, we handle the general case of an application with open files. If
>the file exists with the same *logical* pathname (the inode number does
>not need to be the same), the file will be reopened, and seeked to the
>same position as it was at checkpoint time. This means that if you have
>a global filesystem, you will be able to restart a program on a
>different node in a cluster, so long as all the files the program needs
>to restart (including shared libraries and the executable's program
>text, etc.) are in the same logical place in the file system.
>
>Note that we do not handle certain types of files (TCP or Unix domain
>sockets, for instance).
>
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
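
The restart behaviour described in this message (reopen the file by its logical pathname, seek to the position saved at checkpoint time, and, for a file written append-only, truncate it back to its checkpoint-time length) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not BLCR's code: BLCR does this work inside the kernel, and the names `FileRecord`, `checkpoint_file`, `restart_file` and `demo` are hypothetical. The sketch also assumes the file was only ever appended to; as noted above, writes made after a seek, or by another program, are not rolled back.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical per-file state saved at checkpoint time."""
    path: str    # logical pathname (the inode number is not recorded)
    offset: int  # file position at checkpoint time
    length: int  # file length at checkpoint time


def checkpoint_file(fd: int, path: str) -> FileRecord:
    """Record the position and length of an open file descriptor."""
    offset = os.lseek(fd, 0, os.SEEK_CUR)  # current position, unchanged
    length = os.fstat(fd).st_size
    return FileRecord(path, offset, length)


def restart_file(rec: FileRecord, flags: int = os.O_RDWR) -> int:
    """Reopen by pathname, drop appended data, restore the position."""
    fd = os.open(rec.path, flags)
    # Assumption: append-only usage, so any bytes past the recorded
    # length were written after the checkpoint and can be cut off.
    if os.fstat(fd).st_size > rec.length:
        os.ftruncate(fd, rec.length)
    os.lseek(fd, rec.offset, os.SEEK_SET)
    return fd


def demo() -> tuple:
    """Write, checkpoint, append more, then 'restart' and inspect."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.txt")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.write(fd, b"hello")
        rec = checkpoint_file(fd, path)
        os.write(fd, b" world")  # appended after the checkpoint
        os.close(fd)

        fd = restart_file(rec)
        state = (os.fstat(fd).st_size, os.lseek(fd, 0, os.SEEK_CUR))
        os.close(fd)
        return state
```

Calling `demo()` should leave the file at its checkpoint-time length with the position restored. Because only the pathname is recorded, the same approach would work on a different node of a cluster so long as the file is visible at the same logical path.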