Re: BLCR file checkpointing doubt

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 01 2008 - 09:02:41 PST

    I believe that when you say "there is an entry for that file in the proc 
    filesystem", that you mean that "ls -l /proc/<pid>/fd" shows something 
    for the file.  If you mean something else, let me know.
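To illustrate why a file can appear under /proc/<pid>/fd without any name on disk, here is a minimal user-space sketch, assuming a Linux system with /proc mounted. It is not BLCR code; the helper names open_then_unlink() and fd_target() are hypothetical and exist only for this illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path`, unlink it right away, and return the still-open
 * descriptor (or -1 on error). After this call the name is gone from
 * the directory, but the data stays reachable through the descriptor. */
int open_then_unlink(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the symlink /proc/self/fd/<fd>, which is the information that
 * "ls -l /proc/<pid>/fd" displays. `len` must be at least 1.
 * Returns the length of the link target, or -1 on error. */
ssize_t fd_target(int fd, char *buf, size_t len)
{
    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

On Linux, the link target of an unlinked-but-open file is the old path followed by " (deleted)", and writes through the descriptor continue to succeed even though no directory entry exists.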
    As to what is going wrong after you remove the vfs_unlink() from the 
    cr_mkunlinked() logic, I have only one guess based on your description. 
    My guess is that the file is getting created with a name like 
    ".blcr_0123.456789ab".  Because this name starts with '.', the "ls" 
    command won't show it by default; try "ls -a".  If this *is* what is 
    happening, then the issue is that the cr_mkunlinked() code is changing 
    the file's name before creating it.  The rename is done only when the last 
    argument ("unlinked_id") is non-zero.
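The "ls" versus "ls -a" difference can be sketched in user space as follows. This is only an illustration of how dot-prefixed names are hidden from a default listing; count_entries() is a hypothetical helper, not part of BLCR or of any "ls" implementation.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in `dir`. With show_hidden == 0, names starting
 * with '.' are skipped, as a plain "ls" does. With show_hidden != 0
 * they are counted too, except "." and "..". Returns -1 on error. */
int count_entries(const char *dir, int show_hidden)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (!show_hidden && e->d_name[0] == '.')
            continue;               /* hidden from the default listing */
        count++;
    }
    closedir(d);
    return count;
}
```

For a directory holding only a file named ".blcr_0123.456789ab", the first mode reports zero entries and the second reports one, which matches a file that exists on disk but seems to be missing.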
    Both the rename and the unlink are triggered by the non-zero 
    "unlinked_id" that cr_mkunlinked() passes to cr_filp_mknod().  It should 
    be sufficient to call cr_mkunlinked() with a zero value for the 
    unlinked_id argument, though you may need to remove the debugging check 
    for a non-zero value at lines 933-936 of cr_io.c (assuming 
    BLCR version 0.6.2 or newer).
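As a rough user-space analogy of an id argument that gates both the hidden name and the unlink, consider the sketch below. It is an assumption-laden illustration and not the BLCR kernel code: demo_mkfile() and the ".demo_%08x" name pattern are hypothetical, and the real cr_mkunlinked() and cr_filp_mknod() signatures are not reproduced here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical user-space analogy; NOT the BLCR kernel code.
 * unlinked_id == 0: create dir/name and leave it on disk.
 * unlinked_id != 0: create the file under a hidden, id-derived name
 *                   and unlink it at once, keeping only the descriptor.
 * The path actually used is written to `created`.
 * Returns the open descriptor, or -1 on error. */
int demo_mkfile(const char *dir, const char *name, unsigned unlinked_id,
                char *created, size_t len)
{
    int n;
    if (unlinked_id != 0)
        n = snprintf(created, len, "%s/.demo_%08x", dir, unlinked_id);
    else
        n = snprintf(created, len, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= len)
        return -1;                  /* path did not fit in the buffer */

    int fd = open(created, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (unlinked_id != 0 && unlink(created) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a zero id the file keeps its requested name and remains on disk. With a non-zero id nothing is left in the directory, although the descriptor stays usable.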
    Let me know if you need anything else.
    -Paul

    Manish Kumar & Abhinav Jha wrote:
    > Dear sir,
    > While trying to implement the BACKUP_RESTORE policy in file checkpointing, we
    > came across a problem. Specifically, when restarting the process from
    > its context file, the file opened by the process (outfile, opened by
    > Examples/file_counting/file_counting) does not get created on the disk
    > filesystem.
    > To implement the BACKUP_RESTORE policy, we have used the logic of your
    > cr_mkunlinked() function (in file cr_io.c), with the modification that we
    > do not call vfs_unlink() in our version of the function. We think that
    > this should create a normal file that is not unlinked (since we
    > are not performing the vfs_unlink() in our version), even if we delete
    > the file (outfile) with the rm command after we have taken the checkpoint.
    > However, this does not happen. NO file gets created on the disk
    > filesystem, but there is an entry for that file in the proc filesystem,
    > which was getting updated after we ran cr_restart. What we wanted was for
    > this file to be created on the disk (if removed) and to be updated there
    > the way it is being updated right now in the proc filesystem.
    > Could you please tell us what is wrong with our modification ( removal of
    > vfs_unlink() from your cr_mkunlinked() function ) ?
    > Thanks,
    > Manish Kumar & Abhinav Jha
    > IIT Guwahati - India
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
