Re: Problem restarting process with large file

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Jul 16 2008 - 14:46:40 PDT

    Ladislav Subr wrote:
    > Hello,
    >
    > I've experienced a problem when restarting a process which has an open file 
    > larger than 2GB. The cr_restart utility returns retcode 22 with the 
    > message 'Restart failed: Invalid argument'. In system logs, I see:
    >
    >  kernel: blcr: Couldn't restore file pointer.
    >  kernel: blcr: cr_restore_all_files [27114]:  Unable to restore fd 3 
    > (type=1,err=-2009075712)
    >  kernel: blcr: cr_rstrt_child [27114]:  Unable to restore files!  
    > (err=-2009075712)
    >
    > The system is CentOS 5.2 with vanilla kernel 2.6.22.19 x86_64 on Opteron CPU 
    > and BLCR 0.7.1. I think that I have noticed this problem already some time 
    > ago with another version of BLCR.
    >
    > Thank you in advance for any help to overcome this problem.
    >
    > 	Ladislav
    >   
    
    Ladislav,
      Thanks for the bug report.
      I've looked at the code path that generates the message "Couldn't restore file pointer" and I am fairly confident the problem is simply that we are not testing the return value of sys_lseek() correctly.  As a result, BLCR misinterprets a large file offset (>2GB) as a negative value indicating an error.
      If you could, please apply the attached patch to 0.7.1 and let me know if this resolves the problem.
    
    -Paul
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
    
    Index: cr_module/cr_rstrt_req.c
    ===================================================================
    RCS file: /var/local/cvs/lbnl_cr/cr_module/cr_rstrt_req.c,v
    retrieving revision 1.292.2.3
    diff -u -r1.292.2.3 cr_rstrt_req.c
    --- cr_module/cr_rstrt_req.c	24 Jun 2008 20:51:35 -0000	1.292.2.3
    +++ cr_module/cr_rstrt_req.c	16 Jul 2008 21:27:27 -0000
    @@ -1636,9 +1636,9 @@
         }
     
         /* restore position in file */
    -    retval = sys_lseek(file_info->fd, open_file.f_pos, 0);
    -    if (retval < 0) {
    +    if (sys_lseek(file_info->fd, open_file.f_pos, 0) != open_file.f_pos) {
             CR_ERR("Couldn't restore file pointer.");
    +	retval = -EINVAL;
     	goto out_free;
         }
     
    @@ -1706,9 +1706,9 @@
         }
     
         /* restore position */
    -    retval = sys_lseek(file_info->fd, open_dir.f_pos, 0);
    -    if (retval < 0) {
    +    if (sys_lseek(file_info->fd, open_dir.f_pos, 0) != open_dir.f_pos) {
             CR_ERR("Couldn't restore file pointer.");
    +	retval = -EINVAL;
     	goto out_free;
         }
     
    
