From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Jul 16 2008 - 14:46:40 PDT
Ladislav Subr wrote:
> Hello,
>
> I've experienced a problem when restarting a process which has an open file
> larger than 2GB. The cr_restart utility returns retcode 22 with the
> message 'Restart failed: Invalid argument'. In system logs, I see:
>
> kernel: blcr: Couldn't restore file pointer.
> kernel: blcr: cr_restore_all_files [27114]: Unable to restore fd 3
> (type=1,err=-2009075712)
> kernel: blcr: cr_rstrt_child [27114]: Unable to restore files!
> (err=-2009075712)
>
> The system is CentOS 5.2 with vanilla kernel 2.6.22.19 x86_64 on Opteron CPU
> and BLCR 0.7.1. I think that I have noticed this problem already some time
> ago with another version of BLCR.
>
> Thank you in advance for any help to overcome this problem.
>
> Ladislav
>

Ladislav,

Thanks for the bug report.
I've looked at the code path that generates the message "Couldn't restore file pointer", and I am fairly confident the problem is simply that we are not testing the return value of sys_lseek() correctly. As a result, BLCR interprets the large file offset (>2GB) as a negative value indicating an error.

If you could, please apply the attached patch to 0.7.1 and let me know if it resolves the problem.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Index: cr_module/cr_rstrt_req.c
===================================================================
RCS file: /var/local/cvs/lbnl_cr/cr_module/cr_rstrt_req.c,v
retrieving revision 1.292.2.3
diff -u -r1.292.2.3 cr_rstrt_req.c
--- cr_module/cr_rstrt_req.c	24 Jun 2008 20:51:35 -0000	1.292.2.3
+++ cr_module/cr_rstrt_req.c	16 Jul 2008 21:27:27 -0000
@@ -1636,9 +1636,9 @@
 	}
 
 	/* restore position in file */
-	retval = sys_lseek(file_info->fd, open_file.f_pos, 0);
-	if (retval < 0) {
+	if (sys_lseek(file_info->fd, open_file.f_pos, 0) != open_file.f_pos) {
 		CR_ERR("Couldn't restore file pointer.");
+		retval = -EINVAL;
 		goto out_free;
 	}
 
@@ -1706,9 +1706,9 @@
 	}
 
 	/* restore position */
-	retval = sys_lseek(file_info->fd, open_dir.f_pos, 0);
-	if (retval < 0) {
+	if (sys_lseek(file_info->fd, open_dir.f_pos, 0) != open_dir.f_pos) {
 		CR_ERR("Couldn't restore file pointer.");
+		retval = -EINVAL;
 		goto out_free;
 	}
 