From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Jan 28 2008 - 11:14:34 PST
Locus,

Most of the time, EINVAL from the restart operation means that the checkpoint context file appears invalid, either because it contains unexpected data or because it is truncated. Based on what you describe, you are sending the checkpoint output to a file descriptor via "cr_checkpoint -F" (as opposed to "-f"). So one possibility is that you may simply need to close the file before the parent exits (immediately after cr_poll_checkpoint() is a good place). It is also possible that the file was open in the original program and some data other than the checkpoint was written to it.

You should check your system logs (/var/log/messages or dmesg) to see what information the kernel printed when the CR_RSTRT_PROCS call failed. That may help narrow down the source of the problem.

-Paul

Locus Jackson wrote:
> Hi,
> I come across this problem when I restart my program.
> I use system("cr_checkpoint -F fildes") in my program to set a
> checkpoint, but when I use execlp("cr_restart filename")
> to restart my program in its child process (wait until parent is
> exited), it always tells me that "cri_syscall(CR_RSTRT_PROCS): Invalid
> arguments".
> What does this error mean, and what should I do to solve this error in
> order to use execlp("cr_restart filename") in child process to
> rollback to the checkpoint set by system("cr_checkpoint -F fildes")
> (fildes: file descriptor of filename).
> Thank you for your help.
> Regards
> Locus.

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
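
The suggestion above, closing the context file before the parent exits, might look roughly like the C sketch below. This is only an illustration, not code from BLCR or from the original program: the helper names finish_context_file() and exec_restart() are hypothetical, and the fsync() call is an extra precaution assumed here rather than something the reply asks for. The sketch also assumes cr_restart is found on PATH and takes the context file as its only argument. Note that execlp() expects the program name and each argument as separate strings ending with a NULL pointer, so if the call is literally written as execlp("cr_restart filename") it would need to be split up as shown.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush and close the descriptor that received the
 * checkpoint, so the context file is complete before anyone restarts from it.
 * Returns 0 on success, -1 if the flush or the close failed.
 */
int finish_context_file(int fd)
{
    int rc = 0;

    /* fsync() fails with EINVAL on descriptors that cannot be synced
     * (pipes, sockets); that case is tolerated here. */
    if (fsync(fd) != 0 && errno != EINVAL)
        rc = -1;

    /* Always attempt the close, even if the flush failed. */
    if (close(fd) != 0)
        rc = -1;

    return rc;
}

/*
 * Hypothetical helper: replace the current process with cr_restart.
 * Program name and arguments are passed as separate strings, and the
 * list ends with a NULL pointer. Only returns (with -1) if the exec fails.
 */
int exec_restart(const char *context_path)
{
    execlp("cr_restart", "cr_restart", context_path, (char *)NULL);
    perror("execlp(cr_restart)");
    return -1;
}
```

In the parent, finish_context_file(fildes) would be called once the checkpoint has completed (for example right after cr_poll_checkpoint(), as suggested above) and before exiting; the child would call exec_restart(filename) only after the parent has gone away.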