From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Jan 14 2008 - 13:12:20 PST
Jerry,

I am afraid I don't know how to help you any further. The fact that the relocation error doesn't prevent checkpoint/restart on the command line suggests to me that it is not a real problem, just an annoyance.

The failure to restart w/ SGE, however, is a real problem that I don't have any solution for. As I stated before, the log messages you sent show a failure to open /dev/tty which indicates to me that the job had a controlling tty (CTTY) at checkpoint time, but does not have one at restart time. Not knowing details of SGE, I don't know how that could be happening unless MATLAB has gone to some special trouble to acquire a CTTY that was not opened by SGE. Since nohup didn't work, I don't have any further ideas on how to avoid having a CTTY at checkpoint time, and I also have no suggestions as to how to create/acquire one at restart time.

If you have any ideas you would like my help to pursue, let me know and I'll try to help. However at this point I have nothing left to suggest to you.

-Paul

Jerry Mersel wrote:
> I checked and am getting the same relocation error from the command line,
> but MATLAB and checkpointing and restarting are working.
>
> I don't get any relocation errors when using
> "cr_run perl </dev/null" or "cr_run cat </dev/null". (not from SGE or the
> command line).
> [snip]

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
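Note on the CTTY diagnosis in the message above: one way to see whether a process has a controlling tty is to try opening /dev/tty, which fails when there is none. The Python sketch below is only an illustration and is not part of the original exchange. The function names are made up for this example, and nothing here has been tried with BLCR, MATLAB, or SGE. It probes the current process, then runs the same probe in a child started in a new session (the equivalent of setsid), which begins life without a controlling tty.

```python
import os
import subprocess
import sys


def has_controlling_tty() -> bool:
    """Return True if this process can open /dev/tty, i.e. it has a CTTY."""
    try:
        fd = os.open("/dev/tty", os.O_RDWR)
    except OSError:
        # Typically ENXIO when the process has no controlling terminal.
        return False
    os.close(fd)
    return True


# The same probe as a one-off script, so it can run in a child process.
_PROBE = (
    "import os\n"
    "try:\n"
    "    os.close(os.open('/dev/tty', os.O_RDWR))\n"
    "    print('yes')\n"
    "except OSError:\n"
    "    print('no')\n"
)


def child_in_new_session_has_ctty() -> bool:
    """Run the probe in a child that is the leader of a new session."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child calls setsid() before exec
        check=True,
    )
    return result.stdout.strip() == "yes"


if __name__ == "__main__":
    print("this process has a CTTY:", has_controlling_tty())
    print("child in new session has a CTTY:", child_in_new_session_has_ctty())
```

Running the probe from inside the batch job, just before the checkpoint is taken, would show whether the job really holds a CTTY at that point. Whether starting the job in a new session would help with the restart failure is an open question and not something this sketch establishes.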