From: Paul H. Hargrove (PHHargrove_at_lbl.gov)
Date: Thu Mar 21 2002 - 11:35:19 PST
Brian W. Barrett wrote:
[snip]
> Outstanding questions:
> ----------------------
>
> * Can we get enough communication in MPIRUN in the signal handler
>   context, or are we completely hosed?
>
>   - what can we run in a signal handler context?
>   - If we can't, what is our next option?

If you must do things that are not legal in signal handler context, we
can offer an alternative: polling (ugh!) for checkpoint requests. In the
header file I provided, see the comments on the CR_REG_NOASYNC flag and
cr_progess(). Yes, this is already implemented as documented :-).

> * What is the interface for restarting an application

There are three options from C code:

  + exec*("context_file"), where exec* is any flavor of exec call.
  + cr_exec("context_file")
  + system("restart_utility context_file");

The exec*() option is the most appealing and will probably be the
eventual preferred form. However, it will be the last one implemented.

cr_exec() will be implemented first. It will have exec() semantics,
replacing the running process, so callers will usually fork() first.

The system() option exists so that shell scripts can restart things
before the exec*() version works. It will be a tiny wrapper around
cr_exec().

NOTE THAT NONE OF THESE ARE IMPLEMENTED YET.

-Paul
--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
NERSC Future Technologies Group           Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-495-2998
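As a rough illustration of the polling alternative to doing work in signal handler context: the usual POSIX pattern is for the handler to do nothing except set a flag of type volatile sig_atomic_t, and for the application to check that flag from normal context, where non-async-signal-safe work (such as talking to MPIRUN) is allowed. This is a sketch under stated assumptions, not the CR_REG_NOASYNC / cr_progess() interface from the header mentioned above. SIGUSR1 as the request signal and the function names are hypothetical stand-ins.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical sketch: SIGUSR1 stands in for a checkpoint request.
   This is NOT the CR_REG_NOASYNC / cr_progess() interface. */

/* volatile sig_atomic_t is the object type the C standard guarantees
   can be safely written from an asynchronous signal handler. */
static volatile sig_atomic_t checkpoint_requested = 0;

/* The handler only records the request. Anything that is not
   async-signal-safe (malloc, stdio, socket I/O to mpirun) is deferred. */
static void on_checkpoint_signal(int signo)
{
    (void)signo;
    checkpoint_requested = 1;
}

/* Install the handler. Returns 0 on success, -1 on error. */
static int install_checkpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_checkpoint_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no special flags needed for this sketch */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Poll from normal (non-handler) context, e.g. once per iteration of
   the application's main loop. Returns 1 and clears the flag if a
   request arrived since the last poll, otherwise 0. The caller then
   performs the checkpoint free of signal-handler restrictions. */
static int checkpoint_pending(void)
{
    if (!checkpoint_requested)
        return 0;
    checkpoint_requested = 0;
    return 1;
}
```

The cost is the one the email groans about: a request is only serviced as often as the application remembers to poll.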
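Because cr_exec() is described as having exec() semantics (it replaces the running process), a caller that wants to keep running forks first and lets the child do the exec. cr_exec() is not implemented and its signature is not given, so the sketch below assumes a restart program is invoked with plain execvp() instead. restart_in_child() and whatever program name is passed as util are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: fork, then replace the child with
   `util context_file`, mirroring the fork-then-exec usage described
   for cr_exec(). Nothing here calls the real (unimplemented) cr_exec().
   Returns the child's exit status, 127 if the exec failed, or -1 if
   fork/waitpid failed or the child did not exit normally. */
static int restart_in_child(const char *util, const char *context_file)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: on success execvp() never returns. */
        char *const argv[] = { (char *)util, (char *)context_file, NULL };
        execvp(util, argv);
        perror("execvp");
        _exit(127); /* exec failed; skip the parent's atexit handlers */
    }

    /* Parent: keeps running and waits for the restarted process. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling restart_in_child("restart_utility", "context_file") would roughly correspond to the system() option, minus the intermediate shell.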