jcduell_at_lbl_dot_gov
Date: Fri Apr 23 2004 - 10:30:58 PDT
On Fri, Apr 23, 2004 at 11:36:59AM -0500, Kevin wrote:
> Dear All,
>
> When we use blcr to checkpoint the MPI program, the current pid of
> mpirun is needed. But in fact, it is not possible that we ask all the
> programmers who code mpi applications print out the pid of mpirun in
> their program. So is there some suggestion to get the pid of mpirun
> without modifying the user MPI application source codes, while we are
> ready to use blcr's command line functionality?

Tingyu,

The normal case will eventually be for our checkpoint/restart software to
be integrated into the batch system on a machine (like PBS, LoadLeveler,
etc.), so you'd use a utility like 'qstat' to get your job's ID, and then
you could checkpoint/restart it with that ID.

We are adding such support to the Scalable Software Systems project's
software:

http://www.scidac.org/ScalableSystems/

And we hope at some point to see OpenPBS support too (and support from
commercial vendors).

-- 
Jason Duell               Future Technologies Group
<jcduell_at_lbl_dot_gov>  Computational Research Division
Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory