From: Eric Roman (ERoman_at_lbl_dot_gov)
Date: Fri Apr 23 2004 - 10:43:42 PDT
If you just want to track the PIDs, you can use the PBS job ID to look up
the session ID on the node (I forget how, but it's easy if your nodes
aren't time-shared: just log in), and then use pstree to find the PID.

You could modify mpirun, too. Users usually don't build their own MPI
libs.

- E

On Fri, Apr 23, 2004 at 10:30:58AM -0700, JCDuell_at_lbl_dot_gov wrote:
> On Fri, Apr 23, 2004 at 11:36:59AM -0500, Kevin wrote:
> > Dear All,
> >
> > When we use blcr to checkpoint the MPI program, the current pid of
> > mpirun is needed. But in fact, it is not possible that we ask all the
> > programmers who code mpi applications to print out the pid of mpirun in
> > their program. So is there some suggestion to get the pid of mpirun
> > without modifying the user MPI application source codes, while we are
> > ready to use blcr's command line functionality?
>
> Tingyu,
>
> The normal case will eventually be for our checkpoint/restart software to
> be integrated into the batch system on a machine (like PBS,
> LoadLeveler, etc.), so you'd use a utility like 'qstat' to get your
> job's ID, and then you could checkpoint/restart it with that ID.
>
> We are adding such support to the Scalable Software Systems system
> project's software:
>
> http://www.scidac.org/ScalableSystems/
>
> And we hope at some point to see OpenPBS support too (and support from
> commercial vendors).
>
>
> --
> Jason Duell               Future Technologies Group
> <jcduell_at_lbl_dot_gov>  Computational Research Division
> Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory

--
Eric Roman    Computational Research Division
510-486-6420  Berkeley Lab
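For anyone who would rather script the pstree step than eyeball it, here is a rough sketch in Python. It assumes you already have the job's session ID on the node and that the launcher's command name really is "mpirun" (some MPI builds ship mpirun as a wrapper script, in which case the name will differ). The helper names `parse_stat` and `find_pids` are made up for this example; the only thing relied on is the Linux /proc/<pid>/stat layout (pid, command name in parentheses, state, ppid, pgrp, session).

```python
import os


def parse_stat(text):
    """Return (pid, comm, ppid, session) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    # Fields after the name: state, ppid, pgrp, session, ...
    fields = text[rpar + 1:].split()
    return pid, comm, int(fields[1]), int(fields[3])


def find_pids(session_id, name="mpirun", proc_root="/proc"):
    """List PIDs in the given session whose command name equals `name`."""
    matches = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return matches
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, _ppid, sid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if sid == session_id and comm == name:
            matches.append(pid)
    return sorted(matches)
```

Calling `find_pids(12345)` on the compute node returns the PIDs of any mpirun processes in session 12345, one of which can then be handed to the checkpoint command. Matching on the session rather than on the name alone keeps it from picking up another job's mpirun on a shared node.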