Re: Error in exec

From: Eric Roman (ERoman_at_lbl_dot_gov)
Date: Wed May 19 2004 - 09:48:03 PDT

  • Next message: Kevin: "Re: Error in exec"
    As best I can tell, this error is coming from LAM.  The "Error in exec"
    message appears to be produced by crtcp when it fails to exec a new mpirun.
    The most likely reason for exec() to fail is that the executable wasn't
    found.  I'd check the PATH that the MPI app is using and make sure it
    includes the directory containing mpirun.
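    A minimal sketch of that check, assuming a POSIX shell.  The helper name
    check_in_path and the bin/ subdirectory under the laminfo prefix
    (/home/kevin/LAM) are assumptions for illustration, not something LAM or
    BLCR provides:

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be resolved via PATH.
check_in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
        return 0
    fi
    echo "$1: not found in PATH"
    return 1
}

# Assumption: the LAM binaries live in bin/ under the "Prefix" shown by laminfo.
LAM_BIN="/home/kevin/LAM/bin"

if ! check_in_path mpirun; then
    # Prepend the assumed LAM bin directory and look again.
    PATH="$LAM_BIN:$PATH"
    export PATH
    check_in_path mpirun || echo "mpirun is not in $LAM_BIN either; check the LAM install"
fi
```

    Run it in the shell that launches the MPI job, so that it inspects the
    PATH the job would actually see.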
     - E
    On Wed, May 19, 2004 at 10:07:21AM -0500, Kevin wrote:
    > Dear Sir, 
    > I used LAM 7.0.4 combined with BLCR 0.2.0 to checkpoint MPI programs. Previously it worked fine with a single program and with an MPI program running on one node. Today, when I tried to checkpoint an MPI program (the "hello" program under the example directory of the LAM package) running on one node of our cluster, the program was checkpointed and the context file was saved. But when I tried to restart it, it printed "Error in exec" to the screen. I can't figure out where the problem is. Could you please give me some suggestions?
    > Below is some information about my operations and configuration:
    > [kevin@Sparrow-01-02 ~/src]mpirun C ./hello 
    > // it works fine and the output is displayed on console 1
    > [kevin@Sparrow-01-02 ~/src] getpid mpirun 
    > // I got the pid of mpirun with a script "getpid" from console 2; assume it is 344
    > [kevin@Sparrow-01-02 ~/src]cr_checkpoint 344
    > // checkpointing ./hello from console 2 works fine; context.344 is saved to disk
    > [kevin@Sparrow-01-02 ~/src]cr_restart context.344
    > Error in exec
    > ---below are configurations----------------------------------
    > [kevin@Sparrow-01-02 ~/src]lamnodes
    > n0      Sparrow-01-02.ERC.MsState.Edu:1:origin,this_node
    > [kevin@Sparrow-01-02 ~/src]laminfo
    >            LAM/MPI: 7.0.4
    >             Prefix: /home/kevin/LAM
    >       Architecture: i686-pc-linux-gnu
    >      Configured by: kevin
    >      Configured on: Mon May  3 15:45:08 CDT 2004
    >     Configure host: Sparrow-01-01.ERC.MsState.Edu
    >         C bindings: yes
    >       C++ bindings: yes
    >   Fortran bindings: yes
    >        C profiling: yes
    >      C++ profiling: yes
    >  Fortran profiling: yes
    >      ROMIO support: yes
    >       IMPI support: no
    >      Debug support: no
    >       Purify clean: no
    >           SSI boot: globus (Module v0.5)
    >           SSI boot: rsh (Module v1.0)
    >           SSI coll: lam_basic (Module v7.0)
    >           SSI coll: smp (Module v1.0)
    >            SSI rpi: crtcp (Module v1.0.1)
    >            SSI rpi: lamd (Module v7.0)
    >            SSI rpi: sysv (Module v7.0)
    >            SSI rpi: tcp (Module v7.0)
    >            SSI rpi: usysv (Module v7.0)
    >             SSI cr: blcr (Module v1.0.1)
    Eric Roman                       Computational Research Division
    510-486-6420                     Berkeley Lab
