Re: restart failed:Device or resource busy,found pid 4818 in use

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Mar 25 2010 - 17:53:15 PDT

  • Next message: fengguang tian: "Re: restart failed:Device or resource busy,found pid 4818 in use"
    The message says that there are some pids (process IDs) in use 
    (allocated to running processes) that are needed for the restart.
    This typically happens if one tries to restart while the original run has 
    not yet exited, for instance if portions of it are hung.
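
One way to test this explanation on a node is to take the pids named in the restart output and see what, if anything, is currently using them. The sketch below is illustrative and not an Open MPI or checkpointer tool. It assumes Linux, because it reads /proc/<pid>/cmdline. The names pids_from_log and holder_of are hypothetical helpers.

```python
import os
import re

# Matches restart messages such as "- found pid 4813 in use".
PID_LINE = re.compile(r"found pid (\d+) in use")


def pids_from_log(text):
    """Return the pids the restart reported as busy, in order of appearance."""
    return [int(m.group(1)) for m in PID_LINE.finditer(text)]


def holder_of(pid):
    """Return the command line of the process currently using `pid`,
    or None if no such process exists. Linux-only: reads /proc/<pid>/cmdline."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None                  # pid is free (or the process just exited)
    except PermissionError:
        return "<not readable>"      # pid is taken, but /proc access is restricted
    # Arguments are NUL-separated; kernel threads and zombies have none.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return " ".join(args) if args else "<no command line>"


if __name__ == "__main__":
    log = """- found pid 4818 in use
- found pid 4819 in use
Restart failed: Device or resource busy"""
    for pid in pids_from_log(log):
        print(pid, holder_of(pid) or "free")
```

If the reported pids turn out to belong to processes from the original job, that supports the "not yet exited" explanation. If they belong to unrelated processes, that points to chance collisions instead.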
    
    With very large clusters it becomes a statistically significant 
    possibility that one could have a few random collisions with other 
    processes on the nodes.
    However, given the number and grouping of the pids, I strongly suspect the 
    original MPI job is still running or is hung.
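
The remark about very large clusters can be made concrete with a simple model. Suppose each node independently has some small chance p of a pid collision. Then the chance that at least one of n nodes collides is 1 - (1 - p)^n. The per-node value of 0.001 used below is an assumed illustration, not a measurement. Real pid allocation is not random, so this is only a rough intuition.

```python
def prob_any_collision(p_node, n_nodes):
    """Chance that at least one of n_nodes independent nodes sees a collision,
    given an assumed per-node collision probability p_node."""
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be a probability")
    return 1.0 - (1.0 - p_node) ** n_nodes


for n in (1, 10, 100, 1000):
    print(n, round(prob_any_collision(0.001, n), 3))
```

With these assumed numbers, a 0.1% per-node chance grows to roughly 63% across 1,000 nodes. In contrast, the tight run of neighbouring pids (4812 to 4829) seen here looks more like one leftover job than like scattered chance collisions.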
    
    -Paul
    
    fengguang tian wrote:
    > Hi
    >
    > When I use ompi-restart to restart from the checkpoint file on a 
    > cluster (using Open MPI), an error occurs. It shows:
    > - found pid 4813 in use
    > - found pid 4824 in use
    > - found pid 4827 in use
    > Restart failed: Device or resource busy
    > - found pid 4812 in use
    > - found pid 4822 in use
    > - found pid 4823 in use
    > Restart failed: Device or resource busy
    > - found pid 4815 in use
    > - found pid 4828 in use
    > - found pid 4829 in use
    > Restart failed: Device or resource busy
    > - found pid 4818 in use
    > - found pid 4819 in use
    > Restart failed: Device or resource busy
    > - found pid 4814 in use
    > - found pid 4825 in use
    > - found pid 4826 in use
    > Restart failed: Device or resource busy
    >
    >
    > Why would this happen?
    >
    > cheers
    > fengguang
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
