Re: inquiring about checkpoint

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Mar 11 2010 - 20:34:46 PST

    Luyang Dong,
    
    I am afraid that we in the BLCR team did not write the LAM/MPI code for 
    integration with BLCR.
    So, I may have no better idea than you about what that error means.
    
    The only thing that I can think of is that I recall that there was some 
    problem with open log files if LAM/MPI was built with --enable-debug and 
    BLCR support.  I doubt that is your problem, but thought I'd mention it 
    just in case it was useful.
    
    There is no longer any LAM/MPI development community who might be able 
    to help with your question.  All of the developers from LAM/MPI that are 
    still in that line of work are now part of the Open MPI project: 
    http://www.open-mpi.org
    
    Open MPI is integrated with BLCR at least as well as LAM/MPI ever was.  
    So, unless you have some specific need for LAM/MPI I would suggest you 
    switch to Open MPI.
    
    -Paul
    
    luyang dong wrote:
    > Dear teachers:
    >        I am a graduate student in the Department of Computer Science and 
    > Technology at Shandong University. Recently, I have been confused about the 
    > use of LAM/MPI integrated with BLCR. I run an MPI program like 
    > this: *mpirun -np 4 -ssi cr_base_dir /home/cu0605/blcr  
    > hello_world* (hello_world is the name of the MPI program), and then I 
    > use ps -ef|grep hello_world to find its pid. After that I run another 
    > command, *lamcheckpoint -ssi cr blcr -pid 24224* (assuming the pid of 
    > mpirun is 24224). Then I press ctrl-c to kill the mpirun program, and 
    > run *lamrestart -ssi cr blcr -ssi cr_blcr_context_file 
    > context.mpirun.24224*. But this command always outputs 
    > *mpirun: Bad file descriptor.* I do not know how to deal with it, and I 
    > want to know how to solve this problem.
    >
    > thanks a lot
    > best wishes
    > Luyang Dong
    > 3.12 2010
    >
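
For reference, the sequence described in the quoted message can be collected into a small dry-run sketch. It only prints the commands and does not run LAM/MPI or BLCR. The commands and flags are copied from the quoted message; the function and variable names (launch_cmd, checkpoint_cmd, restart_cmd, print_sequence, CR_DIR, PROG) are illustrative assumptions and are not part of either tool.

```shell
#!/bin/sh
# Dry-run sketch: prints the checkpoint/restart commands from the quoted
# message without executing them. Function and variable names here are
# hypothetical helpers, not part of LAM/MPI or BLCR.

CR_DIR="/home/cu0605/blcr"   # checkpoint directory used in the quoted message
PROG="hello_world"           # MPI program name used in the quoted message

launch_cmd() {
    # Start the job with the checkpoint directory set.
    echo "mpirun -np 4 -ssi cr_base_dir $CR_DIR $PROG"
}

checkpoint_cmd() {
    # $1 = pid of the mpirun process (found with ps in the quoted message)
    echo "lamcheckpoint -ssi cr blcr -pid $1"
}

restart_cmd() {
    # $1 = pid of the original mpirun, which names the context file
    echo "lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.$1"
}

print_sequence() {
    # Print the three steps in the order they were run.
    launch_cmd
    checkpoint_cmd "$1"
    restart_cmd "$1"
}

print_sequence 24224
```

Running the script with the pid from the quoted message prints the three command lines in order, which makes it easy to compare them against what was actually typed.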
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
