Re: LAM: Checkpoint is correct, BUT cannot restart with LAM+BLCR

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Nov 26 2008 - 10:20:14 PST

    Jerry,
     Please try loading the BLCR modules with
     "make insmod cr_ktrace_mask=0xffffffff"
    to enable the highest level of debugging output.  I suspect there
    will be additional output after the "parent linkage" message.
    -Paul
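
The steps suggested above can be sketched roughly as follows. This is only a sketch under stated assumptions: the shell is running as root in the top of the BLCR build tree, the modules are not already loaded, and the trace output ends up in the kernel log readable with dmesg. The function names and the log file name are hypothetical and do not come from the original message.

```shell
# Hypothetical helpers; names and the log file are illustrative only.

# Load the BLCR modules with every trace bit set (command from the message).
# Assumes: current directory is the BLCR build tree, caller is root,
# and the BLCR modules are not already loaded.
blcr_load_with_trace() {
    make insmod cr_ktrace_mask=0xffffffff
}

# Save the last N lines of the kernel log (default 200) to a file.
# Assumes the BLCR trace messages appear in the kernel ring buffer.
blcr_save_trace() {
    lines="${1:-200}"
    out="${2:-blcr-trace.log}"
    dmesg | tail -n "$lines" > "$out"
}
```

One possible use: call blcr_load_with_trace, reproduce the failing restart, then call blcr_save_trace and look in the saved file for anything logged after the "parent linkage" message.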
    
    Jerry Mersel wrote:
    > Hi:
    >
    >    I also see the same errors as zhangkan.
    >
    >    Also stopping on Parent linkage.
    >
    >    I just managed to start mpirun but not the children,
    >    and I need to reboot the machine to get rid of mpirun.
    >    I can't kill it. It goes into permanent sleep mode.
    >
    >
    >                             Regards,
    >                                Jerry
    >
    >   
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
