Re: LAM: Checkpoint is correct, BUT cannot restart with LAM+BLCR

From: Jerry Mersel (jerry.mersel_at_weizmann.ac.il)
Date: Sun Nov 30 2008 - 00:45:26 PST

  • Next message: Paul H. Hargrove: "Re: LAM: Checkpoint is correct, BUT cannot restart with LAM+BLCR"
    Hi Paul:
    
    
    Would loading the module directly with "insmod <filename>
    cr_ktrace_mask=0xffffffff" have the same effect as "make insmod
    cr_ktrace_mask=0xffffffff"?
    
    
    
    
    > Jerry,
    >  Please try loading the BLCR modules with "make insmod
    > cr_ktrace_mask=0xffffffff" to enable the highest level of debugging
    > output.  I suspect there will be additional output after the "parent
    > linkage" message.
    > -Paul
    >
    > Jerry Mersel wrote:
    >> Hi:
    >>
    >>    I also see the same errors as zhangkan.
    >>
    >>    It also stops at the "parent linkage" message.
    >>
    >>    I only manage to start mpirun, not its children,
    >>    and I need to reboot the machine to get rid of mpirun.
    >>    I can't kill it; it goes into a permanent sleep state.
    >>
    >>
    >>                             Regards,
    >>                                Jerry
    >>
    >>
    >
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    >
    
