BLCR on IA64

From: 强 马 (vera_wx_cn_at_yahoo_dot_com.cn)
Date: Tue Nov 25 2008 - 23:50:46 PST

  • Next message: Jerry Mersel: "Re: LAM: Checkpoint is correct, BUT cannot restart with LAM+BLCR"
    Hello,

    BLCR is wonderful!
    We have developed a checkpoint/restart system for MVAPICH programs based on BLCR.
    It runs on an x86 cluster and is now being ported to IA64. BLCR did not work on IA64, so I modified it.
    Now I have a problem on IA64. Although my MVAPICH processes restart from the checkpoint files successfully, a segmentation fault always occurs some time after the restart. I examined the core file with gdb: all the registers are zero, so no stack information can be recovered. I suspect a memory fault.
    If I do not cancel the program after the checkpoints finish and instead let it continue, it runs fine until it terminates normally. But if I cancel the program once the checkpoints have finished and then restart it from the checkpoint files, I get the segmentation fault described above.
    How can I resolve this problem? Can you help me or give me any tips? Thank you in advance.
    
    
