Re: another question?

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Apr 22 2009 - 21:48:03 PDT

    Wei Zhongwei,
    
    What you describe is a known problem: the JRE removes its hsperfdata 
    files when the job terminates, even when it is killed by a fatal 
    signal, so the file that was mapped at checkpoint time no longer exists 
    at restart.  You can read more about the problem and at least one 
    solution in the BLCR FAQ 
    (http://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#hsperfdata).  
    Alternatively, you could try adding --save-shared to the cr_checkpoint 
    command.
    
    Note that switching to a different JRE would also resolve the problem.
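
    As a rough sketch of the --save-shared suggestion, the shell snippet 
    below checks for the perf-data file that the restart log complains 
    about and shows the command sequence in comments.  The helper names 
    hsperfdata_path and check_hsperfdata and the TMPDIR_BASE variable are 
    made up for illustration; the PID and file layout are taken from the 
    quoted error output.  The -XX:-UsePerfData line is a standard HotSpot 
    option that is not mentioned in this message, so treat it as an 
    untested assumption here.

```shell
#!/bin/sh
# Hypothetical helper: build the path of the HotSpot perf-data file.
# The layout <tmp>/hsperfdata_<user>/<pid> matches the cr_restart error below.
hsperfdata_path() {
    # $1 = user name, $2 = PID of the JVM; TMPDIR_BASE is an illustrative override.
    printf '%s/hsperfdata_%s/%s\n' "${TMPDIR_BASE:-/tmp}" "$1" "$2"
}

# Hypothetical helper: report whether the file cr_restart tried to open exists.
check_hsperfdata() {
    perf_file=$(hsperfdata_path "$1" "$2")
    if [ -e "$perf_file" ]; then
        echo "present: $perf_file"
    else
        echo "missing: $perf_file"
    fi
}

# Command sequence (commented out because it needs BLCR and a JRE installed):
#   cr_run java Hello &
#   cr_checkpoint --save-shared 31746   # save shared mappings in the context file
#   check_hsperfdata root 31746         # see whether the JRE has removed the file
#   cr_restart context.31746
#
# Assumption, not from this message: HotSpot's -XX:-UsePerfData option stops
# the JVM from creating the hsperfdata file in the first place.
#   cr_run java -XX:-UsePerfData Hello &
```

    If check_hsperfdata reports "missing" after the job has exited, a 
    checkpoint taken without --save-shared would be expected to fail at 
    restart in the way shown in the quoted log.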
    
    -Paul
    
    Weizhongwei wrote:
    > Dear Professor:
    > When I checkpoint a Java program I encounter some problems. The steps 
    > are listed below:
    > Program code:
    > public class Hello {
    >         public static void main(String[] args) {
    >                 // TODO Auto-generated method stub
    >                 for(int i=0;i<=220000;i++){
    >                         System.out.println(i);
    >                 }
    >         }
    > }
    >  
    > Step1
    > #cr_run java Hello
    > The program is running correctly...
    > Step2
    > [root@localhost ~]# ps -a
    >   PID TTY          TIME CMD
    > 31746 pts/4    00:00:01 java
    > 31756 pts/5    00:00:00 ps
    > Step3:
    > [root@localhost ~]# cr_checkpoint 31746
    > [root@localhost ~]# ls
    > -a  anaconda-ks.cfg  context.31746  Desktop  install.log  
    > install.log.syslog  test.c
    > Step4: (some errors, restart failed)
    > [root@localhost ~]# cr_restart context.31746
    > - open('/tmp/hsperfdata_root/31746', 0x2) failed: -2
    > - mmap failed: /tmp/hsperfdata_root/31746
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > - thaw_threads returned error, aborting. -2
    > Restart failed: No such file or directory
    > But when I checkpoint a C program, it can be restarted correctly.
    > Can you help me resolve this problem?
    > Thank you very much!
    >  
    > blcr version 8.0
    > linux kernel version 2.2.18
    >
    >  
    >
    >
    
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
