From: 任明明 (0110018_at_mail.nankai.edu.cn)
Date: Tue Mar 22 2005 - 01:27:58 PST
I cannot use BLCR to checkpoint an MPI program. Can anyone help me?

I used the following command to configure BLCR:

./configure --prefix=/usr/local/blcr/ --with-linux=/usr/src/linux-2.4.20-8/ --with-system-map=/boot/System.map-2.4.20-8

and the following command to configure LAM/MPI:

./configure --with-threads=posix --with-rpi=crtcp --with-cr-blcr=/usr/local/blcr/ --prefix=/usr/local/lam-7.1.1/ --with-rsh='ssh -x'

But when I use cr_checkpoint on an MPI program, it does not generate a checkpoint context file for each process. It only generates a single context file for the mpirun command, and when I run cr_restart on that context file, it says:

[rmingming@node01 lam]$ cr_restart context.5981
mpirun (rpwait): Bad file descriptor
[rmingming@node01 lam]$

By the way, I followed the instructions at this URL: http://mantis.lbl.gov/blcr/doc/html/BLCR_Users_Guide.html

The following is the laminfo output:

[rmingming@node01 lam]$ laminfo -all
LAM/MPI: 7.1.1
SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
SSI boot: tm (SSI v1.0, API v1.1, Module v1.1)
SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
SSI cr: blcr (SSI v1.0, API v1.0, Module v1.1)
SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
Prefix: /usr/local/lam-7.1.1/
Bindir: /usr/local/lam-7.1.1//bin
Libdir: /usr/local/lam-7.1.1//lib
Incdir: /usr/local/lam-7.1.1//include
Pkglibdir: /usr/local/lam-7.1.1//lib/lam
Sysconfdir: /usr/local/lam-7.1.1//etc
Architecture: i686-pc-linux-gnu
Configured by: root
Configured on: Tue Mar 22 14:21:29 CST 2005
Configure host: node01
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 4
C float size: 4
C double size: 8
C pointer size: 4
C char align: 1
C bool align: 1
C int align: 4
C float align: 4
C double align: 4
C++ compiler: g++
Fortran compiler: g77
Fortran symbols: double_underscore
Fort integer size: 4
Fort real size: 4
Fort dbl prec size: 4
Fort cplx size: 4
Fort dbl cplx size: 4
Fort integer align: 4
Fort real align: 4
Fort dbl prec align: 4
Fort cplx align: 4
Fort dbl cplx align: 4
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI base: parameter "verbose" (default value: <none>)
SSI mpi: parameter "mpi_hostmap" (default value: "/usr/local/lam-7.1.1//etc/lam-hostmap.txt")
SSI base: parameter "base_module_path" (default value: "/usr/local/lam-7.1.1//lib/lam")
SSI boot: parameter "boot_verbose" (default value: <none>)
SSI boot: parameter "boot" (default value: <none>)
SSI boot: parameter "boot_base_promisc" (default value: "0")
SSI boot: parameter "boot_base_window_size" (default value: "5")
SSI boot: parameter "boot_globus_priority" (default value: "3")
SSI boot: parameter "boot_rsh_username" (default value: <none>)
SSI boot: parameter "boot_rsh_agent" (default value: "ssh -x")
SSI boot: parameter "boot_rsh_no_n" (default value: "0")
SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
SSI boot: parameter "boot_rsh_fast" (default value: "0")
SSI boot: parameter "boot_rsh_ignore_stderr" (default value: "0")
SSI boot: parameter "boot_rsh_priority" (default value: "10")
SSI boot: parameter "boot_slurm_priority" (default value: "50")
SSI boot: parameter "boot_tm_priority" (default value: "50")
SSI boot: parameter "boot_tm_first" (default value: "-1")
SSI rpi: parameter "rpi_verbose" (default value: <none>)
SSI rpi: parameter "rpi" (default value: <none>)
SSI rpi: parameter "rpi_crtcp_priority" (default value: "75")
SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
SSI rpi: parameter "rpi_sysv_poolsize" (default value: "16777216")
SSI rpi: parameter "rpi_sysv_maxalloc" (default value: "1048576")
SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
SSI rpi: parameter "rpi_tcp_priority" (default value: "20")
SSI rpi: parameter "rpi_usysv_readlockpoll" (default value: "10000")
SSI rpi: parameter "rpi_usysv_writelockpoll" (default value: "10")
SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
SSI rpi: parameter "rpi_usysv_poolsize" (default value: "16777216")
SSI rpi: parameter "rpi_usysv_maxalloc" (default value: "1048576")
SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
SSI coll: parameter "coll_verbose" (default value: <none>)
SSI coll: parameter "coll_shmem" (default value: "0")
SSI cr: parameter "cr_verbose" (default value: <none>)
SSI cr: parameter "cr" (default value: <none>)
SSI cr: parameter "cr_blcr_priority" (default value: "50")
SSI cr: parameter "cr_self_priority" (default value: "25")
SSI cr: parameter "cr_self_do_restart" (default value: "0")
SSI cr: parameter "cr_self_prefix" (default value: "lam_cr_self")
SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
SSI cr: parameter "cr_self_continue" (default value: <none>)
SSI cr: parameter "cr_self_restart" (default value: <none>)