BLCR Administrator's Guide
This guide describes how to install, configure, and maintain Berkeley
Checkpoint/Restart for Linux.
System Requirements
BLCR consists of two kernel modules, some user-level libraries, and several
command-line executables. No kernel patching is required.
BLCR has been engineered to work with a wide range of Linux kernels:
- Most major vendor distributions of Linux. Those tested include Red Hat 7.1
through 9, SuSE 9, and CentOS 3.1.
- Vanilla Linux kernels (from kernel.org) 2.4.0 through 2.4.27, with glibc
versions 2.1 through 2.3, have also been tested.
- BLCR uses a set of autoconf-based feature tests to probe the kernels
it builds against. It is thus likely that a custom kernel based on one of the
above kernel sources will work with BLCR, provided that its modifications do
not conflict with the kernel internals that BLCR relies on.
- Experimental Linux 2.6 support is now available (new in 0.4).
Testing of the 2.6 support has so far been limited to a SuSE 9.2 installation
with kernel package kernel-default-2.6.8-24.11 on a uniprocessor machine.
Reports of your success or failure with other distributions and/or kernels are
appreciated.
BLCR uses assembly code to save some program state (most notably the CPU
registers). This means that the BLCR kernel modules are not portable across CPU
architectures "out of the box". Currently, only x86-based systems work with
BLCR. Support for other architectures is planned (most notably, an
Opteron port is currently in progress). Porting BLCR to a different CPU is not a large software effort
for those with kernel experience and knowledge of the target CPU's instructions.
Please contact us if you are interested in contributing a port.
Installing/Configuring BLCR
To build checkpoint/restart, you need the following files:
- The source code for the kernel you are building against.
- linux/version.h (a generated file from the kernel sources)
- Either the System.map or the vmlinux file for the kernel you are building
against.
- A copy of the BLCR source (blcr-X.Y.Z.tar.gz: see http://ftg.lbl.gov/checkpoint for a link to the latest
version).
Note for Red Hat users
Under Red Hat, this means you must install the kernel and (depending on
the release) either the kernel-source or kernel-devel RPMs
appropriate for the kernel you will use BLCR with. For example, to build
checkpoint/restart for a Red Hat 2.4.20-20.9 kernel, you need the
kernel-source-2.4.20-20.9 and kernel-2.4.20-20.9 RPMs.
Once these RPMs are installed, the BLCR configure script will generally
find the files it needs automatically, so you won't need to pass any additional
arguments to configure.
However, certain recent versions of Red Hat-derived distributions (most
notably some releases from the Fedora project) ship kernels with a
stripped-down System.map file. If
this is the case, BLCR will abort during configuration with an error stating
that the System.map cannot be used. You must install an additional RPM which
contains a full System.map in order to build BLCR. In Fedora Core Release 2, for
instance, the 'kernel-debuginfo' RPM contains a full System.map file, which it
will install into the /usr/lib/debug/boot directory. BLCR's configure script
searches this directory automatically (you can also pass '--with-system-map' to
point configure at the correct System.map file).
Important Note: If you need to install the kernel-debuginfo RPM, make sure
that its version and its architecture ('arch') both match those of your
kernel. If your kernel was built for 'i386' (or 'i586', or 'i686'), the
kernel-debuginfo RPM must be built for the same architecture. Thus, for a
2.6.5-1.358 i586 kernel, install
'kernel-debuginfo-2.6.5-1.358.i586.rpm'. To determine which
kernel version you have, use
rpm -q kernel --qf '%{version}-%{release}.%{arch}\n'
To make sure that you have installed compatible kernel and
kernel-debuginfo RPMs, use
rpm -q kernel kernel-debuginfo --qf '%{version}-%{release}.%{arch}\n'
(replace 'kernel' with 'kernel-smp' if you are using an SMP
kernel). You should see the same string printed twice.
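The comparison can also be scripted, for example as part of a site health check. The sketch below is a minimal POSIX shell function (the name all_lines_equal is hypothetical, not part of BLCR or rpm) that succeeds only if it reads at least two lines and all of them are identical. It assumes exactly one kernel package and one kernel-debuginfo package are installed; with several kernels installed, rpm prints one line per package and this simple check will fail.

```shell
# Hypothetical helper: succeed only if stdin has at least two lines and all
# of them are identical (e.g. matching kernel and kernel-debuginfo strings).
all_lines_equal() {
  first=""
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    if [ "$n" -eq 1 ]; then
      first=$line
    elif [ "$line" != "$first" ]; then
      return 1                    # a mismatch, or a "not installed" message
    fi
  done
  [ "$n" -ge 2 ]                  # fewer than two lines cannot be a match
}

# Usage (same query as above):
#   rpm -q kernel kernel-debuginfo --qf '%{version}-%{release}.%{arch}\n' \
#     | all_lines_equal && echo "kernel and kernel-debuginfo match"
```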
If you try to use BLCR with the wrong System.map, BLCR will build without
complaint, but it will probably detect the problem when the blcr.o kernel module
is loaded, and the module load will be aborted. (It detects the mismatch by
comparing the addresses of some well-known exported kernel symbols with those
listed in the System.map file.)
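A similar consistency check can be made by hand on a 2.6 kernel, whose /proc/kallsyms uses the same "ADDRESS TYPE NAME" layout as System.map. The sketch below is only a rough analogue of the idea, not BLCR's actual check; the function names, the example symbol sys_open, and the /boot/System.map-<release> path are assumptions. (2.4 kernels provide /proc/ksyms instead, which uses a different format and is not handled here.)

```shell
# Print the address recorded for symbol $2 in a System.map-style file $1,
# whose lines look like "ADDRESS TYPE NAME" (kallsyms may append "[module]").
sym_addr() {
  awk -v s="$2" '$3 == s { print $1; exit }' "$1"
}

# Succeed if symbol $3 is present in map file $1 and has the same address
# in map file $2.
same_sym_addr() {
  a=$(sym_addr "$1" "$3")
  b=$(sym_addr "$2" "$3")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (as root; the System.map path is an assumption):
#   same_sym_addr /proc/kallsyms "/boot/System.map-$(uname -r)" sys_open \
#     && echo "System.map looks consistent with the running kernel"
```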
Configuring BLCR
BLCR builds and installs much like any other autotools-based
distribution:
% tar zxvf blcr-X.Y.Z.tar.gz
% cd blcr-X.Y.Z
% ./configure [ options ]
% make
% make install
Depending on which kernel you are building against, and where you wish to put
the BLCR libraries, there are a number of options to configure that you
need to consider.
Choosing an installation directory
By default BLCR will install into /usr/local. To choose a different
directory tree to install into, pass the '--prefix' flag to
configure:
- --prefix=[the directory you wish to install into]
Building against a kernel other than the one that's running
By default, BLCR builds against the kernel that is running on the system
at configure time, and looks in a number of standard locations
(/usr/src/linux, etc.) for the above files that correspond to it.
If you're building checkpoint/restart for a kernel other than the
kernel that is running at the time of the build (or if the sources for the
running kernel are in non-standard locations), you'll need to pass
configure the following options:
- --with-linux=[path to the sources for the kernel you
are building for]
Unless System.map or vmlinux exists in the directory given to
--with-linux, you'll also need to pass one of the following
two options:
- --with-system-map=[path to the System.map file]
(usually /boot/System.map, or the System.map file in the
root of the kernel build tree)
- --with-vmlinux=[path to the kernel executable]
(usually /boot/vmlinux, or the vmlinux file at the root of
the kernel build tree)
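For example, a build for an installed 2.4.20-24.9 kernel that is not the one currently running might be configured as follows. The installation prefix and both paths are placeholders only; substitute the locations on your system.

```shell
# Hypothetical example: configure for a 2.4.20-24.9 kernel that is not
# currently running. The prefix and both paths below are placeholders.
./configure --prefix=/opt/blcr \
    --with-linux=/usr/src/linux-2.4.20-24.9 \
    --with-system-map=/boot/System.map-2.4.20-24.9
```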
Compiling BLCR
Just type 'make':
% make
Installing BLCR
Use the standard 'install' make target to install the BLCR utilities
and libraries, and to place the kernel modules in the standard location for your
kernel:
% make install
Loading the Kernel Modules
Before you can checkpoint/restart applications, the kernel modules need to be
loaded into your kernel.
The kernel modules are placed into a subdirectory of the lib/blcr
branch of the installation directory. In this example, we'll assume the installation
prefix was the default /usr/local and that your kernel is version 2.4.20-24.9.
Thus, for this example the kernel modules are in the directory
/usr/local/lib/blcr/2.4.20-24.9/. There are two kernel modules in
this directory. They must both be loaded (in the correct order) for BLCR to function.
As root, load the kernel modules in this order:
# /sbin/insmod /usr/local/lib/blcr/2.4.20-24.9/vmadump_blcr.o
# /sbin/insmod /usr/local/lib/blcr/2.4.20-24.9/blcr.o
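If you load the modules by hand often, the ordering rule can be captured in a small script. This is a minimal sketch under stated assumptions: the function name load_blcr is hypothetical, the module directory is assumed to be named after the output of `uname -r`, and the 2.4-style .o suffix is assumed (2.6 kernels normally use .ko for module files, so set MODSUFFIX accordingly). Setting INSMOD=echo gives a dry run that only prints the module paths.

```shell
#!/bin/sh
# Sketch: load the two BLCR modules for the running kernel, in order.
# Assumptions: modules live in $PREFIX/lib/blcr/<release>, where <release>
# matches `uname -r`, and module files end in $MODSUFFIX.
PREFIX=${PREFIX:-/usr/local}
INSMOD=${INSMOD:-/sbin/insmod}
MODSUFFIX=${MODSUFFIX:-.o}

load_blcr() {
  moddir="$PREFIX/lib/blcr/$(uname -r)"
  for mod in vmadump_blcr blcr; do
    # vmadump_blcr must be loaded first; stop at the first failure so that
    # blcr is never loaded without it.
    $INSMOD "$moddir/$mod$MODSUFFIX" || return 1
  done
}

# Usage (as root, after sourcing this file):
#   load_blcr
```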
You may wish to set up your system to load these modules by default at boot
time. The exact mechanism for doing so differs between Linux distributions and
is best handled by an experienced system administrator. As a starting point, a
template init script is provided as etc/blcr.rc in the BLCR source directory.
Configuring Users' environments
Finally, you may wish to add the appropriate BLCR directories to the default
$PATH, $LD_LIBRARY_PATH, and $MANPATH environment
variables for your users.
You may either modify the /etc/profile and/or /etc/cshrc files, or you may
provide modules that accomplish the same thing. Replace PREFIX with the
installation prefix (such as /usr/local) in the following examples:
For Bourne-style shells:
$ PATH=$PATH:PREFIX/bin
$ MANPATH=$MANPATH:PREFIX/man
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:PREFIX/lib
$ export PATH MANPATH LD_LIBRARY_PATH
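For csh-style shells (csh reports an error when an unset variable is
referenced, so MANPATH and LD_LIBRARY_PATH are first defined if necessary):
% setenv PATH ${PATH}:PREFIX/bin
% if ( ! $?MANPATH ) setenv MANPATH ""
% setenv MANPATH ${MANPATH}:PREFIX/man
% if ( ! $?LD_LIBRARY_PATH ) setenv LD_LIBRARY_PATH ""
% setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:PREFIX/lib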
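Appending unconditionally, as in the Bourne-style example, adds a duplicate entry each time the file is sourced, and leaves a leading ':' when the variable was previously unset; in PATH and LD_LIBRARY_PATH an empty entry means the current directory, which is usually undesirable. The hypothetical helper below (append_path is not part of BLCR) is a sketch that avoids both problems in Bourne-style shells. Because some man implementations give an empty MANPATH entry a special meaning (the default search path), it is intended for PATH and LD_LIBRARY_PATH only.

```shell
# Hypothetical /etc/profile helper: append directory $2 to the
# colon-separated variable named $1 unless it is already listed, then export.
append_path() {
  eval "cur=\${$1}"
  case ":$cur:" in
    *":$2:"*) ;;                  # already present: leave the variable alone
    *)
      if [ -n "$cur" ]; then
        eval "$1=\$cur:\$2"
      else
        eval "$1=\$2"             # avoid a leading ':' (empty entry = ".")
      fi
      ;;
  esac
  export "$1"
}

# Usage in /etc/profile (replace /usr/local with your PREFIX):
#   append_path PATH /usr/local/bin
#   append_path LD_LIBRARY_PATH /usr/local/lib
```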
Making RPMs from the BLCR sources
An alternate way to install BLCR is to build a binary RPM for your system, which
you can then install. This has certain advantages (such as making upgrading
easier, especially if you maintain BLCR on multiple systems).
Building binary RPMs from the source tarball
The simplest method for building RPMs is to just
% make rpms
after configure. If successful, the new RPM packages will be in the
rpm/RPMS subdirectory of the build tree. The resulting packages will be
for whatever kernel you configured for.
Building a binary RPM from source RPMs
You may also start from a source RPM (with a .src.rpm suffix)
rather than the .tar.gz version of the BLCR distribution. Source RPMs are
available on our website.
These source RPMs are configured to build for the running kernel.
Alternatively, the make rpms step above will create a source RPM
in the rpm/SRPMS subdirectory of the build tree, valid for the
configured kernel.
If you build as root, the binary RPMs will be placed in a subdirectory of
/usr/src/redhat/RPMS. If you are not root, you may first need to configure a
different output location (typically by setting the %_topdir macro in your
~/.rpmmacros file); see this page at IBM for more information.
To build binary RPMs from the source RPM, use
% rpmbuild --rebuild blcr-X.Y.Z-N.src.rpm --target ARCH
replacing blcr-X.Y.Z-N.src.rpm with the correct filename,
and ARCH with a specific target CPU. If you don't know
your target, try "uname -p" to determine it.
If you don't specify a --target, the default will depend on the
version of rpmbuild and may be i386 (which will be rejected).
See the documentation for rpmbuild for more information on building
binary RPMs from source RPMs.
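Because `uname -p` prints "unknown" on some Linux systems, a small helper can fall back to `uname -m`. This sketch (the name pick_target is hypothetical) also refuses i386, since, as noted above, an i386 target will be rejected.

```shell
# Hypothetical helper: choose an rpmbuild --target value from the outputs of
# `uname -p` ($1) and `uname -m` ($2).
pick_target() {
  p=$1 m=$2
  case $p in
    ""|unknown) t=$m ;;           # -p is not informative here; use -m
    *) t=$p ;;
  esac
  [ "$t" = i386 ] && return 1     # an i386 target would be rejected
  printf '%s\n' "$t"
}

# Usage:
#   TARGET=$(pick_target "$(uname -p)" "$(uname -m)") &&
#     rpmbuild --rebuild blcr-X.Y.Z-N.src.rpm --target "$TARGET"
```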
The RPMs should build without error, although you may see a warning if you are
not building for the running kernel. The last few lines of rpmbuild's output
show where the binary RPMs were written, for example:
Wrote: /usr/src/redhat/SRPMS/blcr-0.4.0-1.src.rpm
Wrote: /usr/src/redhat/RPMS/i686/blcr-0.4.0-1.i686.rpm
Wrote: /usr/src/redhat/RPMS/i686/blcr-libs-0.4.0-1.i686.rpm
Wrote: /usr/src/redhat/RPMS/i686/blcr-devel-0.4.0-1.i686.rpm
Wrote: /usr/src/redhat/RPMS/i686/blcr-modules_2.4.20_24.9-0.4.0-1.i686.rpm
Note that the kernel version 2.4.20-24.9 has become 2.4.20_24.9 in the name of
the blcr-modules package (the hyphen is changed to an underscore).
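When checking which kernels already have BLCR modules installed, it can help to derive the package name from a kernel release. The hypothetical helper below applies the naming rule inferred from the example above (every hyphen becomes an underscore); it assumes that the release reported by `uname -r` is the same string that was used when BLCR was configured.

```shell
# Hypothetical helper: print the expected blcr-modules package name for
# kernel release $1, replacing each hyphen with an underscore.
modules_pkg() {
  printf 'blcr-modules_%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

# Usage: check whether modules for the running kernel are installed.
#   rpm -q "$(modules_pkg "$(uname -r)")"
```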
For more information
For more information on Checkpoint/Restart for Linux, visit the project home
page: http://ftg.lbl.gov/checkpoint
For more information on LAM/MPI, see the LAM/MPI
Documentation.