From: Eric Roman (eroman_at_lbl.gov)
Date: Tue Mar 26 2002 - 11:46:56 PST
All of you expressed some interest in checkpoint/restart on Linux.
Here's a quick summary of what's going on.

Checkpoint/Restart web page is up
---------------------------------

The project now has a web page.

http://www.nersc.gov/research/FTG/checkpoint/index.html

Requirements for Linux Checkpoint/Restart
-----------------------------------------

We've placed our requirements document online. We'd like to get
feedback from users, library developers, and kernel developers.
Please have a look!

http://www.nersc.gov/research/FTG/checkpoint/LBNL-49659.pdf

Checkpoint/Restart for MPI
--------------------------

We've started working with Professor Andrew Lumsdaine and the LAM crew
to add a checkpoint/restart capability to LAM. (LAM is a popular
implementation of MPI.) This work will take place during summer 2002.

Checkpoint/Restart mailing list is now available
------------------------------------------------

We've established a mailing list for checkpoint/restart development.
An archive of the list is available on our web page.

To subscribe, send a message to majordomo_at_lbl_dot_gov, with the words

    subscribe checkpoint your-email-address

somewhere in the message body.

Current Work
------------

We are looking at the CRAK implementation of checkpoint/restart, and at
bproc's vmadump (meant for process migration, but it can also do
checkpoint/restart). This work will lead to a technical report
describing the work done to date on checkpoint/restart for Linux.

Our kernel work is making good progress. We're establishing entry
points for checkpoint/restart in the kernel and in user processes,
designing a format for context files, and looking at our testing
environment. In a month or two, we expect to be able to checkpoint
simple processes.

--
Eric Roman <eroman_at_lbl_dot_gov>
Future Technologies Group
510-486-6420
Lawrence Berkeley National Laboratory