From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Aug 29 2007 - 15:39:32 PDT
Abhinav Jha wrote:
> Dear Sir,
>
> We're final year students from Indian Institute of Technology, Guwahati (
> http://www.iitg.ernet.in ), working on our B.Tech. project,
> "Implementation of checkpoint and restart mechanism on the linux kernel
> 2.6".

Thank you for your interest in BLCR. You will find my answers to your questions below.

>
> We wanted to make use of the already existing facilities of BLCR in this
> regard. However, we're not aware of a few things:
>
> 1. Whether we can change your code without violating your copyright.

BLCR is distributed under 2 Open Source Software licenses, the GPL and LGPL. You should examine the license.txt files in each directory for information on which license applies to the files in that directory. The GPL allows you to modify the covered portions of BLCR provided that you distribute your modified version under the same GPL license. The LGPL allows slightly more freedom in how you may use the covered portions of BLCR.

In either case, you should not have any problems if this is only for a class project. If you plan to distribute the resulting enhancements to the general public, you should expect to simply apply the same licenses to the modified versions. You don't need to obtain any permissions from us to do so. However, if you do develop enhancements of general interest, we should talk about incorporating your changes back into the base BLCR code.

> 2. What is the feasibility of implementing socket checkpointing in BLCR.

Good question. We have not tried to pursue this task ourselves, and therefore have not tried hard to determine the exact level of difficulty. Assuming you are interested only in Unix-domain (aka AF_LOCAL) sockets, I imagine the problems are small since the buffered data is all local to one node. In the case of TCP, you can probably get away with preserving only the data that is buffered locally (both incoming and outgoing) and counting on retransmission to recover any data "on the wire" at checkpoint time.
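To make "preserving the data that is buffered locally" concrete, here is a rough user-space sketch for the Unix-domain case. It is not BLCR code, and BLCR would do this work inside the kernel. The helper names snapshot_pending and restore_pair are hypothetical. It assumes a Linux host, a stream socket, and a checkpointer that controls both ends of the connection.

```python
import socket


def snapshot_pending(sock, max_bytes=65536):
    """Copy the unread bytes queued on `sock` without consuming them.

    MSG_PEEK leaves the data in the receive queue; MSG_DONTWAIT makes the
    call return immediately when nothing is buffered.
    """
    try:
        return sock.recv(max_bytes, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return b""  # receive queue is empty


def restore_pair(saved):
    """Create a fresh AF_UNIX pair and re-queue `saved` toward the 2nd end.

    Assumes `saved` fits in the socket buffer (true for the small
    snapshots taken above); otherwise sendall() would block.
    """
    writer, reader = socket.socketpair()
    if saved:
        writer.sendall(saved)
    return writer, reader


if __name__ == "__main__":
    a, b = socket.socketpair()      # both ends are local to this node
    a.sendall(b"in flight")         # data now sits in b's receive queue
    saved = snapshot_pending(b)     # "checkpoint": copy, do not consume
    a.close()
    b.close()
    a2, b2 = restore_pair(saved)    # "restart": rebuild pair, replay data
    print(b2.recv(64))
```

Only the incoming queue of one endpoint is captured here. A real implementation would also have to save unsent outgoing data and the socket options.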
The difficulty, however, is likely to come from getting the TCP state engine back to the right state. For UDP, you can probably do the same as TCP. If you also want to attempt migration of TCP or UDP sockets, then you will need some way to "adjust" the peer as well.

> 3. Can we do an implementation of file checkpointing, that is independent
> of the one you have planned ?

We have code in the soon-to-be-released 0.6.0 version of BLCR that takes care of checkpointing of open-but-deleted files. That code can easily be leveraged to checkpoint all open files, whether or not they are deleted. The interesting part comes at restart time, when you need to determine whether to use the checkpointed copy of a file or the copy that now exists on disk. Depending on how a given application uses files (and how users of the application expect to use the files after the application runs), there is no single correct policy. The implementation work to be done here is certainly simpler than socket checkpointing.

> 4. What would be a good way to go about reading/modifying the code, since
> there is no manual available ?

I am afraid we don't have a good answer for this one. We try to put comments in the kernel code that are sufficient for our own use when we look at code that another member of our group has written, or at our own code long after it was written. However, it will take a good bit of time to learn the code just by reading it. Alas, there is no documentation other than the code itself.

>
> We'll be very grateful to hear from you.

Feel free to ask more questions if you need to.

> Thank you,
>
> Abhinav Jha & Manish Kumar,
> Indian Institute of Technology Guwahati
> Guwahati -39, INDIA
> http://www.iitg.ernet.in

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900