From: Abhinav Jha (a.jha_at_iitg.ernet.in)
Date: Sat Sep 01 2007 - 06:05:09 PDT
Dear Sir,

Thank you for your kind reply. For the last few days, we have been going
through the BLCR code and are trying to figure out how a process is
checkpointed by BLCR. Is there a platform/forum where we could discuss
BLCR? We will try our best to club together our doubts in future so that
we don't cause you too much trouble (we hope).

Thanks once again,

Abhinav Jha & Manish Kumar,
Indian Institute of Technology Guwahati
Guwahati -39, INDIA
http://www.iitg.ernet.in

> Abhinav Jha wrote:
>> Dear Sir,
>>
>> We're final year students from Indian Institute of Technology, Guwahati
>> (http://www.iitg.ernet.in), working on our B.Tech. project,
>> "Implementation of checkpoint and restart mechanism on the linux kernel
>> 2.6".
>
> Thank you for your interest in BLCR. You will find my answers to your
> questions below.
>
>> We wanted to make use of the already existing facilities of BLCR in this
>> regard. However, we're not aware of a few things:
>>
>> 1. Whether we can change your code without violating your copyright.
>
> BLCR is distributed under 2 Open Source Software licenses, the GPL and
> LGPL. You should examine the license.txt files in each directory for
> information on which license applies to the files in that directory.
>
> The GPL allows you to modify the covered portions of BLCR provided that
> you distribute your modified version under the same GPL license.
>
> The LGPL allows slightly more freedom in how you may use the covered
> portions of BLCR.
>
> In either case, you should not have any problems if this is only for a
> class project. If you plan to distribute the resulting enhancements to
> the general public, you should expect to simply apply the same licenses
> to the modified versions. You don't need to obtain any permissions from
> us to do so. However, if you do develop enhancements of general
> interest, we should talk about incorporating your changes back into the
> base BLCR code.
>
>> 2.
>> What is the feasibility of implementing socket checkpointing in BLCR.
>
> Good question. We have not tried to pursue this task ourselves, and
> therefore have not tried hard to determine the exact level of
> difficulty. Assuming you are interested only in Unix-domain (aka
> AF_LOCAL) sockets, I imagine the problems are small since the buffered
> data is all local to one node. In the case of TCP, you can probably get
> away with preserving only the data that is buffered locally (both
> incoming and outgoing) and counting on retransmission to recover any
> data "on the wire" at checkpoint time. The difficulty, however, is
> likely to come from getting the TCP state engine back to the right
> state. For UDP, you can probably do the same as TCP.
>
> If you also want to attempt migration of TCP or UDP sockets, then you
> will need some way to "adjust" the peer as well.
>
>> 3. Can we do an implementation of file checkpointing that is
>> independent of the one you have planned?
>
> We have code in the soon-to-be-released 0.6.0 version of BLCR that takes
> care of checkpointing of open-but-deleted files. That code can easily be
> leveraged to checkpoint all open files, whether or not they are deleted.
> The interesting part comes at restart time when you need to determine
> whether to use the checkpointed copy of a file or the copy that now
> exists on disk. Depending on how a given application uses files (and
> how users of the application expect to use the files after the
> application runs) there is no single correct policy. The implementation
> work to be done here is certainly simpler than socket checkpointing.
>
>> 4. What would be a good way to go about reading/modifying the code,
>> since there is no manual available?
>
> I am afraid we don't have a good answer for this one.
> We try to put comments in the kernel code that are sufficient for our
> own use when we look at code that another member of our group has
> written, or our code long after it was written. However, it will take a
> good bit of time to learn the code just by reading it. Alas, there is no
> documentation other than the code itself.
>
>> We'll be very grateful to hear from you.
>
> Feel free to ask more questions if you need to.
>
>> Thank you,
>>
>> Abhinav Jha & Manish Kumar,
>> Indian Institute of Technology Guwahati
>> Guwahati -39, INDIA
>> http://www.iitg.ernet.in
>
> -Paul
>
> --
> Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> Future Technologies Group
> HPC Research Department                   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

--
Abhinav Jha
Indian Institute of Technology Guwahati