Re: Extending BLCR

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Aug 29 2007 - 15:39:32 PDT

  • Next message: Nick Couchman: "Re: Extending BLCR"
    Abhinav Jha wrote:
    > Dear Sir,
    > We're final-year students from the Indian Institute of Technology, Guwahati,
    > working on our B.Tech. project,
    > "Implementation of checkpoint and restart mechanism on the linux kernel
    > 2.6".
    Thank you for your interest in BLCR.  You will find my answers to your 
    questions below.
    > We wanted to make use of the already existing facilities of BLCR in this
    > regard. However, we're not aware of a few things:
    > 1. Whether we can change your code without violating your copyright.
    BLCR is distributed under two Open Source licenses, the GPL and the 
    LGPL.  Check the license.txt file in each directory to see which 
    license applies to the files in that directory.
    The GPL allows you to modify the covered portions of BLCR provided that 
    you distribute your modified version under the same GPL license.
    The LGPL allows slightly more freedom in how you may use the covered 
    portions of BLCR.
    In either case, you should not have any problems if this is only for a 
    class project.  If you plan to distribute the resulting enhancements to 
    the general public, you should expect to simply apply the same licenses 
    to the modified versions.  You don't need to obtain any permissions from 
    us to do so.  However, if you do develop enhancements of general 
    interest, we should talk about incorporating your changes back into the 
    base BLCR code.
    > 2. What is the feasibility of implementing socket checkpointing in BLCR?
    Good question.  We have not tried to pursue this task ourselves, and 
    therefore have not tried hard to determine the exact level of 
    difficulty.  Assuming you are interested only in Unix-domain (aka 
    AF_LOCAL) sockets, I imagine the problems are small since the buffered 
    data is all local to one node.  In the case of TCP, you can probably get 
    away with preserving only the data that is buffered locally (both 
    incoming and outgoing) and counting on retransmission to recover any 
    data "on the wire" at checkpoint time.  The difficulty, however, is 
    likely to come from getting the TCP state engine back to the right 
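    state.  For UDP, you can probably do the same as TCP, except that there 
    is no retransmission to rely on; since UDP makes no delivery guarantee, 
    datagrams in flight at checkpoint time may simply be lost, which the 
    application must already tolerate.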
    If you also want to attempt migration of TCP or UDP sockets, then you 
    will need some way to "adjust" the peer as well.
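To make the TCP discussion concrete: on Linux, a user-space program can already observe the pieces of per-socket state mentioned above, namely how much data is buffered locally in each direction and which state the TCP state engine is in. The sketch below assumes a Linux kernel supporting the `SIOCINQ`/`SIOCOUTQ` ioctls and the `TCP_INFO` socket option (both documented in tcp(7)); `struct sock_snapshot` and `snapshot_tcp_socket()` are hypothetical names, and this is not how BLCR itself works. It only observes the state; putting a restored socket back into that state needs kernel support, which is the hard part described above.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>   /* struct tcp_info, TCP_INFO, TCP_ESTABLISHED */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h> /* SIOCINQ, SIOCOUTQ */
#include <unistd.h>
#include <time.h>

/* Hypothetical record of what a TCP checkpointer would need to save. */
struct sock_snapshot {
    int inq;             /* bytes received but not yet read by the application */
    int outq;            /* bytes written but not yet acknowledged by the peer */
    unsigned char state; /* TCP state machine state, e.g. TCP_ESTABLISHED */
};

/* Fill *s for the connected TCP socket fd; returns 0 on success, -1 on error. */
int snapshot_tcp_socket(int fd, struct sock_snapshot *s)
{
    struct tcp_info info;
    socklen_t len = sizeof info;

    /* Locally buffered incoming data, still in the receive queue. */
    if (ioctl(fd, SIOCINQ, &s->inq) < 0)
        return -1;
    /* Locally buffered outgoing data, still in the send queue. */
    if (ioctl(fd, SIOCOUTQ, &s->outq) < 0)
        return -1;
    /* Current TCP state, which a restart would have to reproduce. */
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
        return -1;
    s->state = info.tcpi_state;
    return 0;
}
```

The two counts tell a checkpointer how much buffered data it would have to save; anything already acknowledged is the peer's problem, and anything lost "on the wire" is left to TCP retransmission.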
    > 3. Can we do an implementation of file checkpointing that is independent
    > of the one you have planned?
    We have code in the soon-to-be-released 0.6.0 version of BLCR that takes 
    care of checkpointing open-but-deleted files.  That code can easily be 
    leveraged to checkpoint all open files, whether or not they are deleted. 
    The interesting part comes at restart time when you need to determine 
    whether to use the checkpointed copy of a file or the copy that now 
    exists on disk.  Depending on how a given application uses files (and 
    how users of the application expect to use the files after the 
    application runs) there is no single correct policy.  The implementation 
    work to be done here is certainly simpler than socket checkpointing.
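One possible restart-time policy, sketched in C under stated assumptions: at checkpoint time a hypothetical `struct ckpt_file_meta` records each open file's device, inode, size, and modification time; at restart, a hypothetical `choose_restore_source()` reuses the on-disk copy if it looks unchanged and otherwise defers to a caller-chosen policy. The hypothetical `is_open_but_deleted()` shows the usual Linux test for an open-but-deleted file (a link count of 0 on the open descriptor). None of these names are BLCR's API, and the size/mtime comparison is a heuristic that can miss a change made within the filesystem's timestamp granularity.

```c
#define _DEFAULT_SOURCE
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical: metadata saved for each open file at checkpoint time. */
struct ckpt_file_meta {
    off_t size;
    struct timespec mtime;
    ino_t ino;
    dev_t dev;
};

/* Hypothetical policy knob: what to do if the on-disk file has changed. */
enum restore_policy { PREFER_DISK, PREFER_CHECKPOINT, FAIL_IF_CHANGED };
enum restore_choice { USE_DISK, USE_CHECKPOINT, REFUSE };

/* An open file whose last directory entry was removed reports a link count
 * of 0 via fstat, so only the checkpoint can preserve its contents.
 * Returns 1 if deleted, 0 if not, -1 on error. */
int is_open_but_deleted(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_nlink == 0;
}

enum restore_choice choose_restore_source(const char *path,
                                          const struct ckpt_file_meta *m,
                                          enum restore_policy policy)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return USE_CHECKPOINT; /* missing (e.g. deleted) or inaccessible: use the saved copy */

    int unchanged = st.st_dev == m->dev && st.st_ino == m->ino &&
                    st.st_size == m->size &&
                    st.st_mtim.tv_sec == m->mtime.tv_sec &&
                    st.st_mtim.tv_nsec == m->mtime.tv_nsec;
    if (unchanged)
        return USE_DISK; /* both copies should match; reuse the disk copy */

    /* The file changed after the checkpoint: no single answer is correct. */
    switch (policy) {
    case PREFER_DISK:       return USE_DISK;
    case PREFER_CHECKPOINT: return USE_CHECKPOINT;
    default:                return REFUSE; /* FAIL_IF_CHANGED */
    }
}
```

Making the policy a parameter rather than a fixed rule reflects the point above: whether a changed file should win depends on how the application and its users treat that file.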
    > 4. What would be a good way to go about reading/modifying the code, since
    > there is no manual available?
    I am afraid we don't have a good answer for this one.  We try to put 
    comments in the kernel code that are sufficient for our own use when we 
    look at code that another member of our group has written, or our code 
    long after it was written.  However, it will take a good bit of time to 
    learn the code just by reading it.  Alas, there is no documentation 
    other than the code itself.
    > We'll be very grateful to hear from you.
    Feel free to ask more questions if you need to.
    > Thank you,
    > Abhinav Jha & Manish Kumar,
    > Indian Institute of Technology Guwahati
    > Guwahati -39, INDIA
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
