Re: Extending BLCR

From: Abhinav Jha (a.jha_at_iitg.ernet.in)
Date: Sat Sep 01 2007 - 06:05:09 PDT

  • Next message: Paul H. Hargrove: "Re: Extending BLCR"
    Dear Sir,
    
    Thank you for your kind reply. For the last few days, we have been going
    through the BLCR code and are trying to figure out how a process is
    checkpointed by BLCR. Is there a platform or forum where we could discuss
    BLCR? In future we will try to collect our questions into a single
    message so that we don't cause you too much trouble (we hope).
    
    Thanks once again,
    
    Abhinav Jha & Manish Kumar,
    Indian Institute of Technology Guwahati
    Guwahati -39, INDIA
    http://www.iitg.ernet.in
    
    
    
    > Abhinav Jha wrote:
    >> Dear Sir,
    >>
    >> We're final-year students from the Indian Institute of Technology,
    >> Guwahati (http://www.iitg.ernet.in), working on our B.Tech. project,
    >> "Implementation of checkpoint and restart mechanism on the Linux kernel
    >> 2.6".
    >
    > Thank you for your interest in BLCR.  You will find my answers to your
    > questions below.
    >
    >>
    >> We wanted to make use of the already existing facilities of BLCR in this
    >> regard. However, we're not aware of a few things:
    >>
    >> 1. Whether we can change your code without violating your copyright.
    >
    > BLCR is distributed under 2 Open Source Software licenses, the GPL and
    > LGPL.  You should examine the license.txt files in each directory for
    > information on which license applies to the files in that directory.
    >
    > The GPL allows you to modify the covered portions of BLCR provided that
    > you distribute your modified version under the same GPL license.
    >
    > The LGPL allows slightly more freedom in how you may use the covered
    > portions of BLCR.
    >
    > In either case, you should not have any problems if this is only for a
    > class project.  If you plan to distribute the resulting enhancements to
    > the general public, you should expect to simply apply the same licenses
    > to the modified versions.  You don't need to obtain any permissions from
    > us to do so.  However, if you do develop enhancements of general
    > interest, we should talk about incorporating your changes back into the
    > base BLCR code.
    >
    >> 2. What is the feasibility of implementing socket checkpointing in BLCR?
    >
    > Good question.  We have not tried to pursue this task ourselves, and
    > therefore have not tried hard to determine the exact level of
    > difficulty.  Assuming you are interested only in Unix-domain (aka
    > AF_LOCAL) sockets, I imagine the problems are small since the buffered
    > data is all local to one node.  In the case of TCP, you can probably get
    > away with preserving only the data that is buffered locally (both
    > incoming and outgoing) and counting on retransmission to recover any
    > data "on the wire" at checkpoint time.  The difficulty, however, is
    > likely to come from getting the TCP state engine back to the right
    > state.  For UDP, you can probably do the same as TCP.
    >
    > If you also want to attempt migration of TCP or UDP sockets, then you
    > will need some way to "adjust" the peer as well.
    >
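
The point above about locally buffered data can be illustrated from user space. The following is a minimal sketch, not BLCR code (BLCR does its work inside the kernel). It assumes a connected stream socket pair on a Unix-like system, and the helper names `snapshot_pending` and `restore_pair` are hypothetical. It copies the bytes queued on the receiving end without consuming them, then replays them into a fresh pair, roughly what a restart would have to do.

```python
import socket


def snapshot_pending(sock, max_bytes=65536):
    """Copy up to max_bytes already queued for reading on sock, without consuming them."""
    try:
        # MSG_PEEK leaves the data in the kernel queue; MSG_DONTWAIT avoids
        # blocking when nothing is queued.
        return sock.recv(max_bytes, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return b""


def restore_pair(saved):
    """Create a fresh socket pair and replay the saved bytes toward the reader."""
    writer, reader = socket.socketpair()
    if saved:
        # Assumption: `saved` fits in the socket buffer; a larger payload
        # would block here because nobody is reading yet.
        writer.sendall(saved)
    return writer, reader
```

This only covers the receive queue of a local socket. The TCP connection state that the reply identifies as the likely difficulty is outside what this sketch touches.
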
    >> 3. Can we do an implementation of file checkpointing that is
    >> independent of the one you have planned?
    >
    > We have code in the soon-to-be-released 0.6.0 version of BLCR that takes
    > care of checkpointing of open-but-deleted files.  That code can easily be
    > leveraged to checkpoint all open files, whether or not they are deleted.
    > The interesting part comes at restart time, when you need to determine
    > whether to use the checkpointed copy of a file or the copy that now
    > exists on disk.  Because applications use files differently (and users
    > expect different things of those files after the application runs),
    > there is no single correct policy.  The implementation
    > work to be done here is certainly simpler than socket checkpointing.
    >
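
The restart-time choice described above (checkpointed copy versus the copy now on disk) could be expressed as an explicit, caller-selected policy. This is a minimal user-space sketch under stated assumptions, not BLCR's implementation: the `RestorePolicy` names and the `restore_file` helper are hypothetical, and files are compared by content hash purely for illustration.

```python
import hashlib
import os
import shutil
from enum import Enum


class RestorePolicy(Enum):
    PREFER_CHECKPOINT = "checkpoint"  # overwrite the on-disk file with the saved copy
    PREFER_DISK = "disk"              # keep whatever is on disk now
    FAIL_IF_CHANGED = "fail"          # refuse to restart if the two differ


def _digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def restore_file(saved_copy, live_path, policy):
    """Prepare live_path for the restarted process and report what was done."""
    if not os.path.exists(live_path):
        # The file was deleted since the checkpoint: only the saved copy exists.
        shutil.copyfile(saved_copy, live_path)
        return "restored-missing"
    if _digest(saved_copy) == _digest(live_path):
        return "unchanged"
    if policy is RestorePolicy.PREFER_DISK:
        return "kept-disk"
    if policy is RestorePolicy.PREFER_CHECKPOINT:
        shutil.copyfile(saved_copy, live_path)
        return "restored-checkpoint"
    raise RuntimeError(f"{live_path} changed since the checkpoint")
```

Leaving the policy as a parameter reflects the observation that no single choice is correct for every application.
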
    >> 4. What would be a good way to go about reading/modifying the code,
    >> since there is no manual available?
    >
    > I am afraid we don't have a good answer for this one.  We try to put
    > comments in the kernel code that are sufficient for our own use when we
    > look at code that another member of our group has written, or our code
    > long after it was written.  However, it will take a good bit of time to
    > learn the code just by reading it.  Alas, there is no documentation
    > other than the code itself.
    >
    >>
    >> We'll be very grateful to hear from you.
    >
    > Feel free to ask more questions if you need to.
    >
    >
    >> Thank you,
    >>
    >> Abhinav Jha & Manish Kumar,
    >> Indian Institute of Technology Guwahati
    >> Guwahati -39, INDIA
    >> http://www.iitg.ernet.in
    >
    >
    > -Paul
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group
    > HPC Research Department                   Tel: +1-510-495-2352
    > Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    >
    
    
    -- 
    Abhinav Jha
    Indian Institute of Technology
    Guwahati
    
