Re: About the planned features of BLCR (post 0.4.0)

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 23 2005 - 09:13:37 PST

    I am glad to hear of your group's interest in using BLCR in your 
    work.  See my answers below.
    Teemu Koponen wrote:
    > BLCR developers,
    > We are developing process migration support for Host Identity Protocol 
    > [1] to our Linux 2.6 kernel implementation of it [2]. The process 
    > migration itself is not our focus, but the communication aspects related 
    > to migration are. Therefore, I'm checking the status of different 
    > process migration implementations for vanilla Linux kernel.
    > We intend to integrate our HIP based TCP/UDP communication migration 
    > support to the chosen migration implementation. Moreover, the migration 
    > implementation should support migration of network server applications 
    > consisting of multiple processes (communicating via shared memory/IPC). 
    > Therefore, I wonder what is the current status of the following planned 
    > items on your web page:
    > - # Special device files such as /dev/null, /dev/zero and /dev/random
    No work has been done on this yet, because it has not yet been a serious 
    issue for any test applications.  However, this is just a matter of 
    writing the necessary lines of code and not of any difficult design 
    work.  This could be done soon if it is the only thing keeping BLCR from 
    meeting your needs.
    > - # Coherent checkpoints of process groups and sessions
    >     * Process group support allows checkpointing of command pipelines 
    > (e.g. grep foo bar | sort)
    >     * Sessions support eases integration with most batch systems and 
    > allows checkpointing of login shells
    I have a partial design for the process group and session support. 
    Completion of the design and the implementation are #3 on my priority 
    list right now. (see below)
    > - # Support for Linux 2.6
    At present, version 0.4.0 on our website is well tested on only a single 
    2.6-based distribution.  We have reports from users of other 
    distributions that describe some of the compilation problems they have 
    encountered.  Supporting all common 2.6-based distributions is #2 on my 
    priority list right now.
    The #1 item on my list right now is getting a beta of x86_64 support 
    completed.  I can presently checkpoint and restart only single-threaded 
    processes.  Restart of pthreaded processes works less than half the 
    time; the rest of the time it faults upon return from kernel space to 
    user space, which prevents any use of a debugger.  I had hoped to have 
    the x86_64 support ready last month, but it is still not done.
    When you asked for the status, I assume you wanted dates.  I am afraid 
    that I am unable to make accurate estimates, but I can tell you when I 
    *hope* to see each of these items done.
    #1: x86_64 support should either be releasable by the end of this month 
    or I will be forced to put it aside to work on the other items.
    #2: wider 2.6 support is something that I've done a little bit of work 
    on as bug reports have come in.  However, I expect to give it my full 
    attention once x86_64 support is done or put aside.  If you have 
    interest in a particular distribution, please let me know.  If you could 
    try 0.4.0 on your desired distribution and report any compilation 
    failures, that would be even better.
    #3: process group and session support will come after I have stable 
    2.6 support across a reasonably wide range of distributions.  I am 
    targeting July for a release that would include process group and 
    session support.
    As I said above, the special device files are not a high priority for me 
    at the moment, but I can tackle them if needed.  If you are in a big 
    hurry for this, let me know.
    > [1] -
    > [2] -
    > Thanks,
    > Teemu
    > -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
