Asynchronous checkpointing support in BLCR

From: Rajagopal Natarajan (rajagopal.n_at_gmail_dot_com)
Date: Tue Feb 27 2007 - 05:15:59 PST

  • Next message: Josh Hursey: "Re: Asynchronous checkpointing support in BLCR"
    Hi,
    
    I'm working on a 10-node P3 cluster and use BLCR on it. I would like to
    know if BLCR has any existing support for asynchronous checkpointing.
    
    If the answer is yes, please point me to the appropriate docs.
    
    If the answer is no, I would like to implement asynchronous checkpointing in
    LAM-MPI. Please tell me if I can make use of BLCR and modify the code to do
    that, and how much code might need to be modified. Would it be feasible
    to implement it in 1-1.5 months, with two developers working part time on
    it? (My classmate and I are both working on our bachelor's thesis on
    checkpointing in LAM-MPI based clusters and avoidance of rollback
    propagation. As we have other course work, we might be able to devote up to
    4-5 hrs to this project.)
    
    If the above project is not feasible in the specified time of 1-1.5 months
    with 2 developers working on it, please suggest something that we can
    contribute to BLCR that is related to avoidance of rollback propagation.
    
    Thanks.
    
    -- 
    N. Rajagopal,
    Visit me at http://users.kaski-net.net/~raj/
    
