From: Rajagopal Natarajan (rajagopal.n_at_gmail_dot_com)
Date: Tue Feb 27 2007 - 05:15:59 PST
Hi,

I'm working on a 10-node P3 cluster and use BLCR on it. I would like to know whether BLCR has any existing support for asynchronous checkpointing. If the answer is yes, please point me to the appropriate docs.

If the answer is no, I would like to implement asynchronous checkpointing in LAM-MPI. Please tell me whether I can make use of BLCR and modify its code to do that, and how much of the code might need to be modified.

Would it be feasible to implement this in 1-1.5 months with two developers working on it part time? The two developers are myself and my classmate; we are both working on our bachelor's thesis on checkpointing in LAM-MPI based clusters and avoidance of rollback propagation. As we have other course work, we might be able to devote up to 4-5 hrs to this project.

If the above project is not feasible in 1-1.5 months with two developers, please suggest something else that we could contribute to BLCR that is related to avoidance of rollback propagation.

Thanks.

--
N. Rajagopal
Visit me at http://users.kaski-net.net/~raj/