/proc/checkpoint/ctrl limit?

From: Leonardo Fialho (leonardofialho_at_gmail_dot_com)
Date: Fri Dec 11 2009 - 10:50:31 PST

  • Next message: Paul H. Hargrove: "Re: /proc/checkpoint/ctrl limit?"
    I really don't know whether this is a bug or something else, but I'll describe the problem in a few words.
    I wrote a small application that creates two threads, one for checkpointing and another for injecting faults. The main code forks a matrix multiplication program, which is the target of both threads.
    My first approach used the cr_run, cr_checkpoint and cr_restart utilities (forked by the threads). After *some faults* and restarts, the application simply hangs, and ps shows only cr_restart, as a defunct process.
    I then changed my application to use the BLCR API, but the problem persisted. Using lsof, I saw that I had made a mistake in the recovery path: I was calling cr_init before each cr_request_restart. That means that after 500 restarts I had 500 open /proc/checkpoint/ctrl connections, and after some number of connections (1024?) the application hangs again. I changed my code and it now appears to run quite well.
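    To illustrate the pattern described above, here is a minimal sketch in plain C. It does not use BLCR at all: /dev/null stands in for /proc/checkpoint/ctrl, and the helper names (fd_soft_limit, count_open_fds, simulate_leaky_restarts, simulate_reused_handle) are hypothetical, chosen only for this example. It compares opening a new descriptor per restart with opening one descriptor and reusing it, and it reads the per-process descriptor limit, which is often 1024 on Linux and may be what the "1024?" guess corresponds to.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Assumption: /dev/null is a stand-in for /proc/checkpoint/ctrl so the
 * sketch runs on a machine without BLCR installed. */
#define STAND_IN_PATH "/dev/null"

/* Soft per-process limit on open file descriptors, or -1 on error.
 * An unlimited or very large value is capped to keep the probe loop short. */
static long fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 65536)
        return 65536;
    return (long)rl.rlim_cur;
}

/* Count the descriptors currently open in this process by probing each slot
 * (roughly what lsof shows, without the names). */
static int count_open_fds(void)
{
    long limit = fd_soft_limit();
    int count = 0;
    if (limit < 0)
        limit = 1024;
    for (int fd = 0; fd < limit; fd++)
        if (fcntl(fd, F_GETFD) != -1)
            count++;
    return count;
}

/* Leaky pattern: one new descriptor per simulated restart, never closed.
 * Returns how many opens succeeded; open() fails once the limit is hit. */
static int simulate_leaky_restarts(int n)
{
    int opened = 0;
    for (int i = 0; i < n; i++) {
        int fd = open(STAND_IN_PATH, O_RDONLY);
        if (fd < 0)
            break;
        opened++;
    }
    return opened;
}

/* Fixed pattern: open once, reuse the same descriptor for every simulated
 * restart, close it at the end. Returns 0 on success, -1 on failure. */
static int simulate_reused_handle(int n)
{
    int fd = open(STAND_IN_PATH, O_RDONLY);
    if (fd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        /* A restart request would be issued here; the handle stays valid. */
        assert(fcntl(fd, F_GETFD) != -1);
    }
    close(fd);
    return 0;
}
```

    Calling count_open_fds() before and after simulate_leaky_restarts(500) from a main() should show the count growing by one per iteration, while simulate_reused_handle(500) leaves it unchanged. Whether BLCR itself behaves this way is exactly the open question; the sketch only shows the generic descriptor accounting.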
    My question is: when cr_restart is forked by the main application, does the connection opened by its cr_init remain open for the whole lifetime of the process? If so, it is a big problem for long-running applications.
    Leonardo Fialho
