From: Karthik Gopalakrishnan (gopalakk_at_cse.ohio-state.edu)
Date: Wed Jan 28 2009 - 18:35:42 PST
Hello. I apologize in advance for the long mail. :-)

I have an application which roughly works as follows:

    main() {
        do_cr_initialization();
        do_real_work();
    }

    do_real_work() {
        register(SIGCHLD_Handler);
        fork();
        if (child) {
            do_stuff();
            exit(0);
        }
        while(1);
    }

    SIGCHLD_Handler() {
        wait_for_child();
        exit(0);
    }

    CR_Callback() {
        if (restarting)
            do_real_work()
    }

do_stuff() is intelligent enough to continue from where it left off.

Now, under normal execution, after do_stuff() completes and exit(0) is called, SIGCHLD_Handler() is invoked, which terminates the application. However, when cr_restart is called after a checkpoint, the application just "hangs" after do_stuff() completes the remaining work and calls exit(0). SIGCHLD_Handler() is not invoked at restart at all.

The output of 'ps' shows the following:

    UID       PID    PPID   C  STIME  TTY    CMD
    gopalakk  11886  12020  0  20:30  pts/0  a.out
    gopalakk  12020  10333  0  20:30  pts/0  cr_restart context.11886
    gopalakk  12026  11886  0  20:30  pts/0  [a.out] <defunct>

Can someone explain what's going on here?

Thanks & Regards,
Karthik
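
For reference, the normal-execution path described in this message (register a SIGCHLD handler, fork, let the child do its work and exit, have the parent reap it) can be sketched in plain POSIX C roughly as below. This is only an illustration under several assumptions, not the actual application: the checkpoint/restart initialization and callback are left out entirely, do_stuff() is an empty placeholder, the function name run_until_child_exits() is hypothetical, and the handler sets a flag instead of calling exit(0) so that the parent can return the child's exit status. The parent also waits in sigsuspend() rather than spinning in while(1).

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler once a child has been reaped. */
static volatile sig_atomic_t child_reaped = 0;
/* Exit status of the reaped child, or -1 if it did not exit normally. */
static volatile sig_atomic_t child_status = -1;

/* SIGCHLD handler: reap every finished child without blocking.
 * Assumption: unlike the pseudocode in the mail, it records the result
 * instead of calling exit(0). */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;   /* waitpid() may clobber errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        child_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        child_reaped = 1;
    }
    errno = saved_errno;
}

/* Placeholder for the real (resumable) work done by the child. */
static void do_stuff(void)
{
}

/* Hypothetical stand-in for do_real_work(): returns the child's exit
 * status, or -1 on error. */
int run_until_child_exits(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;

    child_reaped = 0;
    child_status = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD so it cannot arrive between the flag test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {            /* child */
        do_stuff();
        _exit(0);
    }

    /* Parent: sleep until the handler reports that the child was reaped. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_reaped)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return child_status;
}
```

Calling run_until_child_exits() from main() and using its return value as the process exit code gives roughly the behaviour described for the non-restarted case. A child left in the <defunct> state, as in the 'ps' output above, is one that has exited but has not yet been collected by a wait call in its parent.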