Hang in cr_restart

From: Karthik Gopalakrishnan (gopalakk_at_cse.ohio-state.edu)
Date: Wed Jan 28 2009 - 18:35:42 PST

  • Next message: Paul H. Hargrove: "Re: Hang in cr_restart"
    Hello.
    
    I apologize in advance for the long mail. :-)
    
    I have an application which roughly works as follows:
    
    main()
    {
        do_cr_initialization();
        do_real_work();
    }
    
    do_real_work()
    {
        register(SIGCHLD_Handler);
        fork();
        if (child) {
            do_stuff();
            exit(0);
        }
        while(1);
    }
    
    SIGCHLD_Handler()
    {
        wait_for_child();
        exit(0);
    }
    
    CR_Callback()
    {
        if (restarting)
            do_real_work();
    }
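
For readers who want to try the non-checkpoint part of this outline, the following is a minimal POSIX C sketch, not the original application. The names run_real_work, sigchld_handler, child_reaped and child_status are hypothetical. It makes three simplifying assumptions: the child calls _exit(0) immediately in place of do_stuff(), the handler records the child's status instead of calling exit(0) so the result can be inspected, and no checkpoint library calls are made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Flags shared between the handler and the main flow (hypothetical names). */
static volatile sig_atomic_t child_reaped = 0;
static volatile sig_atomic_t child_status = -1;

/* SIGCHLD handler: reap exited children so none is left <defunct>. */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;   /* waitpid() may overwrite errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        child_status = status;
        child_reaped = 1;
    }
    errno = saved_errno;
}

/* Stand-in for do_real_work(): returns the child's exit code, or -1 on error. */
int run_real_work(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;
    int status;

    child_reaped = 0;
    child_status = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD so it cannot slip in between the flag test and sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);              /* child: do_stuff() would run here */

    /* Parent: sleep until the handler has reaped the child. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_reaped)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    status = child_status;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the sketch standalone. */
int main(void)
{
    assert(run_real_work() == 0);
    return 0;
}
#endif
```

The sketch waits with sigsuspend() rather than while(1), so the parent does not spin. If the handler is never run, as in the restart case described here, the loop never ends and the child stays unreaped.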
    
    do_stuff() is intelligent enough to continue from where it left off.
    Under normal execution, once do_stuff() completes and the child calls
    exit(0), SIGCHLD_Handler() is invoked and terminates the application.
    However, when cr_restart is called after a checkpoint, the
    application just "hangs" after do_stuff() completes the remaining
    work and calls exit(0). SIGCHLD_Handler() is never invoked after the
    restart. The output of 'ps' shows the following:
    
    UID        PID  PPID  C STIME TTY      CMD
    gopalakk 11886 12020  0 20:30 pts/0    a.out
    gopalakk 12020 10333  0 20:30 pts/0    cr_restart context.11886
    gopalakk 12026 11886  0 20:30 pts/0    [a.out] <defunct>
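
The <defunct> entry is a child that has exited but has not yet been waited for by its parent (here PID 11886). As a rough illustration of that state, and not a diagnosis of the hang, the Linux-only sketch below forks a child, lets it exit without reaping it, and reads the state letter from /proc/<pid>/stat. The helper names proc_state and unreaped_child_state are hypothetical, and the code assumes /proc is mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter process state from /proc/<pid>/stat, or '?' on failure. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *paren;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* Line format: "pid (comm) S ..."; the state follows the last ')'. */
    paren = strrchr(buf, ')');
    if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
        return '?';
    return paren[2];
}

/* Fork a child that exits at once and report its state before it is reaped. */
char unreaped_child_state(void)
{
    siginfo_t info;
    char state;
    pid_t pid = fork();

    if (pid == -1)
        return '?';
    if (pid == 0)
        _exit(0);

    /* WNOWAIT: block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return '?';

    state = proc_state(pid);   /* an exited, unreaped child shows as 'Z' */
    waitpid(pid, NULL, 0);     /* now reap it so no zombie is left behind */
    return state;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the sketch standalone. */
int main(void)
{
    printf("state before reaping: %c\n", unreaped_child_state());
    return 0;
}
#endif
```

'Z' is the state that ps displays as <defunct>; it clears only when the parent calls one of the wait functions for that child.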
    
    Can someone explain what's going on here?
    
    Thanks & Regards,
    Karthik
    
