From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 30 2005 - 12:58:05 PST
Richard,

I don't have a certain answer, but I can guess. I suspect that when you see the hang, BLCR is trying to retire the first checkpoint before starting the second. When restarted from the first checkpoint, the user-space part of BLCR believes that there is a previous checkpoint to retire, but the kernel disagrees.

I've entered a bug report at http://mantis.lbl.gov/bugzilla/show_bug.cgi?id=1037

-Paul

Richard Hu wrote:
> To Whom It May Concern:
>
> I appear to be having an issue with multiple checkpoints in BLCR and I
> was wondering if you could perhaps shed some light on the problem. I
> have attached a simple test program to demonstrate my problem.
> Essentially when I run the program, two checkpoints are generated with
> some activity happening between the checkpoints. When I restart from
> the second checkpoint (for_loop_1), everything works. When I restart
> from the first checkpoint (for_loop_0), the program hangs when it hits
> the spot in the program where it attempts to create the second
> checkpoint. Do you know why this happens? Is there a possible
> work-around?
>
> Thanks,
> Richard Hu
> rhu_at_opnet_dot_com
>
>------------------------------------------------------------------------
>
>#include <stdio.h>
>#include <stdlib.h>
>#include "libcr.h"
>#include <math.h>
>#include <string.h>
>
>int callback(void *arg);
>
>int main (void) {
>    int counter;
>    char path[100] = "/usr/local/for_loop_";
>    char num[20];
>
>    cr_init();
>    cr_register_callback(callback, NULL, CR_THREAD_CONTEXT);
>    counter = 0;
>
>    for (counter = 0; counter < 20; counter++)
>        printf("I am number %i\n", counter);
>
>    cr_request_file ("/usr/local/for_loop_0");
>
>    for (counter = 40; counter < 60; counter++)
>        printf("I am number %i\n", counter);
>
>    cr_request_file ("/usr/local/for_loop_1");
>
>    return 0;
>}
>
>
>int callback (void* arg) {
>    int rc;
>
>    rc = cr_checkpoint(CR_CHECKPOINT_READY);
>    if (rc) {
>        printf("We have been restarted\n");
>    }
>    else {
>        printf("Dump generated. We are continuing\n");
>    }
>    return 0;
>}
>
>