From: Parviz Fariborz (parviz_fariborz_at_mentor_dot_com)
Date: Tue May 27 2008 - 04:03:10 PDT
Hi Paul,

Thanks for taking the time to explain this. I will try your suggestion
and will be glad to put together a how-to guide once I verify that it
works.

-Parviz

Paul H. Hargrove wrote:
> Parviz,
>
> BLCR is not able to save/restore the association between the debugger
> and the executable, making what you are trying slightly difficult (but
> hopefully not impossible). For that reason, in the 0.7.0 release (due
> out soon) the default behavior will be to refuse to checkpoint while a
> debugger is attached (an additional option will need to be specified
> to allow the checkpoint in such a case). In neither the 0.6.x nor the
> 0.7.0 release will checkpointing gdb and the debugged process together
> (as a process group, process tree, etc.) work. If it did, your task
> would have been much easier (just "cr_checkpoint <pid-of-gdb>").
>
> The Trace/BPT trap you see is the restarted executable executing a
> breakpoint (bpt) trap instruction that the debugger inserted. Since
> at restart time no debugger is attached, the trap is a fatal error.
> The problem is that any breakpoint trap instruction written by the
> first gdb is still present in the checkpointed process, having
> replaced instruction(s) in the process. When gdb wrote that
> instruction into process memory, it would have saved the original
> instruction byte in its own memory (to restore when executing past the
> breakpoint, or when removing it). However, that information was lost
> when the first gdb exited. This doesn't appear to have a good
> solution other than deleting all breakpoints before you take the
> checkpoint. If you consult a gdb expert (I am not one) you may be
> able to get gdb to print all the breakpoint data in a form that can be
> fed back into the new gdb (or perhaps you only have one at this
> stage). So, I recommend the following steps:
> 1) Run under control of gdb until it stops at your "safe" breakpoint
> 2) Delete all breakpoints/watchpoints
> 3) Checkpoint the process (may require you to "c" in response to the
> BLCR-generated signal)
>
> At restart time there is the question of attaching gdb "soon enough"
> to regain control before the buggy code runs. Since we had to remove
> all the breakpoints, there seems to be nothing preventing the code
> from executing normally, bugs and all. If you are restarting from a
> point early enough (say 1 minute or more) before your suspected bug
> then you can probably just restart and then attach gdb "fast enough".
> If you are too slow it costs you little to try again. However, it
> might not be possible to do that in general. To deal with that, one
> can try passing "--stop" to the cr_restart command, which will freeze
> the executable (with a SIGSTOP) immediately on restart (before
> returning control to the point where BLCR interrupted execution).
> That should allow you to attach a debugger, which then may need to
> send SIGCONT to the process to resume execution. However, I am not
> sure that gdb will correctly attach to a STOPed process. In my
> experiments there were some cases where "gdb <executable> <pid>"
> appeared to hang when the process was STOPed in this manner. If so,
> try sending a SIGCONT from another window/terminal
> ("kill -CONT <pid>"); hopefully that will resolve it, but it didn't
> always do so for me. I think this depends on the gdb and/or kernel
> release. In short, my recommendation if "attach gdb fast enough"
> isn't possible is:
> 1) Restart with the "--stop" command line option to freeze the process
> 2) Attach gdb to the restarted-but-stopped process
> 3) Send SIGCONT, either from gdb (if it attached OK) or from a command
> line (if gdb looks "stuck").
>
> Hope this helps. Let us know if the instructions above do or do not
> work for you. Perhaps you'd be interested in helping to write up a
> "mini howto" based on your experiences?
>
> -Paul
>
> Parviz Fariborz wrote:
>>
>> Hi,
>>
>> I am trying to use blcr to shorten the debug time for a large
>> executable. I have described the approach that I have taken and the
>> issues that I ran into below. Perhaps someone on this mailing list
>> has done the same and can give me some guidance.
>>
>> When debugging a long-running executable in gdb (multiple hours), I
>> want to use blcr to checkpoint the running executable at a breakpoint
>> close to the problem area, where I can safely assume things are in
>> a good state. In the next round of debugging, instead of running the
>> executable in gdb, I want to restart from the checkpoint and attach
>> gdb to the running process. This gets me to the point of interest a
>> lot faster.
>>
>> My questions are: Is it possible to stop a running process in gdb at
>> a breakpoint and create a checkpoint? I tried it and was able to
>> create the checkpoint file, but the restart always failed with the
>> following message:
>>
>> .Trace/BPT trap
>>
>> Also, is there a better approach? If so, please describe it.
>>
>> Thanks in advance for your help
>>
>> -Parviz
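
The checkpoint and restart steps recommended in Paul's reply could be collected into a small shell helper along the following lines. This is only a sketch under stated assumptions, not a tested recipe: the function names, the DRY_RUN switch, and the example context-file name are hypothetical, and the only BLCR and gdb invocations used are the ones quoted in the thread ("cr_checkpoint <pid>", "cr_restart --stop", "gdb <executable> <pid>", "kill -CONT <pid>").

```shell
#!/bin/sh
# Sketch of the steps from the thread. Helper names and DRY_RUN are
# illustrative assumptions, not part of BLCR or gdb.

# Print the command instead of running it when DRY_RUN=1, so the
# sequence can be reviewed before touching a real process.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Checkpoint side. Before calling this, stop at the "safe" breakpoint in
# gdb and remove every breakpoint/watchpoint (gdb's "delete" command with
# no arguments), otherwise the restarted process hits a trap instruction
# with no debugger attached. gdb may need "c" to pass the BLCR signal on.
checkpoint_process() {
    run cr_checkpoint "$1"          # $1: pid of the debugged process
}

# Restart step 1: restart frozen (SIGSTOP) so a debugger can attach
# before the suspect code runs.
restart_stopped() {
    run cr_restart --stop "$1"      # $1: context file from the checkpoint
}

# Restart step 2: attach gdb to the restarted-but-stopped process.
attach_gdb() {
    run gdb "$1" "$2"               # $1: executable, $2: pid
}

# Restart step 3 (fallback): if gdb looks "stuck" while attaching,
# send SIGCONT from another terminal.
resume_process() {
    run kill -CONT "$1"             # $1: pid
}
```

With DRY_RUN=1 set, "restart_stopped context.1234" prints "cr_restart --stop context.1234" rather than running it (the file name here is just an example). As Paul notes, whether gdb attaches cleanly to a stopped process seems to depend on the gdb and/or kernel release, so the fallback in step 3 may or may not be needed. For carrying breakpoints over to the new session, more recent gdb releases have a "save breakpoints <file>" command whose output can be read back with "source <file>".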