Re: Using blcr for debugging

From: Parviz Fariborz (parviz_fariborz_at_mentor_dot_com)
Date: Tue May 27 2008 - 04:03:10 PDT

  • Next message: Paul H. Hargrove: "Announcing the release of BLCR 0.7.0"
    Hi Paul,
    
    Thanks for taking the time to explain this. I will try your suggestion 
    and will be glad to put together a how-to guide once I verify that it works.
    
    -Parviz
    
    Paul H. Hargrove wrote:
    > Parviz,
    >
    >  BLCR is not able to save/restore the association between the debugger 
    > and the executable, making what you are trying slightly difficult (but 
    > hopefully not impossible).  For that reason, in the 0.7.0 release (due 
    > out soon) the default behavior will be to refuse to checkpoint while a 
    > debugger is attached (an additional option will need to be specified 
    > to allow the checkpoint in such a case).  In neither the 0.6.x nor the 
    > 0.7.0 release will checkpointing gdb and the debugged process together 
    > (as a process group, process tree, etc.) work.  If it did, your task 
    > would have been much easier (just "cr_checkpoint <pid-of-gdb>").
    >
    >  The Trace/BPT trap you see is the restarted executable executing a 
    > breakpoint (bpt) trap instruction that the debugger inserted.  Since 
    > at restart time no debugger is attached, the trap is a fatal error.  
    > The problem is that any breakpoint trap instruction written by the 
    > first gdb is still present in the checkpointed process, having 
    > replaced instruction(s) in the process.  When gdb wrote that 
    > instruction into process memory, it would have saved the original 
    > instruction byte in its own memory (to restore when executing past the 
    > breakpoint, or when removing it).  However, that information was lost 
    > when the first gdb exited.  This doesn't appear to have a good 
    > solution other than deleting all breakpoints before you take the 
    > checkpoint.  If you consult a gdb expert (I am not one) you may be 
    > able to get gdb to print all the breakpoint data in a form that can be 
    > fed back into the new gdb (or perhaps you only have one at this 
    > stage).  So, I recommend the following steps:
    > 1) Run under control of gdb until it stops at your "safe" breakpoint
    > 2) Delete all breakpoints/watchpoints
    > 3) Checkpoint the process (may require you to "c" in response to the 
    > BLCR-generated signal)
    >
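The three checkpoint steps above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the helper names `run` and `checkpoint_debuggee` and the `DRY_RUN` variable are hypothetical, and the gdb commands are shown as comments because they are typed at the gdb prompt rather than in the shell. The only BLCR call used is `cr_checkpoint <pid>`, as quoted in the message.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1 and 2 happen inside gdb before the checkpoint is taken:
#   (gdb) continue     # run until the "safe" breakpoint is hit
#   (gdb) delete       # with no argument, removes all breakpoints/watchpoints
#
# Step 3: checkpoint the debugged process from a second terminal.
# If gdb then reports a BLCR-generated signal, resume with:
#   (gdb) c
# Note: per the message, BLCR 0.7.0 refuses to checkpoint a process with a
# debugger attached unless an additional option (not named here) is given.
checkpoint_debuggee() {
    pid=$1             # PID of the debugged process, not of gdb
    run cr_checkpoint "$pid"
}
```

Calling `DRY_RUN=1; checkpoint_debuggee 1234` prints the command that would be run, which is a cheap way to check the PID before touching a long-running job. Newer gdb releases also offer `save breakpoints <file>`, which may help carry breakpoints over to the next session.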
    > At restart time there is the question of attaching gdb "soon enough" 
    > to regain control before the buggy code runs.  Since we had to remove 
    > all the breakpoints, there seems to be nothing preventing the code 
    > from executing normally, bugs and all.  If you are restarting from a 
    > point early enough (say 1 minute or more) before your suspected bug 
    > then you can probably just restart and then attach gdb "fast enough".  
    > If you are too slow it costs you little to try again.  However, it 
    > might not be possible to do that in general.  To deal with that one can 
    > try passing "--stop" to the cr_restart command, which will freeze the 
    > executable (with a SIGSTOP) immediately on restart (before returning 
    > control to the point where BLCR interrupted execution).  That should 
    > allow you to attach a debugger, which then may need to send SIGCONT to 
    > the process to resume execution.  However, I am not sure that gdb will 
    > correctly attach to a stopped process.  In my experiments there were 
    > some cases where "gdb <executable> <pid>" appeared to hang when the 
    > process was stopped in this manner.  If so, try sending a SIGCONT from 
    > another window/terminal ("kill -CONT <pid>"); hopefully that will 
    > resolve it, but it didn't always do so for me.  I think this depends 
    > on the gdb and/or kernel release.  In short, my recommendation if 
    > "attach gdb fast enough" isn't possible is:
    > 1) Restart with the "--stop" command line option to freeze the process
    > 2) Attach gdb to the restarted-but-stopped process
    > 3) Send SIGCONT, either from gdb (if it attached OK) or from a command 
    > line (if gdb looks "stuck").
    >
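The restart-side recommendation above might look something like this in practice. It is a sketch under assumptions: the function names, the `DRY_RUN` switch and the argument order are hypothetical, while `cr_restart --stop`, `gdb <executable> <pid>` and `kill -CONT <pid>` are taken from the message itself. Whether gdb attaches cleanly to the stopped process may depend on the gdb and kernel release, as noted.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: restart from the checkpoint file, frozen with SIGSTOP.
restart_stopped() {
    context_file=$1    # checkpoint file written by cr_checkpoint
    run cr_restart --stop "$context_file"
}

# Step 2: attach gdb to the restarted-but-stopped process.
attach_gdb() {
    executable=$1
    pid=$2
    run gdb "$executable" "$pid"
}

# Step 3 (fallback): if gdb looks "stuck", send SIGCONT from another terminal.
resume_process() {
    pid=$1
    run kill -CONT "$pid"
}
```

Each step is probably best run in its own terminal, since the restart and the gdb session both occupy the foreground. If gdb attaches without trouble, resuming from the gdb prompt replaces the `resume_process` call.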
    > Hope this helps.  Let us know if the instructions above do or do not 
    > work for you.  Perhaps you'd be interested in helping to write up a 
    > "mini howto" based on your experiences?
    >
    > -Paul
    >
    > Parviz Fariborz wrote:
    >>
    >> Hi,
    >>
    >> I am trying to use blcr to shorten the debug time for a large 
    >> executable. I have described the approach that I have taken and the 
    >> issues that I ran into below. Perhaps someone in this mailing list 
    >> has done the same and can give me some guidance.
    >>
    >> When debugging a long running executable in gdb (multiple hours), I 
    >> want to use blcr to checkpoint the running executable at a breakpoint 
    >> close to the problem area where I can safely assume things are in 
    >> good state. In the next round of debugging, instead of running the 
    >> executable in gdb, I want to restart from the checkpoint and attach 
    >> gdb to the running process. This gets me to the point of interest a lot 
    >> faster.
    >>
    >> My questions are: Is it possible to stop a running process in gdb at 
    >> a breakpoint and create a checkpoint? I tried it and was able to 
    >> create the checkpoint file, but the restart always failed with the 
    >> following message:
    >>
    >> .Trace/BPT trap
    >>
    >> Also, is there a better approach? If so, please describe it.
    >>
    >> Thanks in advance for your help
    >>
    >> -Parviz
    >
    >
    
