Re: Using blcr for debugging

From: Parviz Fariborz (parviz_fariborz_at_mentor_dot_com)
Date: Tue May 27 2008 - 04:03:10 PDT

Next message: Paul H. Hargrove: "Announcing the release of BLCR 0.7.0"

Previous message: Paul H. Hargrove: "Re: Using blcr for debugging"
In reply to: Paul H. Hargrove: "Re: Using blcr for debugging"

Hi Paul,

Thanks for taking the time to explain this. I will try your suggestion 
and will be glad to put together a how-to guide once I verify that it works.

-Parviz

Paul H. Hargrove wrote:
> Parviz,
>
>  BLCR is not able to save/restore the association between the debugger 
> and the executable, making what you are trying slightly difficult (but 
> hopefully not impossible).  For that reason, in the 0.7.0 release (due 
> out soon) the default behavior will be to refuse to checkpoint while a 
> debugger is attached (an additional option will need to be specified 
> to allow the checkpoint in such a case).  In neither the 0.6.x nor the 
> 0.7.0 release will checkpointing gdb and the debugged process together 
> (as a process group, process tree, etc.) work.  If it did, your task 
> would have been much easier (just "cr_checkpoint <pid-of-gdb>").
>
>  The Trace/BPT trap you see is the restarted executable executing a 
> breakpoint (bpt) trap instruction that the debugger inserted.  Since 
> at restart time no debugger is attached, the trap is a fatal error.  
> The problem is that any breakpoint trap instruction written by the 
> first gdb is still present in the checkpointed process, having 
> replaced instruction(s) in the process.  When gdb wrote that 
> instruction into process memory, it would have saved the original 
> instruction byte in its own memory (to restore when executing past the 
> breakpoint, or when removing it).  However that information was lost 
> when the first gdb exited.  This doesn't appear to have a good 
> solution other than deleting all breakpoints before you take the 
> checkpoint.  If you consult a gdb expert (I am not one) you may be 
> able to get gdb to print all the breakpoint data in a form that can be 
> fed back into the new gdb (or perhaps you only have one at this 
> stage).  So, I recommend the following steps:
> 1) Run under control of gdb until it stops at your "safe" breakpoint
> 2) delete all breakpoints/watchpoints
> 3) checkpoint the process (may require you to "c" in response to the 
> BLCR-generated signal)
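
The three checkpoint steps above can be sketched as a small script that writes a gdb command file and prints the remaining commands. This is only a sketch under stated assumptions: the file names are hypothetical, "save breakpoints" is assumed to be available (it was added in gdb 7.2, newer than the gdb discussed in this thread), and any extra cr_checkpoint option needed while a debugger is attached is left out because the thread does not name it.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical placeholders, not from the thread.
CMDS=pre_checkpoint.gdb

cat > "$CMDS" <<'EOF'
# Use inside gdb once it has stopped at the "safe" breakpoint.
# Save the breakpoint definitions so a later gdb can reload them with
# "source saved_breakpoints.gdb" (assumes gdb 7.2 or newer).
save breakpoints saved_breakpoints.gdb
# Delete every breakpoint and watchpoint so that no trap instruction
# is left behind in the memory of the process being checkpointed.
delete
EOF

# Print the manual steps; nothing below touches gdb or BLCR directly.
echo "1) In gdb:           source $CMDS"
echo "2) In another shell: cr_checkpoint <pid-of-debugged-process>"
echo "3) Back in gdb:      c   (only if gdb reports the BLCR-generated signal)"
```

When "delete" is typed interactively instead of being sourced from a file, gdb may ask for confirmation before removing all breakpoints.
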
>
> At restart time there is the question of attaching gdb "soon enough" 
> to regain control before the buggy code runs.  Since we had to remove 
> all the breakpoints, there seems to be nothing preventing the code 
> from executing normally, bugs and all.  If you are restarting from a 
> point early enough (say 1 minute or more) before your suspected bug 
> then you can probably just restart and then attach gdb "fast enough".  
> If you are too slow it costs you little to try again.  However, it 
> might not be possible to do that in general.  To deal with that one can 
> try passing "--stop" to the cr_restart command, which will freeze the 
> executable (with a SIGSTOP) immediately on restart (before returning 
> control to the point where BLCR interrupted execution).  That should 
> allow you to attach a debugger, which then may need to send SIGCONT to 
> the process to resume execution.  However, I am not sure that gdb will 
> correctly attach to a stopped process.  In my experiments there were 
> some cases where "gdb <executable> <pid>" appeared to hang when the 
> process was stopped in this manner.  If so, try sending a SIGCONT from 
> another window/terminal ("kill -CONT <pid>"); hopefully that will 
> resolve it, but it didn't always do so for me.  I think this depends 
> on the gdb and/or kernel release.  In short, my recommendation if 
> "attach gdb fast enough" isn't possible is:
> 1) Restart with the "--stop" command line option to freeze the process
> 2) Attach gdb to the restarted-but-stopped process
> 3) Send SIGCONT, either from gdb (if it attached OK) or from a command 
> line (if gdb looks "stuck").
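
The stop/attach/continue sequence above can be rehearsed without BLCR installed. The sketch below is a stand-in, not the real restart: a "sleep" process plays the restarted executable, and "kill -STOP" plays the freeze that "cr_restart --stop" is described as applying. It assumes a Linux /proc filesystem for reading the process state; the variable names are illustrative.

```shell
#!/bin/sh
# Stand-in demo (assumption): "sleep" replaces the restarted process.
# With BLCR the first step would instead be something like:
#   cr_restart --stop <checkpoint-file>
sleep 30 &
pid=$!

# Freeze the process, as the restart with "--stop" is described as doing.
kill -STOP "$pid"
sleep 1
state_stopped=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
echo "after SIGSTOP: $state_stopped"

# At this point one would attach the debugger:
#   gdb <executable> $pid

# Resume the process, from gdb if it attached or from a shell if gdb hangs.
kill -CONT "$pid"
sleep 1
state_running=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
echo "after SIGCONT: $state_running"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A state beginning with "T" after the first signal means the process is stopped and waiting. If gdb did attach cleanly, "signal SIGCONT" at the gdb prompt is one way to deliver the continue signal from inside the debugger.
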
>
> Hope this helps.  Let us know if the instructions above do or do not 
> work for you.  Perhaps you'd be interested in helping to write up a 
> "mini howto" based on your experiences?
>
> -Paul
>
> Parviz Fariborz wrote:
>>
>> Hi,
>>
>> I am trying to use blcr to shorten the debug time for a large 
>> executable. I have described the approach that I have taken and the 
>> issues that I ran into below. Perhaps someone in this mailing list 
>> has done the same and can give me some guidance.
>>
>> When debugging a long running executable in gdb (multiple hours), I 
>> want to use blcr to checkpoint the running executable at a breakpoint 
>> close to the problem area where I can safely assume things are in 
>> a good state. In the next round of debugging, instead of running the 
>> executable in gdb, I want to restart from the checkpoint and attach 
>> gdb to the running process. This gets me to the point of interest a lot 
>> faster.
>>
>> My questions are: Is it possible to stop a running process in gdb at 
>> a breakpoint and create a checkpoint? I tried it and was able to 
>> create the checkpoint file, but the restart always failed with the 
>> following message:
>>
>> .Trace/BPT trap
>>
>> Also, is there a better approach? If so, please describe it.
>>
>> Thanks in advance for your help
>>
>> -Parviz
>
>
