From: TK (cartoon.ke_at_gmail_dot_com)
Date: Fri Mar 26 2010 - 06:58:22 PDT
Thanks very much, Balazs. I looked into the call graph again, and there are
a lot of calls to other functions in vmadump4 as well. Thank you very much
for your help.

Regards
Tao Ke

On 03/26/2010 01:40 AM, Balazs Gerofi wrote:
> Sorry, I was wrong :)
> So the call stack is like this:
>
> cr_dump_self() -> cr_do_dump() -> cr_do_vmadump() -> cr_freeze_threads() ->
> vmadump_freeze_proc()
>
> and there, if you are the leader, the non-shared mappings are also saved
> (in vmadump4/vmadump_common.c).
>
> Sorry again,
> Balazs
>
> On Fri, Mar 26, 2010 at 3:08 PM, Balazs Gerofi <[email protected]> wrote:
>
> > Hi Tao,
> >
> > If you go to cr_dump_self() you will see a call to cr_do_dump() after the
> > leader thread is chosen. cr_do_dump() calls cr_do_vmadump(), which calls a
> > couple of vmadump functions. cr_save_mmaps_maps() is the one where
> > non-shared mappings are dumped.
> >
> > I recommend using cscope or any other source-code tagging package, so
> > that you can easily follow the call stack.
> >
> > Regards,
> > Balazs
> >
> > On Fri, Mar 26, 2010 at 2:34 PM, Tao Ke <cartoon.ke_at_gmail_dot_com> wrote:
> >
> > > Thank you so much for your patient and detailed explanation. It is very
> > > helpful to me.
> > > I have tracked the call path from the beginning, and it seems to me that
> > > the context of the process is saved inside cr_save_mmaps_data, and the
> > > checkpoint appears to end there. I am confused about when vmadump4 is
> > > used. From your explanation, vmadump can be used to handle a single
> > > thread. I found that the BLCR module initializes the vmadump module, but
> > > I cannot find where else vmadump is used. Could you please give me some
> > > hints about when the vmadump module is used?
> > > Thank you again for your time.
> > >
> > > On Thu, Mar 25, 2010 at 10:58 PM, Paul H. Hargrove
> > > <PHHargrove_at_lbl_dot_gov> wrote:
> > >
> > > > TK,
> > > >
> > > > It is not my intent to be rude or condescending, but I don't have the
> > > > time to describe everything that takes place in a checkpoint.
> > > > The simple answer is that "the whole story" is in the source code,
> > > > which you have available to examine.
> > > >
> > > > You have correctly determined that a checkpoint begins with an ioctl()
> > > > that invokes cr_dump_self(), and you should be able to trace the rest
> > > > using the source code. I have not memorized which functions call which
> > > > others in what order, even though I wrote most of it. To give you the
> > > > "whole story" I would have to take the time to read through the
> > > > sources and trace the calls. Instead, I encourage you to read them.
> > > > Doing so is likely to give you a deeper understanding than if I were
> > > > to try to do it for you. If after that you have some specific
> > > > questions about "how" or "why" things are done, I may be able to help.
> > > >
> > > > You may want to look at tools like "cflow" to build a call graph for
> > > > you, though I cannot be certain they work well with Linux kernel code.
> > > >
> > > > I CAN summarize the distinction between the code in cr_module/ and
> > > > vmadump4/, which appears to be a significant point of your question.
> > > > The vmadump code is a heavily modified version of software from the
> > > > BProc project that predates BLCR (and comes from a different
> > > > organization). It was never able to deal with shared memory, files or
> > > > multiple processes; nor does it have the callback mechanisms of BLCR.
> > > > So the BLCR project began with the intent of keeping the changes made
> > > > to files in vmadump to a minimum and building the other functionality
> > > > (e.g. shared memory, files and multiple processes) separately. That is
> > > > why you will find that vmadump handles "anonymous" pages and
> > > > non-shared mappings, while the cr_save_mmaps code handles the shared
> > > > mappings.
> > > >
> > > > I hope my answer helps you some, even if I can't provide the answer
> > > > you may have been looking for.
> > > > -Paul
> > > >
> > > > TK wrote:
> > > > > Thanks.
> > > > > But when a checkpoint request is issued with the "cr_checkpoint"
> > > > > command, an ioctl request is made to /proc/checkpoint/ctrl. I
> > > > > suppose it will be the "CR_OP_HAND_CHKPT" request. Then
> > > > > "cr_dump_self" will be called, and finally cr_save_mmaps_data will
> > > > > be called, and the memory will be saved there. Am I correct? If so,
> > > > > what is the whole story of a checkpoint? When is the "vmadump"
> > > > > module used, then?
> > > > >
> > > > > Thank you very much.
> > > > >
> > > > > On 03/25/2010 07:20 PM, Paul H. Hargrove wrote:
> > > > > > TK,
> > > > > >
> > > > > > I am sorry I didn't get the chance to answer this one when you
> > > > > > asked me directly 2 days ago - I am up against some deadlines
> > > > > > right now.
> > > > > >
> > > > > > To answer your question:
> > > > > > In the function you ask about, we are dealing only with memory
> > > > > > regions created by mmap() of a file. Therefore all the "clean"
> > > > > > pages already exist somewhere on disk in the file that has been
> > > > > > mmap()ed. This includes the executable file and shared libraries
> > > > > > that were mmap()ed in prior to the start of main(). As with open
> > > > > > files, BLCR makes the (optimistic) assumption that the file will
> > > > > > still exist, unmodified, at the time of the restart. However, one
> > > > > > can ensure that even the "clean" pages will be stored with the
> > > > > > checkpoint by passing --save-all.
> > > > > >
> > > > > > -Paul
> > > > > >
> > > > > > TK wrote:
> > > > > > > Hi, all. I am trying to add my own code to BLCR for some
> > > > > > > experiments.
> > > > > > > When I was reading the code of the "cr_save_mmaps_data" function
> > > > > > > in cr_module/cr_mmaps.c, I found the comment /* dump the dirty
> > > > > > > pages */. Do you dump only the dirty pages? That would not be
> > > > > > > enough information for a restart. Or are the other pages dumped
> > > > > > > somewhere else? If so, where?
> > > > > > > Thank you.
> > > >
> > > > --
> > > > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> > > > Future Technologies Group                 Tel: +1-510-495-2352
> > > > HPC Research Department                   Fax: +1-510-486-6900
> > > > Lawrence Berkeley National Laboratory