From: Balazs Gerofi (bgerofi_at_il.is.s.u-tokyo.ac.jp)
Date: Thu Mar 25 2010 - 23:08:41 PDT
Hi Tao,

If you go to cr_dump_self() you will see a call to cr_do_dump() after
the leader thread is chosen. cr_do_dump() calls cr_do_vmadump(), which
calls a couple of vmadump functions. cr_save_mmaps_maps() is the one
where non-shared mappings are dumped.

I recommend using cscope or any other source code tagging package, so
that you can easily follow the call stack.

Regards,
Balazs

On Fri, Mar 26, 2010 at 2:34 PM, Tao Ke <cartoon.ke_at_gmail_dot_com> wrote:
> Thank you so much for your patient and detailed explanation. It is very
> helpful to me.
> I have tracked the call path from the beginning. It seems to me that
> the context of the process is saved inside cr_save_mmaps_data, and the
> checkpoint appears to end there. I am confused about when vmadump4 is
> used. From your explanation, vmadump can be used to handle a single
> thread. I found that the blcr module initializes the vmadump module,
> but I cannot find where vmadump is used elsewhere. Could you please
> give me some hints about when the vmadump module is used?
> Thank you again for your time.
>
> On Thu, Mar 25, 2010 at 10:58 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
>
>> TK,
>>
>> It is not my intent to be rude or condescending but I don't have the time
>> to describe everything that takes place in a checkpoint.
>> The simple answer is that "the whole story" is in the source code - which
>> you have available to examine.
>>
>> You have correctly determined that a checkpoint begins with an ioctl()
>> that invokes cr_dump_self(), and you should be able to trace the rest using
>> the source code. I have not memorized which functions call which others in
>> what order, even though I wrote most of it. To give you the "whole story" I
>> would have to take the time to read through the sources and trace the calls.
>> Instead, I encourage you to read them. Doing so is likely to give you a
>> deeper understanding than if I were to try to do it for you.
>> If after that you have some specific questions about "how" or "why"
>> things are done, I may be able to help.
>>
>> You may want to look at tools like "cflow" to build a call graph for you,
>> though I cannot be certain they work well with Linux kernel code.
>>
>> I CAN summarize the distinction between the code in cr_module/ and
>> vmadump4/, which appears to be a significant point of your question. The
>> vmadump code is a heavily modified version of software from the BProc
>> project that predates BLCR (and comes from a different organization). It
>> was never able to deal with shared memory, files or multiple processes; nor
>> does it have the callback mechanisms of BLCR. So the BLCR project began
>> with the intent of keeping the changes made to files in vmadump to a minimum
>> and building the other functionality (e.g. shared memory, files and multiple
>> processes) separately. That is why you will find that vmadump handles
>> "anonymous" pages and non-shared mappings, while the cr_save_mmaps code
>> handles the shared mappings.
>>
>> I hope my answer helps you some, even if I can't provide the answer you
>> may have been looking for.
>> -Paul
>>
>>
>> TK wrote:
>>
>>> Thanks.
>>> But when a checkpoint request is issued with the "cr_checkpoint" command,
>>> an ioctl request is made to /proc/checkpoint/ctrl. I suppose it will be the
>>> "CR_OP_HAND_CHKPT" request. Then "cr_dump_self" will be called, and finally
>>> cr_save_mmaps_data will be called, and the memory will be saved there. Am I
>>> correct? If so, what is the whole story of a checkpoint? When is the
>>> "vmadump" module used, then?
>>>
>>> Thank you very much.
>>>
>>> On 03/25/2010 07:20 PM, Paul H. Hargrove wrote:
>>>
>>>> TK,
>>>>
>>>> I am sorry I didn't get the chance to answer this one when you asked me
>>>> directly 2 days ago - I am up against some deadlines right now.
>>>>
>>>> To answer your question:
>>>> In the function you ask about we are dealing only with memory regions
>>>> created by mmap() of a file. Therefore all the "clean" pages already exist
>>>> somewhere on disk in the file that has been mmap()ed. This includes the
>>>> executable file and shared libraries that were mmap()ed in prior to the
>>>> start of main(). As with open files, BLCR makes the (optimistic) assumption
>>>> that the file will still exist, unmodified, at the time of the restart.
>>>> However, one can ensure that even the "clean" pages will be stored with the
>>>> checkpoint by passing --save-all.
>>>>
>>>> -Paul
>>>>
>>>> TK wrote:
>>>>
>>>>> Hi all. I am trying to add my own code to BLCR for some
>>>>> experiments.
>>>>> When I was reading the code of the "cr_save_mmaps_data" function in
>>>>> cr_module/cr_mmaps.c, I found the comment /* dump the dirty pages */. I am
>>>>> wondering whether you dump only the dirty pages. That would not be enough
>>>>> information for a restart. Or are the other pages dumped elsewhere? If so,
>>>>> where?
>>>>> Thank you.
>>>>>
>>>>
>>>
>>
>> --
>> Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
>> Future Technologies Group                 Tel: +1-510-495-2352
>> HPC Research Department                   Fax: +1-510-486-6900
>> Lawrence Berkeley National Laboratory
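
A note on the clean/dirty distinction discussed in this thread: the point
that "clean" pages of an mmap()ed file can be recovered from the file itself
can be illustrated from user space. The sketch below is only an illustration
under stated assumptions. It is not BLCR code and makes no claim about how
BLCR identifies dirty pages inside the kernel. The helper names
(pages_differing_from_file, demo) are hypothetical. It maps a temporary file
copy-on-write, modifies one byte, and reports which pages no longer match
the file on disk.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size of the running system


def pages_differing_from_file(path, mutate):
    """Map `path` privately (copy-on-write), apply `mutate(mapping)`, and
    return the indices of pages whose in-memory contents differ from the
    file on disk. Pages not returned could be rebuilt from the file alone."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # ACCESS_COPY gives a private mapping: writes stay in memory only.
        m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
        try:
            mutate(m)
            differing = []
            for off in range(0, size, PAGE):
                f.seek(off)
                on_disk = f.read(PAGE)
                if m[off:off + PAGE] != on_disk:
                    differing.append(off // PAGE)
            return differing
        finally:
            m.close()


def demo():
    """Create a 4-page file, modify one byte in page 2 of a private
    mapping, and return the list of pages that differ from the file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * (4 * PAGE))
        path = tmp.name
    try:
        def touch_page_two(m):
            m[2 * PAGE] = ord("B")  # first byte of page index 2
        return pages_differing_from_file(path, touch_page_two)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Only the modified page shows up in the result; the other pages are identical
to the file and could be re-read from it at restart, which is the same
optimistic assumption (the file still exists, unmodified) described above.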