Re: question about "cr_save_mmaps_data" function

From: Tao Ke (cartoon.ke_at_gmail_dot_com)
Date: Thu Mar 25 2010 - 22:34:06 PDT

    Thank you so much for your patient and detailed explanation. It is very
    helpful to me.
    I have traced the call path from the beginning. It seems to me that the
    context of the process is saved inside cr_save_mmaps_data, and the
    checkpoint appears to end there. I am confused about when vmadump4 is used.
    From your explanation, vmadump can be used to handle a single thread. I found
    that the blcr module initializes the vmadump module, but I cannot find where
    vmadump is used elsewhere. Could you please give me some hints about when
    the vmadump module is used?
    Thank you again for your time.
    
    On Thu, Mar 25, 2010 at 10:58 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    
    > TK,
    >
    > It is not my intent to be rude or condescending but I don't have the time
    > to describe everything that takes place in a checkpoint.
    > The simple answer is that "the whole story" is in the source code - which
    > you have available to examine.
    >
    > You have correctly determined that a checkpoint begins with an ioctl() that
    > invokes cr_dump_self(), and you should be able to trace the rest using the
    > source code.  I have not memorized which functions call which others in what
    > order, even though I wrote most of it.  To give you the "whole story" I
    > would have to take the time to read through the sources and trace the calls.
    > Instead, I encourage you to read them.  Doing so is likely to give you a
    > deeper understanding than if I were to try to do it for you.  If after that
    > you have some specific questions about "how" or "why" things are done, I may
    > be able to help.
    >
    > You may want to look at tools like "cflow" to build a call graph for you,
    > though I cannot be certain they work well with Linux kernel code.
    >
    > I CAN summarize the distinction between the code in cr_module/ and
    > vmadump4/, which appears to be a significant point of your question.   The
    > vmadump code is a heavily modified version of software from the BProc
    > project that predates BLCR (and comes from a different organization).  It
    > was never able to deal with shared memory, files or multiple processes; nor
    > does it have the callback mechanisms of BLCR.  So the BLCR project began
    > with the intent of keeping the changes made to files in vmadump to a minimum
    > and building the other functionality (e.g. shared memory, files and multiple
    > processes) separately.  That is why you will find that vmadump handles
    > "anonymous" pages and non-shared mappings, while the cr_save_mmaps code
    > handles the shared mappings.
    >
    > I hope my answer helps you some, even if I can't provide the answer you may
    > have been looking for.
    > -Paul
    >
    >
    > TK wrote:
    >
    >> Thanks.
    >> But when a checkpoint request is issued with the "cr_checkpoint" command, an
    >> ioctl request is made to /proc/checkpoint/ctrl. I suppose it will be the
    >> "CR_OP_HAND_CHKPT" request. Then "cr_dump_self" will be called, and finally
    >> cr_save_mmaps_data will be called, and the memory will be saved there. Am I
    >> correct? If so, what is the whole story of the checkpoint? When is the
    >> "vmadump" module used, then?
    >>
    >> Thank you very much.
    >>
    >> On 03/25/2010 07:20 PM, Paul H. Hargrove wrote:
    >>
    >>> TK,
    >>>
    >>> I am sorry I didn't get the chance to answer this one when you asked me
    >>> directly 2 days ago - I am up against some deadlines right now.
    >>>
    >>> To answer your question:
    >>> In the function you ask about we are dealing only with memory regions
    >>> created by mmap() of a file.  Therefore all the "clean" pages already exist
    >>> somewhere on disk in the file that has been mmap()ed.  This includes the
    >>> executable file and shared libraries that were mmap()ed in prior to the
    >>> start of main().  As with open files, BLCR makes the (optimistic) assumption
    >>> that the file will still exist, unmodified, at the time of the restart.
    >>> However, one can ensure that even the "clean" pages will be stored with the
    >>> checkpoint by passing --save-all.
    >>>
    >>> -Paul
    >>>
    >>> TK wrote:
    >>>
    >>>> Hi, all. I am trying to add my own code to BLCR for some
    >>>> experiments.
    >>>> When I was reading the code of the "cr_save_mmaps_data" function in
    >>>> cr_module/cr_mmaps.c, I found the comment /* dump the dirty pages */. I am
    >>>> wondering: do you dump only the dirty pages? That will not be enough info
    >>>> for a restart. Or are the other pages dumped elsewhere? If so, where?
    >>>> Thank you.
    >>>>
    >>>
    >>>
    >>>
    >>
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    
