Re: Passing parameters to the CR Callback function.

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Mar 05 2009 - 15:51:08 PST

  • Next message: Karthik Gopalakrishnan: "Re: Passing parameters to the CR Callback function."
    Karthik,
    
      The way we have envisioned passing arguments to a checkpoint callback 
    is through what is often called a "client data pointer" or "context 
    pointer".  The basic idea is that when you register the callback you 
    pass the registration function a (void*) that points to a 
    callback-dependent data structure.  Your application can use that data 
    structure in any way you please to pass arguments.  This is, of course, 
    designed for passing arguments within the address space of the 
    application, which is not what you are asking about.
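
    To make the "context pointer" idea concrete, here is a minimal sketch of 
    the pattern in plain C.  It does not call libcr: register_callback(), 
    run_callbacks() and struct ckpt_context are hypothetical stand-ins made 
    up for illustration, so that the example compiles on its own.  In a 
    real application the registration step would go through libcr's own 
    registration function instead.

```c
#include <assert.h>
#include <stdio.h>

/* Same shape as a checkpoint callback: one opaque pointer in, int out. */
typedef int (*callback_fn)(void *);

/* Hypothetical stand-in for a library's registration table (not libcr). */
struct registration { callback_fn fn; void *arg; };
static struct registration table[8];
static int n_registered = 0;

/* Hypothetical registration call: stores fn together with its context pointer. */
static int register_callback(callback_fn fn, void *arg)
{
    if (n_registered >= 8)
        return -1;
    table[n_registered].fn = fn;
    table[n_registered].arg = arg;
    return n_registered++;
}

/* Hypothetical stand-in for the library delivering a checkpoint request:
 * each callback is handed back exactly the pointer it was registered with. */
static int run_callbacks(void)
{
    int rc = 0;
    for (int i = 0; i < n_registered; i++)
        rc |= table[i].fn(table[i].arg);
    return rc;
}

/* Callback-dependent data structure, chosen freely by the application. */
struct ckpt_context { const char *label; int times_called; };

static int my_callback(void *arg)
{
    struct ckpt_context *ctx = arg;
    assert(ctx != NULL);
    ctx->times_called++;
    printf("checkpoint callback for %s\n", ctx->label);
    return 0;
}
```

    An application would fill in a struct ckpt_context, register 
    my_callback with a pointer to it, and later find times_called 
    incremented once per checkpoint request.  Note that everything the 
    callback sees was placed there by the application itself, which is why 
    this does not help the requester pass anything in.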
    
      What you are asking for is something we have not considered: the 
    passing of arguments from the checkpoint *requester* to the checkpoint 
    *target*.  While the mechanism you describe does sound useful for some 
    cases, I am concerned that there are complications that arise in the 
    general case.  In particular, I am thinking of any situation in which 
    there are multiple "clients" of libcr in the application's address 
    space, such as the application code itself plus one or more BLCR-enabled 
    libraries (MPI comes to mind).  In such a case the question arises of 
    which arguments are for which callbacks.
    
      I think that your use of a file is a reasonable solution, even if it 
    doesn't seem too elegant.  You might consider something slightly fancier 
    like opening a FIFO or socket to a "server" process that provides the 
    arguments.  To be honest, if I were to implement something like what you 
    suggest in cr_checkpoint, it would likely be implemented in that manner: 
    using a socket or FIFO connection between the cr_checkpoint program and 
    the libcr code linked into the target application.  You can, of course, 
    do exactly that on your own by creating a "my_checkpoint" wrapper around 
    cr_checkpoint to handle the argument parsing and the connection, and your 
    callback would contain (or call) the code implementing the other end of 
    the connection.  A potential key to making this work in the presence of 
    multiple checkpoints is the fact that the requester and target know each 
    other's IDs (the target can call cr_get_checkpoint_info() to find the 
    requester's pid).
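
    As a rough sketch of the two ends of such a wrapper, assuming a 
    rendezvous path named after the requester's pid: the path scheme 
    "/tmp/my_checkpoint.<pid>" and the helpers args_path(), send_args() and 
    recv_args() are all hypothetical names invented for this example, not 
    part of BLCR.  The requester's pid is taken as a plain parameter here; 
    in a real callback it would come from cr_get_checkpoint_info() as 
    described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define ARG_LEN 256  /* maximum length of one argument, including the NUL */

/* Build the per-requester rendezvous path (the naming scheme is an assumption). */
static int args_path(pid_t requester, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/tmp/my_checkpoint.%ld", (long)requester);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Requester side (the "my_checkpoint" wrapper): write one argument per line. */
static int send_args(const char *path, int argc, char *const argv[])
{
    FILE *f;
    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    for (int i = 0; i < argc; i++)
        fprintf(f, "%s\n", argv[i]);
    return fclose(f) == 0 ? 0 : -1;
}

/* Target side (called from the callback): read up to max arguments.
 * Returns the number of arguments read, or -1 if the path cannot be opened. */
static int recv_args(const char *path, char out[][ARG_LEN], int max)
{
    FILE *f;
    int n = 0;
    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (n < max && fgets(out[n], ARG_LEN, f)) {
        out[n][strcspn(out[n], "\n")] = '\0';  /* strip the trailing newline */
        n++;
    }
    fclose(f);
    return n;
}
```

    The same two functions work whether the path is a regular file or a 
    FIFO created beforehand with mkfifo(); with a FIFO, each fopen() simply 
    blocks until the other end has opened it.  Because the path includes 
    the requester's pid, two checkpoint requests in flight at once do not 
    read each other's arguments.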
    
      While your suggestion is potentially useful, I know I don't have time 
    to implement something like this any time soon.  If you like, you could 
    create an entry in our Bugzilla database 
    (http://mantis.lbl.gov/bugzilla) to request this feature.  Also, if you 
    do implement something that you'd be willing to share, I might add it to 
    the examples directory in the BLCR distribution for others to use if it 
    is useful to them.
    
    -Paul
    
    Karthik Gopalakrishnan wrote:
    > Hi Paul.
    >
    > I see that the CR Callback function can accept one  void * argument.
    > However, I don't see a 'proper' way to pass data to my application's
    > callback function when I do a 'cr_checkpoint'. It will be nice if I
    > could do a 'cr_checkpoint [options] ID arg1 arg2 ... argN' with
    > arg[1..N] being passed to the registered callbacks, maybe as 'char
    > **args' & args[N] = NULL. The definition of the callback could be
    > changed to "typedef int (*cr_callback_t)(char **, void *)". I admit I
    > have not thought this through, but I feel something like this will be
    > pretty useful.
    >
    > Currently, I have a wrapper that writes the parameters to some tmp
    > file before calling cr_checkpoint and I get my callback function to
    > read the arguments off that file. I'll be grateful if you could
    > suggest a better way for me to achieve this, in the current scenario.
    >
    > Thanks & Regards,
    > Karthik
    >   
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
