From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Mar 05 2009 - 15:51:08 PST
Karthik,

The way we have envisioned passing arguments to a checkpoint callback is through what is often called a "client data pointer" or "context pointer". The basic idea is that when you register the callback you pass the registration function a (void*) that points to a callback-dependent data structure. Your application can use that data structure in any way you please to pass arguments. This is, of course, designed for passing arguments within the address space of the application, which is not what you are asking about.

What you are asking for is something we have not considered: the passing of arguments from the checkpoint *requester* to the checkpoint *target*. While the mechanism you describe does sound useful for some cases, I am concerned that there are complications that arise in the general case. In particular, I am thinking of any situation in which there are multiple "clients" of libcr in the application's address space, such as the application code itself plus one or more BLCR-enabled libraries (MPI comes to mind). In such a case the question arises of which arguments are for which callbacks.

I think that your use of a file is a reasonable solution, even if it doesn't seem too elegant. You might consider something slightly fancier like opening a FIFO or socket to a "server" process that provides the arguments. To be honest, if I were to implement something like what you suggest in cr_checkpoint, it would likely be implemented in that manner: using a socket or FIFO connection between the cr_checkpoint program and the libcr code linked into the target application. You can, of course, do exactly that on your own by creating a "my_checkpoint" wrapper around cr_checkpoint to handle the argument parsing and the connection, and your callback would contain (or call) the code implementing the other end of the connection.
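To make the wrapper idea a bit more concrete, here is a minimal sketch in C of the two ends of such a hand-off. It is only an illustration under stated assumptions, not BLCR code: struct ckpt_ctx, args_path(), write_args(), read_args(), my_callback() and the /tmp/ckpt_args.<pid> naming are all hypothetical, and it uses a regular file rather than a FIFO or socket to stay short. The sketch makes no libcr calls; the callback only mirrors the single (void *) argument shape, and the requester pid that a real callback would look up through cr_get_checkpoint_info() is simply stored in the context structure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_ARGS    16
#define MAX_ARG_LEN 128

/* Hypothetical context structure, passed to the callback as its (void *). */
struct ckpt_ctx {
    pid_t requester;                    /* assumed known to the target */
    int   argc;                         /* number of arguments read    */
    char  argv[MAX_ARGS][MAX_ARG_LEN];  /* the arguments themselves    */
};

/* Hypothetical convention: one argument file per requester pid, so that
 * concurrent checkpoint requests do not overwrite each other's arguments. */
static void args_path(char *buf, size_t len, pid_t requester)
{
    snprintf(buf, len, "/tmp/ckpt_args.%ld", (long)requester);
}

/* Requester side (the "my_checkpoint" wrapper): write one argument per
 * line before the checkpoint is requested. Returns 0 on success. */
static int write_args(pid_t requester, int argc, char *const argv[])
{
    char path[64];
    args_path(path, sizeof path, requester);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    for (int i = 0; i < argc; i++)
        fprintf(f, "%s\n", argv[i]);
    return fclose(f);
}

/* Target side: read the arguments left by that requester, then remove
 * the file. Arguments containing newlines, or longer than
 * MAX_ARG_LEN - 1 characters, are not handled by this sketch. */
static int read_args(struct ckpt_ctx *out)
{
    assert(out != NULL);
    char path[64];
    args_path(path, sizeof path, out->requester);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    out->argc = 0;
    while (out->argc < MAX_ARGS &&
           fgets(out->argv[out->argc], MAX_ARG_LEN, f)) {
        char *line = out->argv[out->argc];
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
        out->argc++;
    }
    fclose(f);
    remove(path);
    return 0;
}

/* Same shape as a callback that takes a single (void *) context pointer. */
static int my_callback(void *ctx)
{
    return read_args((struct ckpt_ctx *)ctx);
}
```

In use, the wrapper would call write_args() with its own pid and then run cr_checkpoint, while the application would register my_callback with a pointer to a struct ckpt_ctx as its context pointer. Since each client of libcr supplies its own context pointer, each could in principle use a different file naming convention to tell its arguments apart from those of other callbacks.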
A potential key to making this work in the presence of multiple checkpoints is the fact that the requester and target know each other's IDs (the target can call cr_get_checkpoint_info() to find the requester's pid).

While your suggestion is potentially useful, I know I don't have time to implement something like this any time soon. If you like, you could create an entry in our Bugzilla database (http://mantis.lbl.gov/bugzilla) to request this feature. Also, if you do implement something that you'd be willing to share, I might add it to the examples directory in the BLCR distribution for others to use if it is useful to them.

-Paul

Karthik Gopalakrishnan wrote:
> Hi Paul.
>
> I see that the CR Callback function can accept one void * argument.
> However, I don't see a 'proper' way to pass data to my application's
> callback function when I do a 'cr_checkpoint'. It will be nice if I
> could do a 'cr_checkpoint [options] ID arg1 arg2 ... argN' with
> arg[1..N] being passed to the registered callbacks, maybe as 'char
> **args' & args[N] = NULL. The definition of the callback could be
> changed to "typedef int (*cr_callback_t)(char **, void *)". I admit I
> have not thought this through, but I feel something like this will be
> pretty useful.
>
> Currently, I have a wrapper that writes the parameters to some tmp
> file before calling cr_checkpoint and I get my callback function to
> read the arguments off that file. I'll be grateful if you could
> suggest a better way for me to achieve this, in the current scenario.
>
> Thanks & Regards,
> Karthik
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory