From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Sun May 17 2009 - 11:30:32 PDT
There are certain functions in libcr that are for use by a "target": a process that may be checkpointed. These all use the one "local_fd" in syscall.c, which is opened the first time one of these functions is called and then reused as needed, to avoid an open/close on every call. You ask "Why can not we use only cri_syscall_token": the answer is that we do use a single __cri_syscall_token(), but cri_syscall() is a wrapper that manages this single reused fd and errno, while cri_syscall_token() manages only errno. The "thread access" code is there to ensure that we get exactly one fd even if multiple threads call at the same time. The code might look more natural if we had used pthread mutexes, but we cannot, for the same reason we cannot make certain syscalls through the normal paths: we may need to do this when the pthread environment is not available.

There are also functions for use inside the callback code that runs when a checkpoint is taken. This includes cr_checkpoint() and abort_checkpoint(). For these, the fd passed to cri_syscall_token() must be a specific one that the kernel knows is associated with the *specific* checkpoint request in progress. This fd is passed from the kernel to libcr when the signal handler is invoked, in siginfo->si_pid (see libcr/cr_core.c:cri_sig_handler()).

Finally, there is a third group of calls, including cr_request_checkpoint() and cr_request_restart(), that open an fd that is used for all operations for that request. These also use cri_syscall_token().

I know I didn't address your questions in order, but I think I've explained what you wanted to know. If you still need help, let us know.

-Paul

The original poster wrote:
> Hello, Professor:
>
> Thank you very much for the previous answer.
>
> I have a question about issuing a checkpoint request.
> When I am reading "/util/cr_checkpoint.c", I see the code:
>
>   /* issue the request */
>   err = cr_request_checkpoint(&cr_args, &cr_handle);
>
> This is how BLCR issues a checkpoint request. I find that the function
> "cr_request_checkpoint" finally calls
> "cri_syscall_token(*handle, CR_OP_CHKPT_REQ, (uintptr_t)&req)", and the
> first argument is actually a file descriptor opened on
> "/proc/checkpoint/ctrl".
>
> But at the same time, there is another function, "cri_syscall()". The
> difference between this one and "cri_syscall_token()" is that this one
> does not accept an "fd" as an argument; however, it calls
> "__cri_ioctl((int)cri_atomic_read(&local_fd), op, (void *)arg, errno_p);"
>
> My question is about the local variable "local_fd", which I see in
> "/libcr/syscall.c". I find that some other functions in this file use it
> to control the "thread access", but I still do not know its other usage
> here.
>
> Q2:
> Many functions like "abort_checkpoint", "cr_checkpoint" and
> "cr_forward_checkpoint" finally call "cri_syscall()". Which "fd" exactly
> are they using? The same as the fd opened on "/proc/checkpoint/ctrl"?
>
> Q3:
> Why can we not use only "cri_syscall_token"?

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
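
For readers who want a concrete picture of the "exactly one fd, without pthread mutexes" idea described in the reply, here is a minimal sketch. It is not the libcr source: the names get_shared_fd, shared_fd and CTRL_PATH are hypothetical, C11 atomics stand in for libcr's own atomic helpers, and /dev/null stands in for the real control file. It only illustrates one way a descriptor can be opened on first use and then shared, even if several threads arrive at once.

```c
/* Hypothetical sketch: get_shared_fd, shared_fd and CTRL_PATH are
 * illustrative names and are NOT taken from libcr. */
#include <fcntl.h>
#include <stdatomic.h>
#include <unistd.h>

#define CTRL_PATH "/dev/null"   /* stand-in path; an assumption for the demo */

static atomic_int shared_fd = -1;   /* -1 means "not opened yet" */

/* Return the one shared descriptor, opening it on first use.
 * No mutex is used: if two threads race, both may call open(), but only
 * one wins the compare-and-swap; the loser closes its extra descriptor
 * and returns the winner's. */
int get_shared_fd(void)
{
    int fd = atomic_load(&shared_fd);
    if (fd >= 0)
        return fd;                  /* fast path: already opened */

    int fresh = open(CTRL_PATH, O_RDONLY);
    if (fresh < 0)
        return -1;                  /* errno was set by open() */

    int expected = -1;
    if (atomic_compare_exchange_strong(&shared_fd, &expected, fresh))
        return fresh;               /* we published our descriptor */

    close(fresh);                   /* another thread won the race */
    return expected;                /* the failed CAS stored the winner's fd here */
}
```

Every later call returns the same descriptor, which is the behaviour the reply attributes to "local_fd"; a wrapper in the style of cri_syscall() would then pass that descriptor along, whereas a cri_syscall_token()-style call takes the descriptor from its caller.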