Re: Question about "fd" token

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Jun 15 2009 - 13:37:25 PDT

In all cases, we trigger and checkpoint all threads, regardless of
PHASE. They are all on the task list and will all get triggered (sent a
signal). However, we let the PHASE1 threads (if any) run before the
others are triggered.
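The ordering described above can be modeled as a small function over the task list. This is a minimal sketch, not BLCR's code. `enum model_phase`, `struct model_targets`, and `model_select_targets()` are hypothetical names. The bitmask representation, limited to 32 tasks, is a simplification for illustration. The sketch also assumes, following the description in this thread, that a second pass happens only when at least one PHASE1 thread exists.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a thread's phase; not BLCR's actual types. */
enum model_phase { MODEL_PHASE1, MODEL_PHASE2, MODEL_NOPHASE };

/* Which tasks are signaled in each pass, one bit per task index. */
struct model_targets {
    uint32_t first_pass;   /* signaled by the first trigger */
    uint32_t second_pass;  /* signaled after the PHASE1 threads finish */
};

/* Every task is signaled exactly once across the two passes. If any
 * PHASE1 task exists, only the PHASE1 tasks go first and the rest wait
 * for the second pass; otherwise every task goes in the first pass.
 * Only the first 32 tasks are considered (bitmask limit). */
struct model_targets model_select_targets(const enum model_phase *tasks,
                                          size_t n)
{
    struct model_targets t = {0, 0};
    uint32_t phase1 = 0, others = 0;

    for (size_t i = 0; i < n && i < 32; i++) {
        if (tasks[i] == MODEL_PHASE1)
            phase1 |= UINT32_C(1) << i;
        else
            others |= UINT32_C(1) << i;
    }

    if (phase1 != 0) {
        t.first_pass = phase1;
        t.second_pass = others;
    } else {
        t.first_pass = others;   /* no PHASE1: signal everyone at once */
    }
    return t;
}
```

For a task list of `{PHASE2, PHASE1, NOPHASE}`, the first pass is `0x2` (only the PHASE1 task) and the second pass is `0x5`. With no PHASE1 task, every task lands in the first pass and the second pass is empty.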

The PHASE1 threads are normally blocked in the kernel, waiting for a
checkpoint request. As you noted, they are "triggered" first if they
exist. All they do in their signal handler code is change the state
(again, you already noted this). It is that change of state that causes
them to leave the blocked state and begin running callbacks that have
been registered with CR_THREAD_CONTEXT. After the checkpoint they resume
blocking in the kernel, ready for the next checkpoint.
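The block, wake, run, block cycle of a PHASE1 thread can be approximated in user space with a condition variable. This is only an analogy under stated assumptions. The real thread blocks inside the kernel, not on a pthread condition variable. `struct model_ctx`, `model_start()`, `model_request_checkpoint()`, `model_stop()`, and the single callback pointer are hypothetical stand-ins. Setting `pending` and signaling the condition variable plays the role of the handler's state change.

```c
#include <pthread.h>
#include <stdbool.h>

typedef void (*model_callback)(void *arg);

/* Hypothetical state shared by the requester and the helper thread. */
struct model_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  pending;       /* requests not yet picked up by the helper */
    int  completed;     /* requests whose callback has finished */
    bool shutdown;      /* asks the helper thread to exit */
    model_callback cb;  /* stands in for CR_THREAD_CONTEXT callbacks */
    void *cb_arg;
};

/* Example callback: counts how many times it has run. */
static void model_count_callback(void *arg) { (*(int *)arg)++; }

/* Helper thread body: block, wake on a state change, run the callback,
 * then go back to blocking until the next request. */
static void *model_phase1_main(void *p)
{
    struct model_ctx *c = p;
    pthread_mutex_lock(&c->lock);
    for (;;) {
        while (c->pending == 0 && !c->shutdown)
            pthread_cond_wait(&c->cond, &c->lock);
        if (c->pending == 0)      /* woken only for shutdown */
            break;
        c->pending--;
        pthread_mutex_unlock(&c->lock);
        c->cb(c->cb_arg);         /* run the callback without the lock */
        pthread_mutex_lock(&c->lock);
        c->completed++;
        pthread_cond_broadcast(&c->cond);
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

static int model_start(struct model_ctx *c, pthread_t *tid,
                       model_callback cb, void *arg)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->pending = 0;
    c->completed = 0;
    c->shutdown = false;
    c->cb = cb;
    c->cb_arg = arg;
    return pthread_create(tid, NULL, model_phase1_main, c);
}

/* The "state change": post one request and wait until it is handled.
 * Assumes a single requester calling this sequentially. */
static void model_request_checkpoint(struct model_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    int target = c->completed + 1;
    c->pending++;
    pthread_cond_broadcast(&c->cond);
    while (c->completed < target)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void model_stop(struct model_ctx *c, pthread_t tid)
{
    pthread_mutex_lock(&c->lock);
    c->shutdown = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&c->cond);
    pthread_mutex_destroy(&c->lock);
}
```

Calling `model_request_checkpoint()` twice runs the callback twice. Between requests the helper thread sits blocked in `pthread_cond_wait()`, much as the PHASE1 thread sits blocked waiting for the next checkpoint.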

If there are no PHASE1 threads, then the PHASE2 and NOPHASE threads are
signaled right away, in place of the PHASE1 threads. However, if there
are any PHASE1 threads in a given process, then BLCR waits until they
have finished running their callbacks and reached do_checkpoint(); this
is the purpose of the "phase_barrier". Only after that is
cr_trigger_phase2() called to signal the remaining (PHASE2 and NOPHASE)
threads in the process.
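A rough sketch of what the phase barrier guarantees, assuming it behaves like a countdown. Each PHASE1 thread decrements a shared counter when it reaches the checkpoint point, and only the thread that brings the counter to zero starts the second pass. The struct and function names are hypothetical. `model_trigger_phase2()` merely counts instead of sending signals. This is not BLCR's kernel implementation. Since libcr currently creates at most one PHASE1 thread, `n == 1` is the common case, but the model allows more.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical countdown standing in for the phase barrier. */
struct model_phase_barrier {
    atomic_int remaining;        /* PHASE1 threads that have not arrived */
    atomic_int phase2_triggers;  /* times the second pass was started */
};

/* Stand-in for signaling the PHASE2 and NOPHASE threads. */
static void model_trigger_phase2(struct model_phase_barrier *b)
{
    atomic_fetch_add(&b->phase2_triggers, 1);
}

/* Called by each PHASE1 thread after its callbacks finish. Exactly one
 * caller sees the previous value 1, so phase 2 starts exactly once. */
static void model_phase1_arrive(struct model_phase_barrier *b)
{
    if (atomic_fetch_sub(&b->remaining, 1) == 1)
        model_trigger_phase2(b);
}

static void *model_phase1_thread(void *p)
{
    /* CR_THREAD_CONTEXT callbacks would run here in the real system. */
    model_phase1_arrive(p);
    return NULL;
}

/* Runs n (1..16) simulated PHASE1 threads and returns how many times
 * the second pass was triggered, or -1 for an out-of-range n. */
static int model_run(int n)
{
    struct model_phase_barrier b;
    pthread_t tids[16];
    int created = 0;

    if (n < 1 || n > 16)
        return -1;
    atomic_init(&b.remaining, n);
    atomic_init(&b.phase2_triggers, 0);
    for (; created < n; created++)
        if (pthread_create(&tids[created], NULL, model_phase1_thread, &b) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&b.phase2_triggers);
}
```

`model_run(1)` and `model_run(8)` both report exactly one trigger, because exactly one caller observes the previous value 1 from `atomic_fetch_sub()`.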

Regarding the signal handler: there is one handler, cri_sig_handler(),
because signal handler registration is per-process, not per-thread.
However, that function calls other functions depending on the type of
the thread:

  PHASE1:  The (currently zero or one) thread that libcr creates to run
           thread-context callbacks.
           Handler: changes state to allow the thread to wake and run the
           callbacks registered as CR_THREAD_CONTEXT.

  PHASE2:  Any application-created thread that has called cr_init().
           Handler: runs any callbacks registered as CR_SIGNAL_CONTEXT by
           the thread.

  NOPHASE: Any application thread that has NOT called cr_init(), and
           therefore has no thread-specific cri_thread_info structure.
           Handler: just calls do_checkpoint() without running any
           callbacks.
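The dispatch described above can be illustrated with one process-wide handler that consults a thread-local pointer, where NULL models a NOPHASE thread with no cri_thread_info. This is a hypothetical sketch, not cri_sig_handler() itself. `model_self`, `struct model_thread_info`, and the three counters are invented for illustration. SIGUSR1 stands in for BLCR's real signal, and each branch only records which action the real handler would take.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread record. A NULL pointer models a NOPHASE
 * thread that never called cr_init(). */
enum model_kind { MODEL_KIND_PHASE1, MODEL_KIND_PHASE2 };
struct model_thread_info { enum model_kind kind; };

static _Thread_local struct model_thread_info *model_self;

/* Updated only from the handler; volatile sig_atomic_t is the type C
 * allows a signal handler to write. */
static volatile sig_atomic_t model_phase1_runs;
static volatile sig_atomic_t model_phase2_runs;
static volatile sig_atomic_t model_nophase_runs;

/* One handler for the whole process; the branch taken depends on which
 * thread the signal was delivered to. */
static void model_sig_handler(int signo)
{
    struct model_thread_info *ti = model_self;
    (void)signo;
    if (ti == NULL)
        model_nophase_runs++;   /* would just call do_checkpoint() */
    else if (ti->kind == MODEL_KIND_PHASE1)
        model_phase1_runs++;    /* would change state to wake the thread */
    else
        model_phase2_runs++;    /* would run CR_SIGNAL_CONTEXT callbacks */
}

/* Registration is per-process, so this is done once, not per thread. */
static int model_install(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = model_sig_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Raising the signal with `model_self` set to NULL, then to a PHASE2 record, then to a PHASE1 record increments each counter once: the handler is registered once per process, yet the branch depends on the per-thread record.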

You ask in your final question how the ones not triggered as PHASE1 are
"waked up". I am not sure I understand the question, but I think you
want to know how they are made to run the BLCR code. Right? If that is
the question, then the answer is just that they are sent a signal which
is handled in the normal Linux way. These threads are made to run the
BLCR code just as any other signal handler would run. If you need to
understand Linux's signal delivery code, then I am afraid that I am not
qualified to describe that for you, but there are plenty of books and
online resources about the Linux kernel design that should help with
that. Let me know if I have missed the point of that last question.
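From user space, the closest analogue to the module signaling one particular thread is `pthread_kill()`. This sketch shows a thread waiting in `sigsuspend()` that runs the handler in its own context when a signal is directed at it, the same mechanism as any other handler. The names are hypothetical, and SIGUSR1 stands in for BLCR's actual signal. It is not BLCR code; the kernel module sends its signal from inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, in whichever thread the signal is delivered to. */
static _Thread_local volatile sig_atomic_t model_got_signal;
static int model_target_result;   /* copied out by the target thread */

static void model_handler(int signo)
{
    (void)signo;
    model_got_signal = 1;
}

/* Target thread: SIGUSR1 is blocked in its inherited mask, so a signal
 * sent early stays pending; sigsuspend() unblocks it only while waiting
 * and returns after the handler has run in this thread. */
static void *model_target(void *unused)
{
    sigset_t wait_mask;
    (void)unused;
    pthread_sigmask(SIG_SETMASK, NULL, &wait_mask);  /* read current mask */
    sigdelset(&wait_mask, SIGUSR1);
    while (!model_got_signal)
        sigsuspend(&wait_mask);
    model_target_result = model_got_signal;
    return NULL;
}

/* Returns 1 if the directed signal was handled by the target thread only. */
static int model_signal_one_thread(void)
{
    struct sigaction sa;
    sigset_t block, old;
    pthread_t tid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = model_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);   /* the new thread inherits it */
    int rc = pthread_create(&tid, NULL, model_target, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore this thread's mask */
    if (rc != 0)
        return -1;

    pthread_kill(tid, SIGUSR1);                 /* aimed at one thread */
    pthread_join(tid, NULL);
    return model_target_result == 1 && model_got_signal == 0;
}
```

The function returns 1 when the handler ran in the target thread and the calling thread's thread-local flag was left untouched. Blocking SIGUSR1 before `pthread_create()` means a signal sent early stays pending rather than being lost.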

-Paul

The original poster wrote:
> Hello, Professor:
>
> Thank you very much for answering my questions with such patience. But
> I have a few more questions.
>
> "When a checkpoint is requested for a process, the BLCR kernel module
> sends each thread in that process an unblockable signal."
>
> Yes, I see BLCR do this in the functions cr_trigger_phase1() and
> cr_trigger_phase2().
>
> When we execute cr_trigger_phase1():
>
> It depends on the tasks in the target task list (proc_req->tasks). If
> there are phase1 tasks in this list (even just one), then only those
> phase1 tasks are sent the signal CR_SIGNUM. Otherwise all the tasks in
> the list are sent the signal (because all of them are either "phase2"
> tasks or "no phase" tasks).
>
> When we execute cr_trigger_phase2():
>
> Only the "phase2" and "no phase" tasks in the task list are sent the
> signal.
>
> I know the phase1 task is spawned when we register a thread callback:
> cri_register_thread()->thread_init()->thread_main()->rc =
> cri_syscall_token(token, CR_OP_HAND_PHASE1, token);
>
> After cri_register_thread() finishes, we have created a callback
> thread. This thread makes the CR_OP_HAND_PHASE1 syscall to register a
> phase1 handler, then blocks in the kernel until a checkpoint occurs.
>
> Here is my first question. My guess is:
>
> The target task list may contain phase1, phase2, and no-phase tasks. We
> first find the phase1 tasks, if any. Those are not meant to be
> checkpointed; their job is to execute callback functions, so the phase1
> handler does nothing other than execute callback functions. Then come
> the phase2 and no-phase tasks. These tasks are meant to be checkpointed,
> so they invoke cr_checkpoint() or do_checkpoint(), respectively.
>
> Am I right? If so, I want to know why the target task list can contain
> the callback thread. In what scenario?
>
> My second question, if I am wrong: what are the differences among the
> no-phase, phase1, and phase2 tasks? What does each corresponding signal
> handler do?
> I see that the phase1 handler simply changes the state of the thread,
> while the phase2 handler invokes cr_checkpoint() to execute the callback
> array first... I am confused.
>
> My last question: for the callback threads that are added to the target
> task list as phase1 tasks, I know how they are waked up after blocking
> for a checkpoint request. But how are the threads not added to that
> list waked up?

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory