Re: Atomic operations (SPARC)


From: Alan Woodland (alan.woodland_at_gmail_dot_com)
Date: Tue Aug 25 2009 - 04:16:09 PDT

Next message: Paul H. Hargrove: "Re: Atomic operations (SPARC)"

Previous message: Paul H. Hargrove: "Re: Atomic operations (SPARC)"
In reply to: Paul H. Hargrove: "Re: Atomic operations (SPARC)"
Next in thread: Paul H. Hargrove: "Re: Atomic operations (SPARC)"
Reply: Paul H. Hargrove: "Re: Atomic operations (SPARC)"

2009/8/25 Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>:
> Alan Woodland wrote:
>> 2009/8/24 Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>:
>>
>> Which aspect of 'signalsafe' is problematic here? The async safe part?
>> (I.e. if we get interrupted by a signal halfway through an atomic op
>> we'd be holding a global lock and deadlock if we called another atomic
>> op from the signal handler?)
>>
>
> This is exactly what we need to work, because the checkpoint request arrives
> via a signal handler and the "interaction" between critical sections and the
> checkpoint requests is via a "red-black lock" implemented via signal-safe
> atomics.  Note that the issue is not a "global" lock, but that the common
> case is that the signal handler and the code it interrupts are accessing the
> same atomic variable.  That is why a "checkout" based approach using
> test-and-set or load-and-clear to lock even on the granularity of a single
> word is not acceptable.
>
>> Which parts of the library actually need to be async safe? Is it just
>> things which get called from the 'my_handler' functions in cr_cs.c and
>> cr_async.c? (Also what's the problem in blocking signals whilst inside
>> a replacement CAS function? Is it the unblockable signals?)
>>
>
> I think you've listed the right parts.  The reason for not blocking signals
> is two-fold:
> 1) You can't block the checkpoint signal, because BLCR will just unblock it
> from the kernel side.
> 2) You wouldn't want to block it even if you could, because you may be
> spinning on a change of value that will only occur in a signal handler.
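
A minimal sketch of why a "checkout" scheme breaks when the handler and the interrupted code share a word. It assumes (our reading, not necessarily Paul's) that checkout means atomically swapping a sentinel into the word, working on a private copy, and storing the result back. C11 `atomic_exchange` stands in for SPARC v8's `swap` or a load-and-clear; `checkout`, `checkin`, `try_checkout` and `CHECKED_OUT` are hypothetical names. The non-blocking `try_checkout` shows what a handler would observe if it interrupted a thread between checkout and checkin.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel; assumes -1 is never a legitimate value. */
#define CHECKED_OUT (-1)

/* Hypothetical "checkout": swap the sentinel in (cf. SPARC v8 `swap`),
   spinning while another holder has the word checked out. */
static int checkout(atomic_int *word) {
    int v;
    while ((v = atomic_exchange(word, CHECKED_OUT)) == CHECKED_OUT)
        ; /* wait for the current holder to check the word back in */
    return v;
}

/* Publish the updated value, ending the checkout. */
static void checkin(atomic_int *word, int v) {
    atomic_store(word, v);
}

/* Non-blocking probe: what a signal handler would see if it interrupted
   the holder between checkout() and checkin(). Calling checkout() there
   instead would spin forever, since the interrupted holder cannot run
   again until the handler returns. */
static bool try_checkout(atomic_int *word, int *out) {
    int v = atomic_exchange(word, CHECKED_OUT);
    if (v == CHECKED_OUT)
        return false; /* already checked out; sentinel left unchanged */
    *out = v;
    return true;
}
```

The granularity of the lock does not help: even a single word is unavailable to the handler for as long as the interrupted code holds it.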

I've thought about it some more and I think there might be a
workaround for modern kernels (2.6.22 and later) and glibc (2.8 and
later) using signalfd(2). You can have signals delivered via a file
descriptor, which I think sidesteps some of the problems of making
things async-signal-safe, because the rest of your threads remain
runnable.
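
A minimal sketch of the signalfd idea on Linux, assuming the signal of interest can be blocked in every thread (point 1 in the quoted text suggests BLCR's checkpoint signal may not honour that). `run_handler`, `signal_thread`, `start_signal_thread` and `handled_count` are hypothetical stand-ins for the library's handler code. The signal is blocked before any other thread exists, so it stays pending until the dedicated thread reads it from the descriptor and runs the handler body as ordinary code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the body of a library signal handler. */
static atomic_int handled_count;

static void run_handler(int signo) {
    /* Runs in an ordinary thread, not in signal context, so it may wait
       on a lock held by another thread: that thread keeps running. */
    (void)signo;
    atomic_fetch_add(&handled_count, 1);
}

/* Hypothetical dedicated thread: reads signals from the signalfd. */
static void *signal_thread(void *arg) {
    int fd = *(const int *)arg;
    struct signalfd_siginfo info;
    for (;;) {
        ssize_t n = read(fd, &info, sizeof info);
        if (n < 0 && errno == EINTR)
            continue;
        if (n != (ssize_t)sizeof info)
            break; /* descriptor closed or error: stop */
        run_handler((int)info.ssi_signo);
    }
    return NULL;
}

/* Call before any other thread is created, so every later thread
   inherits a mask with `signo` blocked. *fd_out must outlive the thread. */
static int start_signal_thread(int signo, pthread_t *tid, int *fd_out) {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;
    *fd_out = signalfd(-1, &mask, SFD_CLOEXEC);
    if (*fd_out < 0)
        return -1;
    return pthread_create(tid, NULL, signal_thread, fd_out);
}
```

Because no thread is ever interrupted, a handler body that needs a lock simply waits for the holder to finish, instead of deadlocking against the code it interrupted.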

So this would need two things: first, a thread dedicated to handling
signals via signalfd, and second, a way of ensuring that signals are
never delivered to a thread while it holds a lock inside an
atomic_compare_and_swap replacement function.

Does that sound remotely sensible? It would avoid the deadlock in the
signal handler routines, because progress could always be made, and as
far as I can see it would avoid the problems of blocking signals,
because they wouldn't be totally blocked. Is there anything that would
cause problems by running the signal handlers outside of a
'traditional' signal context, or in a dedicated thread? I haven't seen
anything.

The only problem I can see would be with signals directed at a
specific thread rather than at the process as a whole, which the
signal-handling thread would never see. I think that could be worked
around with a handler that forwards the signal to the signal-handling
thread (in which case "blocking the signals whilst inside the CAS
replacement" wouldn't be the right description anymore).
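
A minimal sketch of the forwarding workaround, assuming Linux signalfd and POSIX threads. `forward_handler`, `signal_thread`, `start_forwarding`, `signal_tid` and `forwarded_seen` are hypothetical names. The dedicated thread keeps the signal blocked and reads it from a signalfd, which also returns signals pending for the reading thread; the other threads leave it unblocked with a handler that only re-sends it to the dedicated thread with pthread_kill().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/signalfd.h>
#include <unistd.h>

static pthread_t signal_tid;      /* hypothetical: the dedicated thread */
static atomic_int forwarded_seen; /* hypothetical: signals it has read */

/* Runs only in threads with the signal unblocked (never signal_tid).
   pthread_kill() is async-signal-safe as of POSIX.1-2008 TC1, and the
   handler takes no lock, so interrupting a CAS lock holder is harmless. */
static void forward_handler(int signo) {
    int saved = errno;
    pthread_kill(signal_tid, signo);
    errno = saved;
}

static void *signal_thread(void *arg) {
    int fd = *(const int *)arg;
    struct signalfd_siginfo info;
    /* read() also returns signals pending for this thread, which is
       where pthread_kill() leaves a forwarded signal. */
    while (read(fd, &info, sizeof info) == (ssize_t)sizeof info)
        atomic_fetch_add(&forwarded_seen, 1);
    return NULL;
}

/* The new thread inherits `signo` blocked; the caller then unblocks it
   and relies on forward_handler. *fd_out must outlive the thread. */
static int start_forwarding(int signo, int *fd_out) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;
    *fd_out = signalfd(-1, &mask, SFD_CLOEXEC);
    if (*fd_out < 0 ||
        pthread_create(&signal_tid, NULL, signal_thread, fd_out) != 0)
        return -1;
    return pthread_sigmask(SIG_UNBLOCK, &mask, NULL);
}
```

Process-directed signals take the same path, since the kernel delivers them to a thread that has the signal unblocked, whose handler then forwards them.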

Alan
