Re: Atomic operations (SPARC)

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Aug 24 2009 - 17:05:38 PDT

Next message: Alan Woodland: "Re: Atomic operations (SPARC)"

Previous message: Alan Woodland: "Re: Atomic operations (SPARC)"
In reply to: Alan Woodland: "Re: Atomic operations (SPARC)"
Next in thread: Alan Woodland: "Re: Atomic operations (SPARC)"
Reply: Alan Woodland: "Re: Atomic operations (SPARC)"

See replies/answers below

Alan Woodland wrote:
> 2009/8/24 Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>:
> [snip]
>   
>> There are at least 4 parts to the answer.
>>
>> 0) I'd be surprised if the SPARC architecture is usable, even if it
>> compiles.  It is documented to be incomplete and to support (IIRC) only
>> UltraSPARC and higher (see #2, below).
>>
>> 1) The gcc atomic builtins were not (widely?) available when we started this
>> project, and until recently we supported kernels as old as 2.4.0, which
>> required an ancient gcc-2.96.  If you were to implement an arch/generic in
>> the same spirit as the kernel's asm-generic I think we could use it when
>> support exists in the compiler, and as a porting aid, but we'd probably not
>> throw away the existing implementations.
>>     
> It looks like my initial reading of the atomic builtins was incorrect,
> and gcc generates the call to the external function if required, but
> not the function itself. This would still make porting to
> architectures with CAS a lot easier though.
>
> Testing at configure time the availability of atomic builtins is
> trivial and could then direct to either those or the existing
> implementations appropriately.
>   

Right.  GCC generates the atomic code IF it is among those implemented 
by its developers; the extern function call is generated if GCC doesn't 
(yet) know how to implement the given atomic on the given architecture.  
This is why I'd not immediately discard the existing implementations 
(they can be renamed to satisfy GCC's external function calls when 
needed).  As you observe, this option might avoid the need to write any 
new implementations.
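
A minimal sketch of the kind of probe a configure test could compile and
link, assuming GCC's legacy __sync builtins.  The names my_atomic_cas and
probe_cas are hypothetical and are not BLCR code.  If GCC cannot expand
the builtin inline for the target, it emits a call to an external
function (for a 4-byte operand, __sync_bool_compare_and_swap_4), so a
link failure would tell configure to fall back to the existing
per-architecture implementations.

```c
#include <assert.h>

/* Hypothetical wrapper: returns 1 if *p equalled `expected` and was
 * replaced by `desired`, 0 otherwise. */
static int my_atomic_cas(volatile unsigned int *p,
                         unsigned int expected, unsigned int desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}

/* Returns 1 if the builtin behaves like a compare-and-swap. */
int probe_cas(void)
{
    volatile unsigned int v = 5;
    int first = my_atomic_cas(&v, 5, 7);   /* matches, so v becomes 7 */
    int second = my_atomic_cas(&v, 5, 9);  /* no match, v stays 7 */
    return first && !second && v == 7;
}
```

A configure script would wrap a call to probe_cas() in a main() and check
that the program both links and exits successfully.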

> For the replacement functions when it's not available some kind of
> 'cancelable' futex-based 'lock' shouldn't be impossible to construct,
> and then the atomic_cas replacement could handle cancellation sanely?
>
>   
>> 2) Sparclite does not have the required atomic instructions.  With only a
>> test-and-set or load-and-clear type of atomic support one cannot implement a
>> signal-safe compare-and-swap in user space (kernel does it by blocking
>> interrupts momentarily).  This is true of "older" ARM as well, but on that
>> platform there is a neat kernel trick implemented by/for the NPTL port using
>> the equivalent of a VDSO to use the atomic instructions on "newer" ARM or
>> use the kernel for help otherwise.  I don't know of such a trick on
>> sparclite, but NPTL must work somehow.
>>     
> Which aspect of 'signalsafe' is problematic here? The async safe part?
> (I.e. if we get interrupted by a signal halfway through an atomic op
> we'd be holding a global lock and deadlock if we called another atomic
> op from the signal handler?)
>   

Yes, the async-safe part.  That is exactly the case we need to work,
because the checkpoint request arrives via a signal handler and the
"interaction" between critical sections and the checkpoint requests is
via a "red-black lock" implemented with signal-safe atomics.  Note that
the issue is not a "global" lock, but that in the common case the signal
handler and the code it interrupts are accessing the same atomic
variable.  That is why a "checkout" based approach, using test-and-set
or load-and-clear to lock even at the granularity of a single word, is
not acceptable: the handler would spin on a lock held by the very code
it interrupted.
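
To make the hazard concrete, here is a hedged sketch of a lock-based
("checkout") CAS built on test-and-set.  All names are illustrative and
SIGUSR1 stands in for the checkpoint signal; none of this is BLCR code.
The handler makes a single lock attempt and records the failure, where a
real emulated CAS would spin forever.

```c
#include <signal.h>

static volatile int word_lock = 0;   /* hypothetical per-word lock */
static volatile int shared_word = 0; /* the "atomic" variable */
static volatile sig_atomic_t handler_saw_lock_held = 0;

/* One test-and-set attempt; returns 1 if the lock was acquired. */
static int try_lock(volatile int *lock)
{
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static void unlock(volatile int *lock)
{
    __sync_lock_release(lock);
}

/* Handler touching the same word.  A real emulated CAS would spin
 * here; one attempt is made so the demo terminates. */
static void cas_demo_handler(int sig)
{
    (void)sig;
    if (try_lock(&word_lock))
        unlock(&word_lock);
    else
        handler_saw_lock_held = 1;  /* would have deadlocked */
}

/* Emulated CAS.  `simulate_signal` delivers SIGUSR1 while the
 * lock is held, mimicking a badly timed checkpoint request. */
static int locked_cas(volatile int *p, int expected, int desired,
                      int simulate_signal)
{
    int swapped = 0;
    while (!try_lock(&word_lock))
        ;                           /* spin until we own the word */
    if (simulate_signal)
        raise(SIGUSR1);             /* handler runs before raise returns */
    if (*p == expected) {
        *p = desired;
        swapped = 1;
    }
    unlock(&word_lock);
    return swapped;
}

/* Returns 1 if the handler found the lock held by the code it
 * interrupted. */
int demo_interrupted_cas(void)
{
    signal(SIGUSR1, cas_demo_handler);
    locked_cas(&shared_word, 0, 1, 1);
    return handler_saw_lock_held;
}
```

With a hardware compare-and-swap there is no window in which the word is
"checked out", so the handler can always make progress.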

> Which parts of the library actually need to be async safe? Is it just
> things which get called from the 'my_handler' functions in cr_cs.c and
> cr_async.c? (Also what's the problem in blocking signals whilst inside
> a replacement CAS function? Is it the unblockable signals?)
>   

I think you've listed the right parts.  The reasons for not blocking
signals are two-fold:
1) You can't block the checkpoint signal, because BLCR will just
unblock it from the kernel side.
2) You wouldn't want to block it even if you could, because you may be
spinning on a change of value that will only occur in a signal handler.

>   
>> 3) Ever since we dropped 2.4.x kernel support, I have been wondering if we
>> could drop our user-space atomics entirely in favor of using futexes.
>>  However, I've not had the opportunity to examine that possibility.
>>     
> It sounds quite feasible. I'm not familiar enough yet with the
> internals to do it though.
>   

I am not familiar with the futex API and have forgotten too much of the 
BLCR internal use of atomics to coach anybody in reimplementing in terms 
of futexes.

> Q: Is there read only access to CVS available anywhere? The only
> mention of it other than reports on commits in bugzilla is in this
> email:
> http://www.nersc.gov/hypermail/checkpoint/0351.html
>   

We don't have a public CVS repository due to both concerns about 
security (we work at a DOE-sponsored National Lab) and management 
concerns (probably no longer relevant) over what constitutes a "release" 
of software.

> Or are patches against 0.8.2 acceptable?
>   

I can accept patches relative to 0.8.X.  For the atomics-related code,
both the implementations and the "client" code have almost certainly not
changed in the CVS HEAD.  I can always roll a snapshot tarball if a
developer has a need for one.

> Alan
>
> P.S. Is this list the right place for this discussion?
>
>   

I encourage discussions like this on the checkpoint_at_lbl_dot_gov list (which 
is why I manually add the Reply-To: header when I respond) in part 
because it gets these things archived, and in part because occasionally 
there is some subscriber listening who jumps in with a good idea or an 
offer of assistance.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
