Re: Atomic operations (SPARC)

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Aug 24 2009 - 17:05:38 PDT

  • Next message: Alan Woodland: "Re: Atomic operations (SPARC)"
    See replies/answers below
    
    Alan Woodland wrote:
    > 2009/8/24 Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>:
    > [snip]
    >   
    >> There are at least 4 parts to the answer.
    >>
    >> 0) I'd be surprised if the SPARC architecture is usable, even if it
    >> compiles.  It is documented to be incomplete and to support (IIRC) only
    >> UltraSPARC and higher (see #2, below).
    >>
    >> 1) The gcc atomic builtins were not (widely?) available when we started this
    >> project, and until recently we supported kernels as old as 2.4.0, which
    >> required an ancient gcc-2.96.  If you were to implement an arch/generic in
    >> the same spirit as the kernel's asm-generic I think we could use it when
    >> support exists in the compiler, and as a porting aid, but we'd probably not
    >> throw away the existing implementations.
    >>     
    > It looks like my initial reading of the atomic builtins was incorrect,
    > and gcc generates the call to the external function if required, but
    > not the function itself. This would still make porting to
    > architectures with CAS a lot easier though.
    >
    > Testing at configure time the availability of atomic builtins is
    > trivial and could then direct to either those or the existing
    > implementations appropriately.
    >   
    
    Right.  GCC generates the atomic code IF it is among those implemented 
    by its developers; the extern function call is generated if GCC doesn't 
    (yet) know how to implement the given atomic on the given architecture.  
    This is why I'd not immediately discard the existing implementations 
    (they can be renamed to satisfy GCC's external function calls when 
    needed).  As you observe, this option might avoid the need to write any 
    new implementations.
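
    A minimal sketch of the configure-time selection discussed above.  The
    macro HAVE_GCC_ATOMIC_BUILTINS and the names example_cas and arch_cas
    are hypothetical, chosen for illustration only, and are not taken from
    the BLCR sources.  The configure test would presumably need to be a
    link test rather than a compile-only test, since GCC may compile the
    builtin into a call to an external function (with a size suffix such
    as _4) that is only found to be missing at link time.

```c
#include <assert.h>

/* HAVE_GCC_ATOMIC_BUILTINS is a hypothetical macro that a configure-time
 * link test would define.  It defaults to 1 here only so that this
 * sketch builds on its own. */
#ifndef HAVE_GCC_ATOMIC_BUILTINS
#define HAVE_GCC_ATOMIC_BUILTINS 1
#endif

#if HAVE_GCC_ATOMIC_BUILTINS
/* Use the compiler builtin.  Returns nonzero if *p held oldval and was
 * replaced by newval, zero if *p was left unchanged. */
static inline int example_cas(volatile unsigned int *p,
                              unsigned int oldval, unsigned int newval)
{
    return __sync_bool_compare_and_swap(p, oldval, newval);
}
#else
/* Fall back to an existing hand-written, per-architecture implementation
 * (hypothetical name, assumed to be provided elsewhere). */
extern int arch_cas(volatile unsigned int *p,
                    unsigned int oldval, unsigned int newval);
#define example_cas arch_cas
#endif
```

    Client code would call example_cas() in both configurations, so the
    existing implementations stay in place for compilers or targets where
    the builtin is not available.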
    
    
    > For the replacement functions when it's not available some kind of
    > 'cancelable' futex-based 'lock' shouldn't be impossible to construct,
    > and then the atomic_cas replacement could handle cancellation sanely?
    >
    >   
    >> 2) Sparclite does not have the required atomic instructions.  With only a
    >> test-and-set or load-and-clear type of atomic support one cannot implement a
    >> signal-safe compare-and-swap in user space (kernel does it by blocking
    >> interrupts momentarily).  This is true of "older" ARM as well, but on that
    >> platform there is a neat kernel trick implemented by/for the NPTL port using
    >> the equivalent of a VDSO to use the atomic instructions on "newer" ARM or
    >> use the kernel for help otherwise.  I don't know of such a trick on
    >> sparclite, but NPTL must work somehow.
    >>     
    > Which aspect of 'signalsafe' is problematic here? The async safe part?
    > (I.e. if we get interrupted by a signal halfway through an atomic op
    > we'd be holding a global lock and deadlock if we called another atomic
    > op from the signal handler?)
    >   
    
    This is exactly what we need to work, because the checkpoint request 
    arrives via a signal handler and the "interaction" between critical 
    sections and the checkpoint requests is via a "red-black lock" 
    implemented via signal-safe atomics.  Note that the issue is not a 
    "global" lock, but that the common case is that the signal handler and 
    the code it interrupts are accessing the same atomic variable.  That is 
    why a "checkout" based approach using test-and-set or load-and-clear to 
    lock even on the granularity of a single word is not acceptable.
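
    To illustrate the requirement, here is a sketch (not BLCR code) of an
    update built on a compare-and-swap retry loop, used on the same word
    by both the main code and a signal handler.  The names shared_word,
    example_atomic_add and run_demo are hypothetical.  The signal is
    raised synchronously only to keep the run deterministic.  A truly
    asynchronous signal could arrive between the load and the CAS, in
    which case the interrupted CAS fails and the loop retries.  No lock is
    ever held, so the handler cannot deadlock against the code it
    interrupted, which is what a test-and-set "checkout" of the word
    cannot guarantee.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* One word accessed by both the main code and the signal handler. */
static volatile unsigned int shared_word;

/* Lock-free add: reload and retry until the CAS succeeds. */
static unsigned int example_atomic_add(volatile unsigned int *p,
                                       unsigned int delta)
{
    unsigned int old;
    do {
        old = *p;
    } while (!__sync_bool_compare_and_swap(p, old, old + delta));
    return old + delta;
}

/* The handler updates the same word as the code it interrupts. */
static void on_signal(int signo)
{
    (void)signo;
    example_atomic_add(&shared_word, 1000);
}

/* Performs 100 increments of 1 in the main code and delivers 10
 * signals, each of which adds 1000.  Returns the final value, or 0 if
 * the handler could not be installed. */
static unsigned int run_demo(void)
{
    struct sigaction sa;
    int i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    shared_word = 0;
    for (i = 0; i < 100; i++) {
        example_atomic_add(&shared_word, 1);
        if (i % 10 == 0)
            raise(SIGUSR1);   /* handler runs before raise() returns */
    }
    return shared_word;
}
```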
    
    > Which parts of the library actually need to be async safe? Is it just
    > things which get called from the 'my_handler' functions in cr_cs.c and
    > cr_async.c? (Also what's the problem in blocking signals whilst inside
    > a replacement CAS function? Is it the unblockable signals?)
    >   
    
    I think you've listed the right parts.  The reason for not blocking 
    signals is two-fold:
    1) You can't block the checkpoint signal because BLCR will just 
    unblock it from the kernel side.
    2) You wouldn't want to block it if you could, because you may be 
    spinning on a change of value that will only occur in a signal handler.
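
    Point 2 can be sketched as follows (an illustration, not BLCR code).
    The code spins on a flag that only a signal handler ever sets; if the
    signal were blocked for the duration of the spin, the loop would never
    terminate.  The names handler_ran and spin_until_handler_runs are
    hypothetical, and SIGALRM driven by a 10 ms timer merely stands in
    for the checkpoint signal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Written only by the signal handler, read by the spinning code. */
static volatile sig_atomic_t handler_ran;

static void on_alarm(int signo)
{
    (void)signo;
    handler_ran = 1;
}

/* Arms a one-shot 10 ms timer, then spins until the handler has run.
 * Returns 1 on success and -1 if setup fails.  With SIGALRM blocked
 * this loop would spin forever. */
static int spin_until_handler_runs(void)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    handler_ran = 0;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 10000;          /* fire once after 10 ms */
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    while (!handler_ran) {
        /* busy-wait: the only writer of handler_ran is the handler */
    }
    return 1;
}
```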
    
    >   
    >> 3) Ever since we dropped 2.4.x kernel support, I have been wondering if we
    >> could drop our user-space atomics entirely in favor of using futexes.
    >>  However, I've not had the opportunity to examine that possibility.
    >>     
    > It sounds quite feasible. I'm not familiar enough yet with the
    > internals to do it though.
    >   
    
    I am not familiar with the futex API and have forgotten too much of 
    BLCR's internal use of atomics to coach anybody in reimplementing them 
    in terms of futexes.
    
    
    > Q: Is there read only access to CVS available anywhere? The only
    > mention of it other than reports on commits in bugzilla is in this
    > email:
    > http://www.nersc.gov/hypermail/checkpoint/0351.html
    >   
    
    We don't have a public CVS repository due to both concerns about 
    security (we work at a DOE-sponsored National Lab) and management 
    concerns (probably no longer relevant) over what constitutes a "release" 
    of software.
    
    > Or are patches against 0.8.2 acceptable?
    >   
    
    I can accept patches relative to 0.8.X.  For the atomics-related code, 
    both the implementations and the "client" code have almost certainly not 
    changed in the CVS HEAD.  I can always roll a snapshot tarball if a 
    developer has a need for one.
    
    > Alan
    >
    > P.S. Is this list the right place for this discussion?
    >
    >   
    
    I encourage discussions like this on the checkpoint_at_lbl_dot_gov list (which 
    is why I manually add the Reply-To: header when I respond) in part 
    because it gets these things archived, and in part because occasionally 
    there is some subscriber listening who jumps in with a good idea or an 
    offer of assistance.
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
