From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Sep 03 2008 - 13:58:40 PDT
I addressed the syscall question in a separate e-mail, and you indicate
the -I issue was resolved; however, I need to address your atomic ops
too. In my first e-mail reply to you on the subject of a sparc port
(July 28, 2008) I stated:

> In both the BLCR kernel and user-space code we make use of atomic
> counters for various purposes, including a reader-writer sort of lock
> in the user space code. In all architectures we've supported so far
> we've had support for atomic operations on 32-bit integers. However,
> on SPARC revisions prior to "SPARC V8+" (UltraSPARC) the only support
> for atomics is via spinlocks. The spinlock approach can work in the
> kernel because one can block interrupts. In user-space, however, the
> spinlock approach can lead to deadlock in the presence of signal
> handlers. So, I believe that it is either impossible or impractical
> to port BLCR to chips older than UltraSPARC. (I am willing to be
> proven wrong, but am strongly discouraging you from trying for older
> cpus as I think success is unlikely)

The ldstub/stb is exactly the spinlock approach I cautioned against. We
use these atomics in signal handlers, and if a given thread receives a
signal between the ldstub and the stb, then the use of the atomics in
the signal handler will deadlock. Worse, since there is ONE lock byte
for the entire library, use of atomics in signal handlers may deadlock
even if accessing a different atomic counter than the code it
interrupted.

Additionally, if we happen to link two versions of the library code
(e.g. via dlopen() and LD_PRELOAD) it is possible (I think) that two
distinct copies of the "lock" variable would exist. In that case the
code in the two libraries would not be atomic with respect to each
other.

I don't think you should get hung up on the atomics YET. If you can do
your development on UltraSPARC, then I recommend you use the CAS-based
code below.
If not, you can proceed with the ldstub-based code FOR NOW, and we can
try to deal LATER with the (rare) deadlocks by adding to BLCR a
kernel-level helper to perform CAS for you (in the kernel we can
eliminate the equivalent of signals by temporarily blocking interrupts).

CR_INLINE int
cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
{
    __asm__ __volatile__ (
        /* if (*p == oldval) SWAP(*p,newval); else newval = *p; */
        "cas [%3],%2,%0"
        : "+r"(newval), "=m"(*p)
        : "r"(oldval), "r"(p), "m"(*p) );
    return (int)(newval == oldval);
}

If you do keep your ldstub-based add-fetch, you can remove "oldval"
from it (the declaration and the "oldval = *p"), since they don't do
anything. Other than that and the deadlock problem, your add-fetch
looks correct.

Just to be 100% proper about this code I've included here:
Signed-off-by: Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>

-Paul

Vincentius Robby wrote:
> Hello Paul,
>
> This is my current code for compare_and_swap:
>
> CR_INLINE unsigned int
> cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
> {
>     register unsigned char ret;
>     int tmp;
>     static unsigned char lock;
>     __asm__ __volatile__("1: ldstub [%1], %0\n\t"
>                          "   cmp %0, 0\n\t"
>                          "   bne 1b\n\t"
>                          "   nop"
>                          : "=&r" (tmp)
>                          : "r" (&lock)
>                          : "memory");
>     if (*p != oldval)
>         ret = 0;
>     else {
>         *p = newval;
>         ret = 1;
>     }
>     __asm__ __volatile__("stb %%g0, [%0]"
>                          : /* no outputs */
>                          : "r" (&lock)
>                          : "memory");
>
>     return ret;
> }
>
> The ldstub seems to be used for memory locks. The code that includes
> the cas instructions seems effective, although it may not be available
> for V8.
[snip]

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900