From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Sep 03 2008 - 13:58:40 PDT
I addressed the syscall question in a separate e-mail, and you indicate
the -I issue was resolved; however, I need to address your atomic ops
too. In my first e-mail reply to you on the subject of a sparc port
(July 28, 2008) I stated:

> In both the BLCR kernel and user-space code we make use of atomic
> counters for various purposes, including a reader-writer sort of lock
> in the user space code. In all architectures we've supported so far
> we've had support for atomic operations on 32-bit integers. However,
> on SPARC revisions prior to "SPARC V8+" (UltraSPARC) the only support
> for atomics is via spinlocks. The spinlock approach can work in the
> kernel because one can block interrupts. In user-space, however, the
> spinlock approach can lead to deadlock in the presence of signal
> handlers. So, I believe that it is either impossible or impractical
> to port BLCR to chips older than UltraSPARC. (I am willing to be
> proven wrong, but am strongly discouraging you from trying for older
> cpus as I think success is unlikely)

The ldstub/stb is exactly the spinlock approach I cautioned against. We
use these atomics in signal handlers, and if a given thread receives a
signal between the ldstub and the stb, then the use of the atomics in
the signal handler will deadlock. Worse, since there is ONE lock byte
for the entire library, use of atomics in signal handlers may deadlock
even if accessing a different atomic counter than the code it
interrupted.

Additionally, if we happen to link two versions of the library code
(e.g. via dlopen() and LD_PRELOAD) it is possible (I think) that two
distinct copies of the "lock" variable would exist. In that case the
code in the two libraries would not be atomic with respect to each
other.

I don't think you should get hung up on the atomics YET. If you can do
your development on UltraSPARC, then I recommend you use the CAS-based
code below.
If not, you can proceed with the ldstub-based code FOR NOW, and we can
try to deal LATER with the (rare) deadlocks by adding to BLCR a
kernel-level helper to perform CAS for you (in the kernel we can
eliminate the equivalent of signals by temporarily blocking interrupts).

CR_INLINE int
cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
{
    __asm__ __volatile__ (
        /* if (*p == oldval) SWAP(*p,newval); else newval = *p; */
        "cas [%3],%2,%0"
        : "+r"(newval), "=m"(*p)
        : "r"(oldval), "r"(p), "m"(*p) );
    return (int)(newval == oldval);
}

If you do keep your ldstub-based add-fetch, you can remove "oldval"
from it (the declaration and the "oldval = *p"), since they don't do
anything. Other than that and the deadlock problem, your add-fetch
looks correct.

Just to be 100% proper about this code I've included here:
Signed-off-by: Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>

-Paul

Vincentius Robby wrote:
> Hello Paul,
>
> This is my current code for compare_and_swap:
>
> CR_INLINE unsigned int
> cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
> {
>     register unsigned char ret;
>     int tmp;
>     static unsigned char lock;
>     __asm__ __volatile__("1: ldstub [%1], %0\n\t"
>                          "   cmp %0, 0\n\t"
>                          "   bne 1b\n\t"
>                          "   nop"
>                          : "=&r" (tmp)
>                          : "r" (&lock)
>                          : "memory");
>     if (*p != oldval)
>         ret = 0;
>     else {
>         *p = newval;
>         ret = 1;
>     }
>     __asm__ __volatile__("stb %%g0, [%0]"
>                          : /* no outputs */
>                          : "r" (&lock)
>                          : "memory");
>
>     return ret;
> }
>
> The ldstub seems to be used for memory locks. The code that includes
> the cas instructions seems effective, although it may not be available
> for V8.
[snip]

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900