From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Aug 24 2009 - 17:05:38 PDT
See replies/answers below.

Alan Woodland wrote:
> 2009/8/24 Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>:
> [snip]
>
>> There are at least 4 parts to the answer.
>>
>> 0) I'd be surprised if the SPARC architecture is usable, even if it
>> compiles. It is documented to be incomplete and to support (IIRC) only
>> UltraSPARC and higher (see #2, below).
>>
>> 1) The gcc atomic builtins were not (widely?) available when we started this
>> project, and until recently we supported kernels as old as 2.4.0, which
>> required an ancient gcc-2.96. If you were to implement an arch/generic in
>> the same spirit as the kernel's asm-generic I think we could use it when
>> support exists in the compiler, and as a porting aid, but we'd probably not
>> throw away the existing implementations.
>>
> It looks like my initial reading of the atomic builtins was incorrect,
> and gcc generates the call to the external function if required, but
> not the function itself. This would still make porting to
> architectures with CAS a lot easier though.
>
> Testing at configure time the availability of atomic builtins is
> trivial and could then direct to either those or the existing
> implementations appropriately.
>
Right. GCC generates the atomic code inline IF the operation is among
those its developers have implemented for the target; otherwise (when
GCC doesn't yet know how to implement the given atomic on the given
architecture) it generates a call to an external function. This is why
I'd not immediately discard the existing implementations (they can be
renamed to satisfy GCC's external function calls when needed). As you
observe, this option might avoid the need to write any new
implementations.

> For the replacement functions when it's not available some kind of
> 'cancelable' futex-based 'lock' shouldn't be impossible to construct,
> and then the atomic_cas replacement could handle cancellation sanely?
>
>
>> 2) Sparclite does not have the required atomic instructions.
>> With only a
>> test-and-set or load-and-clear type of atomic support one cannot implement a
>> signal-safe compare-and-swap in user space (kernel does it by blocking
>> interrupts momentarily). This is true of "older" ARM as well, but on that
>> platform there is a neat kernel trick implemented by/for the NPTL port using
>> the equivalent of a VDSO to use the atomic instructions on "newer" ARM or
>> use the kernel for help otherwise. I don't know of such a trick on
>> sparclite, but NPTL must work somehow.
>>
> Which aspect of 'signalsafe' is problematic here? The async safe part?
> (I.e. if we get interrupted by a signal halfway through an atomic op
> we'd be holding a global lock and deadlock if we called another atomic
> op from the signal handler?)
>
This is exactly what we need to work, because the checkpoint request
arrives via a signal handler, and the "interaction" between critical
sections and the checkpoint requests is via a "red-black lock"
implemented via signal-safe atomics. Note that the issue is not a
"global" lock, but that the common case is that the signal handler and
the code it interrupts are accessing the same atomic variable. That is
why a "checkout" based approach, using test-and-set or load-and-clear to
lock even at the granularity of a single word, is not acceptable.

> Which parts of the library actually need to be async safe? Is it just
> things which get called from the 'my_handler' functions in cr_cs.c and
> cr_async.c? (Also what's the problem in blocking signals whilst inside
> a replacement CAS function? Is it the unblockable signals?)
>
I think you've listed the right parts.
The reason for not blocking signals is two-fold:
1) You can't block the checkpoint signal, because BLCR will just unblock
it from the kernel side.
2) You wouldn't want to block it even if you could, because you may be
spinning on a change of value that will only occur in a signal handler.
>
>> 3) Ever since we dropped 2.4.x kernel support, I have been wondering if we
>> could drop our user-space atomics entirely in favor of using futexes.
>> However, I've not had the opportunity to examine that possibility.
>>
> It sounds quite feasible. I'm not familiar enough yet with the
> internals to do it though.
>
I am not familiar with the futex API and have forgotten too much of
BLCR's internal use of atomics to coach anybody in reimplementing it in
terms of futexes.

> Q: Is there read only access to CVS available anywhere? The only
> mention of it other than reports on commits in bugzilla is in this
> email:
> http://www.nersc.gov/hypermail/checkpoint/0351.html
>
We don't have a public CVS repository due to both concerns about
security (we work at a DOE-sponsored National Lab) and management
concerns (probably no longer relevant) over what constitutes a "release"
of software.

> Or are patches against 0.8.2 acceptable?
>
I can accept patches relative to 0.8.X. For the atomics-related code,
both the implementations and the "client" code have almost certainly not
changed in the CVS HEAD. I can always roll a snapshot tarball if a
developer has a need for one.

> Alan
>
> P.S. Is this list the right place for this discussion?
>
>
I encourage discussions like this on the checkpoint_at_lbl_dot_gov list
(which is why I manually add the Reply-To: header when I respond), in
part because it gets these things archived, and in part because
occasionally there is some subscriber listening who jumps in with a good
idea or an offer of assistance.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory