
> I believe all SPARC implementations have been TSO (i.e. exactly the same as x86) for a long time (and IIRC even non-TSO SPARCs had mode bits to force system-wide TSO).

From what I can find online, TSO does seem to have been the default, though it was possible to enable more relaxed models starting with v8 (PSO) and v9 (RMO).

Most of the relaxation seems to have been dropped in the meantime:

> SPARC M7 supports only TSO, with the exception that certain ASI accesses (such as block loads and stores) may operate under RMO. […] SPARC M7 ignores the value set in this field and always operates under TSO.

GP may have confused SPARC and Alpha, the latter having an (in)famously relaxed approach to memory ordering (uniquely, it can reorder even dependent loads, e.g. if you load a pointer then read the value at that pointer you may get stale data).



> GP may have confused SPARC and Alpha

I seem to have confused which of the 3 memory ordering states was the default and supported on all models. I'm well aware of the Alpha.

> e.g. if you load a pointer then read the value at that pointer you may get stale data

For anyone thinking that sounds horrifying, a single core doing both the writes and the reads won't observe anything out of the ordinary.

The designers of the Alpha reasoned roughly as follows: "We can simplify and speed up the cache coherency logic if the 'happens before' temporal dependency introduced by a memory fence between two writes makes no guarantees about two reads on another core, unless there is also a 'happens before' temporal dependency introduced by a memory fence between those two reads." This sounds totally reasonable, particularly at a time when lockfree data structures weren't as popular. Reads and writes done while holding a mutex work just as they do on any other processor: the memory fence involved in acquiring a mutex guarantees that any reads made while holding the mutex will see all writes made before the previous release operation on that mutex.

The difficulty with the Alpha memory model comes mostly in writing lockfree data structures, where the safety comes from modifying some data structure while no other threads can see it, then atomically changing a pointer (usually with a load-locked/store-conditional or compare-and-swap) to make that data structure available to other threads. On most architectures, you only need a memory fence right before (or as part of) the write operation that changes the pointer, and the resulting happens-before guarantee is seen by all normal read operations. On Alpha, you also need a memory fence between reading that pointer and chasing it. In C, loading the pointer and chasing it happen in a single expression, *p, so you need to load the global pointer into a local variable, execute a memory fence, and then dereference the pointer (or else use inline assembly; C11 atomics came out long after the DEC Alpha AXP).
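
For anyone who wants to see the shape of that pattern, here is a minimal sketch using C11 atomics (which, as noted, postdate the Alpha, so this is a modern rendering rather than what code of that era looked like). The names `struct message`, `storage`, `shared`, `publish` and `try_consume` are made up for illustration. The acquire load in the reader stands in for "load the pointer into a local, then fence"; a compiler targeting Alpha would be expected to place a barrier after that load, while on x86 it should cost nothing extra.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct message {
    int value;
};

static struct message storage;            /* private to the writer until published */
static _Atomic(struct message *) shared;  /* NULL (static zero-init) until published */

/* Writer: fill in the object while nobody can see it, then publish the pointer. */
void publish(int value)
{
    /* This sketch only supports publishing once. */
    assert(atomic_load_explicit(&shared, memory_order_relaxed) == NULL);

    storage.value = value;

    /* Release store: the write to storage.value is ordered before the
     * pointer becomes visible. This is the fence "right before (or as
     * part of) the write" that every architecture needs. */
    atomic_store_explicit(&shared, &storage, memory_order_release);
}

/* Reader: returns true and fills *out if something has been published. */
bool try_consume(int *out)
{
    /* Load the global pointer into a local first. The acquire ordering
     * is the reader-side fence between reading the pointer and chasing
     * it, which is the part Alpha additionally requires. */
    struct message *p = atomic_load_explicit(&shared, memory_order_acquire);
    if (p == NULL)
        return false;

    *out = p->value;  /* dereference happens only after the ordered load */
    return true;
}
```

In real use `publish` and `try_consume` would run on different threads; the sketch leaves thread creation out to stay short.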

Lockfree data structures became more popular well after the Alpha was released, so it's understandable that the Alpha designers were willing to make these optimizations, which don't affect code that correctly uses mutexes to protect global mutable state.



