IIRC, there was a big stink because ARM and POWER historically implemented consume/release semantics, which is very slightly more relaxed than the now de-facto standard acquire/release semantics.
ARM and POWERPC CPU devs worked very hard to get consume/release into C++11, but no compiler writer actually implemented that part of the standard. As such, consume/release can be safely forgotten into the annals of computer history (much like DEC Alpha's fully relaxed semantics)
Then in ARM8, ARM simply added LDAR (Load-acquire) and STLR (Store-release) instructions. https://developer.arm.com/docs/100941/0100/barriers . So the ARM CPU how fully supports the acquire/release model. Apparently IBM's POWER instruction set was similarly strengthened to acquire/release (either POWER8 or POWER9).
ARM / POWER "normal" loads and stores are still consume/release semantics. But compilers can simply emit LDAR (load-acquire) for the stronger guarantee.
----------
I remember at least one talk that showed that consume/release is ideal for things like Linux's RCU or something like that (that acquire/release is actually "too strong" for RCU, and therefore suboptimal). But because compiler-writers found consume/release too hard to reason about in practice, we're stuck with acquire/release.
> ARM and POWERPC CPU devs worked very hard to get consume/release into C++11
AFAIR Paul McKenney was the primus motor, and the motivation was largely RCU. Then again, McKenney also worked for IBM at the time and certainly had an interest in pushing a model that mapped well to POWER.
But it turned out to be both somewhat mis-specified and hard to implement cleanly, so most compilers just implemented it as an acquire.
ARM and POWERPC CPU devs worked very hard to get consume/release into C++11, but no compiler writer actually implemented that part of the standard. As such, consume/release can be safely forgotten into the annals of computer history (much like DEC Alpha's fully relaxed semantics)
Then in ARM8, ARM simply added LDAR (Load-acquire) and STLR (Store-release) instructions. https://developer.arm.com/docs/100941/0100/barriers . So the ARM CPU how fully supports the acquire/release model. Apparently IBM's POWER instruction set was similarly strengthened to acquire/release (either POWER8 or POWER9).
ARM / POWER "normal" loads and stores are still consume/release semantics. But compilers can simply emit LDAR (load-acquire) for the stronger guarantee.
----------
I remember at least one talk that showed that consume/release is ideal for things like Linux's RCU or something like that (that acquire/release is actually "too strong" for RCU, and therefore suboptimal). But because compiler-writers found consume/release too hard to reason about in practice, we're stuck with acquire/release.
It seems like the C++ standard continues to evolve to push for memory_order_consume (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p075...), but all the details are still up for discussion.