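That's an interesting example, but it doesn't have anything to do with "putting the physical register file to use". Register renaming helps whenever your algorithm has independent work to overlap: it removes the false (write-after-read and write-after-write) dependencies that come from reusing register names, though it can't remove true data dependencies. It's probably one of the single best general-purpose optimizations available to a modern CPU design.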
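To make the renaming point concrete, here is a minimal sketch of the idea, not a model of any real core. It assumes an unbounded pool of physical registers (real hardware has a finite file and recycles entries at retirement), and the names `rename` and `dependencies` are made up for illustration. Every write gets a fresh physical register, so reuse of an architectural name stops creating ordering constraints while true read-after-write edges survive.

```python
from itertools import count


def rename(instrs):
    """Map (dst, src1, src2) over architectural names to physical registers.

    Assumption: the physical register pool is unbounded.
    """
    fresh = count()
    mapping = {}  # architectural name -> current physical register

    def lookup(reg):
        # A register read before any write gets a physical register on demand.
        if reg not in mapping:
            mapping[reg] = next(fresh)
        return mapping[reg]

    renamed = []
    for dst, a, b in instrs:
        pa, pb = lookup(a), lookup(b)  # read sources through the current map
        mapping[dst] = next(fresh)     # every write gets a new physical register
        renamed.append((mapping[dst], pa, pb))
    return renamed


def dependencies(instrs):
    """Return (true RAW edges, false WAR/WAW edges) as (earlier, later) pairs."""
    raw, false_deps = set(), set()
    last_writer = {}
    for j, (dst, a, b) in enumerate(instrs):
        for src in (a, b):
            if src in last_writer:
                raw.add((last_writer[src], j))
        for i in range(j):
            di, ai, bi = instrs[i]
            if dst == di or dst in (ai, bi):  # WAW or WAR on the same name
                false_deps.add((i, j))
        last_writer[dst] = j
    return raw, false_deps


program = [
    ("eax", "ebx", "ecx"),  # 0
    ("edx", "eax", "eax"),  # 1: truly depends on 0
    ("eax", "esi", "edi"),  # 2: reuses the name eax, independent of 0 and 1
    ("ebx", "eax", "edx"),  # 3: truly depends on 1 and 2
]
raw_before, false_before = dependencies(program)
raw_after, false_after = dependencies(rename(program))
```

In this sketch instruction 2 is held back before renaming only because it reuses `eax`; after renaming nothing orders it behind instructions 0 and 1, and the true dependency edges are unchanged.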
The classic example of where register-poor ISAs hurt isn't about "subtle differences" at all. It's the fact that there are still only 8 (for i386) named registers, so anything that deals with a working set larger than that has to spill and fill to memory, and you can't "rename" memory accesses (though you can sort of cheat, as with store forwarding -- but that doesn't work nearly as well as renaming does).
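A toy way to see the spill/fill cost, under loud assumptions: values are evicted from registers in least-recently-used order, every eviction counts as a store and every reload as a load. Real compilers allocate registers very differently, and `spill_fill_counts` is a hypothetical helper, so treat this only as a rough illustration of what happens once the working set exceeds the number of named registers.

```python
from collections import OrderedDict


def spill_fill_counts(accesses, num_regs):
    """Count spills (stores) and fills (loads) for a sequence of value uses.

    Toy assumption: LRU eviction, and every evicted value must be spilled.
    """
    regs = OrderedDict()  # values currently held in registers, oldest first
    in_memory = set()     # values that have been spilled
    spills = fills = 0
    for var in accesses:
        if var in regs:
            regs.move_to_end(var)  # already in a register: no memory traffic
            continue
        if len(regs) == num_regs:
            victim, _ = regs.popitem(last=False)  # evict least recently used
            in_memory.add(victim)
            spills += 1
        if var in in_memory:
            in_memory.discard(var)
            fills += 1
        regs[var] = True
    return spills, fills


# A loop that touches 10 values, run three times.
working_set = list(range(10)) * 3
with_8_regs = spill_fill_counts(working_set, num_regs=8)
with_16_regs = spill_fill_counts(working_set, num_regs=16)
```

With 16 registers the loop never touches memory in this model; with 8 it spills and refills on nearly every access, and none of that traffic can be renamed away.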
The conventional x86 way of doing it was to spill to "the stack" and I'm told that their chips actually specially optimize that nowadays. But it's all very weird, deep magic in a lot of ways.
The only stack-specific optimization that's done is folding the decrement/increment of esp/rsp into the load/store µop, so push/pop don't need a separate µop to update the stack pointer. And that's done mainly because push/pop are one-byte opcodes, unlike general load/store.
Everything else is general memory optimization that applies to any load or store, like the aforementioned store forwarding. Spilling is still expensive when the CPU can't use those optimizations (mismatched load/store size, incorrect speculation, etc.).