> The rowhammer "attack" is successful only because the hardware is just plain broken
I too am of this opinion and am surprised this view isn't widely shared. With DDR4, we should be asking for a refund and/or starting a class-action suit, yet we're putting up with software 'mitigations' instead.
This isn't like the 2008 Phenom TLB bug [1], where the CPU would lock up, so AMD released a workaround that kept it from freezing at the expense of a ~14% performance penalty. This is like the floating point division bug [2], where the device no longer meets basic operational and accuracy guarantees. RAM cells bleeding into each other ought to be considered a fatal flaw, not some intellectual curiosity.
I extensively test all the hardware I buy (CPU: LINPACK, RAM: MemTest86+), and if it fails any of those tests, it gets returned as "not fit for purpose". I've done this successfully a few times. A lot of other enthusiasts/power users do the same, especially if they're overclocking, and searches on other forums turn up plenty of users testing and finding (mostly other, non-rowhammer) errors in newly-bought RAM even without overclocking. But as noted in the threads I linked to, manufacturers may be trying to cover this up and downplay its severity. Even in the original paper on rowhammer, the authors didn't disclose which manufacturers and which modules were affected, although I think this should really be treated like the FDIV bug: name and shame. I blame political correctness...
The Intel LINPACK distribution contains, besides the library, a sample benchmarking application that happens to be a very intense and "real" workload (solving systems of linear equations, i.e. scientific computation). There are plenty of posts on various PC enthusiast forums about how to run it correctly. (And plenty arguing that it's irrelevant, mostly because their insane overclock seems fine everywhere else but instantly fails this test. There's a good reason most people doing "real" scientific computing don't overclock; a lot of CPUs only barely pass this entirely realistic test at stock speeds and voltages.)
FDIV was really not a technically serious erratum in the grand scheme of errata. The Phenom TLB bug was worse. Intel basically denied and sat on the issue for half a year, stopped just short of slandering Dr. Nicely, and so on; they turned it into a complete PR disaster. If they had come out the week after it was reported and said "here's a workaround, here's an opt-in replacement program" (which they eventually did, but by then it was too late), you would probably never have heard of the FDIV bug, just like the countless other errata we have software workarounds for.
In retrospect I regret bringing up the Phenom because my argument could've stood without it, and I could realistically argue either way.
But my original intention was to point out that the Phenom's failure mode wasn't exploitable for anything beyond, potentially, denial of service; it was just inconvenient, and it only affected a subsystem of the CPU, which worked fine without that subsystem under the firmware workaround.
Granted, you don't expect your CPU to halt and lock up, but I believe it's far more insidious when you feed a device inputs and get the wrong output without any obvious indication that something went wrong, as with rowhammer-vulnerable memory and FDIV.
I think that is the reason for the misunderstanding: FDIV was not really insidious in the way you describe. It was 100% predictable; certain bit patterns always gave the wrong answer in the quotient on the affected hardware, and it had a very straightforward software fix (with a performance cost, sure). You could demonstrate it immediately, but it really wasn't severe.
(Q9 and Q10 http://www.trnicely.net/pentbug/pentbug.html)
Rowhammer is a much more complex erratum that I don't feel qualified to comment on, especially regarding the safety of the published mitigations, but it is in a class of bugs where the outcomes are not generally predictable because many more variables are involved.
My reason for replying initially, though, is that I don't think the line between hardware defects that are open to software workarounds and those that aren't is so cut and dried, and I don't think many people outside of kernel/OS development realize how many errata exist on the chips they use every day, with workarounds they never notice.
[1] http://techreport.com/review/13741/phenom-tlb-patch-benchmar...
[2] https://en.wikipedia.org/wiki/Pentium_FDIV_bug