One thing not mentioned here (nor in previous discussions of the article, it seems) is that DMA is not necessarily coherent with the CPU caches; whether it is depends on the platform. This is kinda visible from the little diagram at the top, with the disk sitting on the other side of the memory, but it should be explicitly spelled out. DMA usually means data transfer to/from a disk or GPU, though other peripherals might use it too. If you're using a DMA device (memory<->device or memory<->memory copies), you might end up in a state where the DMA engine and the CPU see different values for the same address: the CPU can keep reading a stale cached copy after the device has written to memory, or the device can read stale memory while the CPU's latest write is still sitting in a write-back cache.
Your options here are to do the cache maintenance manually (write dirty lines back before the device reads, invalidate before the CPU reads what the device wrote) and synchronize with the DMA (e.g. via completion interrupts), or to request from the OS that the given memory region be mapped entirely uncached. In some cases, if the DMA is only ever reading the memory, you can get away with a write-through cache policy.
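To make the failure modes concrete, here is a toy model in Python. It is only a sketch, not how any real hardware or OS API works: `ToySystem`, `clean`, `invalidate`, `dma_read` and `dma_write` are made-up names, the "cache" is a dict sitting in front of a list standing in for RAM, and the "DMA engine" simply bypasses that dict. On real systems the equivalent operations are platform-specific cache maintenance calls.

```python
class ToySystem:
    """Toy model: RAM, a CPU cache in front of it, and a DMA engine that bypasses the cache."""

    def __init__(self, size, write_through=False):
        self.ram = [0] * size
        self.cache = {}       # addr -> value currently held in the CPU cache
        self.dirty = set()    # cached addrs not yet written back to RAM
        self.write_through = write_through

    def cpu_write(self, addr, value):
        self.cache[addr] = value
        if self.write_through:
            self.ram[addr] = value   # write-through: RAM is updated immediately
        else:
            self.dirty.add(addr)     # write-back: RAM is updated only on clean()

    def cpu_read(self, addr):
        if addr not in self.cache:   # miss: fill the line from RAM
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]

    def clean(self, addr):
        """Write a dirty line back to RAM (do this before the device reads)."""
        if addr in self.dirty:
            self.ram[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop a cached line (do this before the CPU reads device-written data)."""
        self.cache.pop(addr, None)
        self.dirty.discard(addr)

    def dma_read(self, addr):
        return self.ram[addr]        # the device only ever sees RAM

    def dma_write(self, addr, value):
        self.ram[addr] = value       # the device never touches the CPU cache


def demo():
    r = {}

    wb = ToySystem(4)                # write-back cache
    wb.cpu_write(0, 42)
    r["dma_before_clean"] = wb.dma_read(0)       # device reads stale RAM
    wb.clean(0)
    r["dma_after_clean"] = wb.dma_read(0)

    wb.cpu_read(1)                   # pull addr 1 into the cache
    wb.dma_write(1, 7)               # device updates RAM behind the CPU's back
    r["cpu_before_invalidate"] = wb.cpu_read(1)  # CPU reads its stale cached copy
    wb.invalidate(1)
    r["cpu_after_invalidate"] = wb.cpu_read(1)

    wt = ToySystem(4, write_through=True)
    wt.cpu_write(0, 42)
    r["dma_write_through"] = wt.dma_read(0)      # no clean needed if the device only reads
    return r


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

The write-through case only helps in the CPU-writes, device-reads direction. If the device writes, the CPU can still hit a stale cached line, so you're back to invalidating or mapping the region uncached.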