I really dislike the `fill_array.c` vs `fill_array_out_of_order.c` example. While it's showing the effects of spatial locality, it's misattributing the performance delta to the CPU L1/L2 cache.
The problem is that it's filling an array in freshly allocated memory. While you might expect `malloc(NUMBER)` to give your process a crap ton of space in your RAM, that's far from the truth. First of all, for an allocation that large, glibc will just translate it into an `mmap`, and even if it didn't, the Linux kernel still wouldn't allocate the whole buffer due to "optimistic memory allocation" (overcommit). Instead, you receive ownership of a chunk of virtual address space, but nothing is actually allocated at first. Only when you dirty each page does the kernel fault in the backing physical memory. And even then, the kernel has various heuristics to preemptively page in memory that it thinks you're going to use soon.
I'm sure the author was aware that the example was more nuanced than just "muh CPU cache", but reducing spatial locality to just "muh CPU cache" for the article's sake does a disservice to the reader.
The doubling of J as a means of striding across the array also gives me some concern. While it is cache related, it is cache related for sneaky reasons. After being doubled over and over, J ends up with a ton of trailing 0s in binary. Six iterations after each reset, the offset within a cache line is guaranteed to be 0. A standard 32 KiB L1 D$ holds 512 64-byte lines. Assuming 8-way set associativity (decently common as far as I know), those 512 lines are organized into 64 sets of 8 lines each. So after the next 6 post-reset iterations you are guaranteed to hit only the 0th set index, effectively reducing the L1 cache to 8 lines.
(edit: Not the 0th index, but the same index as the base of the array.)