SSE, NEON and AVX are fundamental for anything reducible to matrix and vector arithmetic (e.g., signal processing and ML inference on the CPU).
> It is hard to find simple enough code to vectorize at all.
It's hard to find code simple enough for the compiler to efficiently auto-vectorize.
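To make "simple enough" concrete, here's a minimal sketch (names illustrative) of the kind of loop compilers do auto-vectorize well: branch-free, unit-stride, and with `restrict` promising no aliasing. GCC/Clang at `-O2`/`-O3` will typically turn this into SSE/AVX or NEON code.

```c
#include <stddef.h>

/* saxpy: y[i] += a * x[i].
   Simple, branch-free, unit-stride access. `restrict` tells the
   compiler x and y don't alias, removing the main obstacle to
   auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

On GCC you can check what happened with `-fopt-info-vec`; anything much more complicated than this (early exits, indirect loads, cross-iteration dependences) tends to defeat the vectorizer.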
Anything that reduces to GEMM will, in practice, vectorize and parallelize extremely well, and there are many excellent libraries with SIMD support (MKL, BLAS, ATLAS, Eigen, etc.). However, these libraries rely on kernels carefully written by experts and benchmarked extensively over decades. They're not the output of running naively written code through a super-smart compiler.
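For contrast, this is the textbook triple loop those libraries replace (a sketch, row-major layout assumed). MKL/BLAS kernels compute exactly the same thing, but with cache blocking, panel packing, and hand-tuned SIMD microkernels, which is where the orders-of-magnitude gap comes from:

```c
#include <stddef.h>

/* Naive GEMM: C += A * B, with A an MxK matrix, B KxN, C MxN,
   all row-major. The i-k-j loop order at least keeps the inner
   loop unit-stride; tuned libraries go much further. */
void gemm_naive(size_t M, size_t N, size_t K,
                const float *A, const float *B, float *C) {
    for (size_t i = 0; i < M; i++)
        for (size_t k = 0; k < K; k++)
            for (size_t j = 0; j < N; j++)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
}
```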
All of this is extremely relevant to what you bought your PC or phone for. It's also not in the kernel, which is presumably why Linus seems unaware of its pervasiveness and utility.