*Similar* architectures have been available for a plenty of time! 256 bits at on...

jiggawatts · on Nov 29, 2022

Similar is not the same.

SSE and AVX instructions are optimised primarily for 3D graphics, such as multiplying a vector of 4 floating-point numbers by a 4x4 matrix. There are a handful of additional instructions optimised for doing things to pixels... and that's about it.

AVX-512 is designed to work more like what a GPU does internally, and provides a much richer set of instructions. It enables fine-grained masking and shuffles, without which many simple types of code are either impossible to compile to vector instructions, or much more complex... and slower. This is why auto-vectorisation with SSE and AVX is only enabled for some simple loops, and provides marginal benefits outside of those scenarios.
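
To make the masking point concrete, here is a rough sketch in plain C. It is a scalar model of the idea, not real AVX-512 code: the function names, the `LANES` constant and the "double the positive values" loop are all made up for illustration. The first function is the kind of branchy loop that is awkward to vectorise. The second does the same work the way a masked instruction set would: compare 16 lanes at once to get one mask bit per lane, then store only the lanes whose bit is set. The same mask trick also covers the leftover elements at the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* hypothetical: 16 x 32-bit floats = one 512-bit register */

/* The branchy loop: out[i] is only written when in[i] is positive. */
void double_positive_scalar(const float *in, float *out, size_t n) {
    assert(in && out);
    for (size_t i = 0; i < n; i++) {
        if (in[i] > 0.0f) {
            out[i] = in[i] * 2.0f;
        }
    }
}

/* Scalar model of the masked form. Each inner loop stands in for what
 * would be a single masked vector instruction on real hardware. */
void double_positive_masked(const float *in, float *out, size_t n) {
    assert(in && out);
    for (size_t base = 0; base < n; base += LANES) {
        /* Tail mask: only lanes that map to real elements are active. */
        size_t live = (n - base < LANES) ? (n - base) : LANES;
        uint16_t active =
            (live == LANES) ? 0xFFFFu : (uint16_t)((1u << live) - 1u);

        /* Compare step: one mask bit per lane, set where in[] > 0. */
        uint16_t k = 0;
        for (size_t lane = 0; lane < live; lane++) {
            if (in[base + lane] > 0.0f) {
                k |= (uint16_t)(1u << lane);
            }
        }
        k &= active;

        /* Masked multiply and store: lanes with a clear bit are left
         * untouched, so no branch per element and no out-of-bounds write. */
        for (size_t lane = 0; lane < LANES; lane++) {
            if ((k >> lane) & 1u) {
                out[base + lane] = in[base + lane] * 2.0f;
            }
        }
    }
}
```

On actual AVX-512 hardware the compare and the store would each be one intrinsic, along the lines of `_mm512_cmp_ps_mask` and `_mm512_mask_storeu_ps`. I have left those out so the sketch compiles and runs on any machine.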