
Honestly it is very dependent on exactly what you are doing. For general-purpose computing on a general-purpose machine you're not going to do much better on average than an up-to-date OpenBLAS. In some cases, especially parallel workloads, Intel's MKL BLAS is slightly faster (but in other cases it is also slower).
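If you want to see which BLAS your numpy build is actually linked against, and get a rough feel for its speed, something like this works (a minimal sketch; the exact output of `show_config` varies across numpy versions):

```python
import time
import numpy as np

# Report which BLAS/LAPACK implementation this numpy build is linked
# against (OpenBLAS, MKL, reference BLAS, etc.).
np.show_config()

# Quick sanity benchmark: matrix multiply is the classic BLAS-bound
# operation (dgemm), so its wall-clock time reflects the backend in use.
n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)
t0 = time.perf_counter()
c = a @ b
print(f"{n}x{n} matmul took {time.perf_counter() - t0:.3f}s")
```

Timing the same matmul before and after switching BLAS libraries is a simple way to tell whether a swap was worth it for your workload.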

There is also scikit.cuda, which wraps Nvidia's cuBLAS and can be very fast in certain cases, but it isn't in any way a drop-in replacement for OpenBLAS.

Then there's NumbaPro (a commercial product) from Continuum Analytics, which is an LLVM-backed JIT that attempts to automatically speed up your numpy code and can make your code use cuBLAS where it makes sense to do so.



Ok. Think I'll leave well enough alone then!



