That's why I said the scaled cosine distance, the dot product between two vector...

extasia · on May 21, 2023

You're correct! The dot product reduces to cosine sim (when both vectors are unit length), not the other way around.
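A minimal plain-Python sketch of that relationship. The helper names (`dot`, `norm`, `normalize`, `cosine_similarity`) are my own, chosen for illustration only. Cosine similarity is the dot product divided by both vector lengths, so once the vectors are normalized to unit length, the plain dot product gives the same number.

```python
import math

def dot(a, b):
    # Multiply elementwise, then sum everything into one number.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length, itself just sqrt(a . a).
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by both vector lengths.
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    # Scale a vector to unit length.
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]
print(dot(a, b))                # 24.0
print(cosine_similarity(a, b))  # 0.96
# After normalizing, the plain dot product matches cosine similarity
# (up to floating-point rounding).
print(math.isclose(dot(normalize(a), normalize(b)), cosine_similarity(a, b)))  # True
```

The reverse direction doesn't work: cosine similarity throws away the vector magnitudes, so the raw dot product can't be recovered from it alone.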

Could you clarify your second statement about MADs and 3x3s? Afraid I don't follow.

tysam_and · on May 21, 2023

Ah, of course. Sure. I can do that.

Any operation in a neural network that multiplies two vectors elementwise and then sums the products into a single number is a dot product.

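So 3x3 convs are technically dot products too, even though they have a spatial dimension: each output value is the dot product of the flattened kernel with the 3x3 input patch underneath it. Same for the initial part of most transformers' attention layers, MLP layers, etc.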
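To make the "it's just a dot product" framing concrete, here is a rough plain-Python sketch in which a 3x3 convolution, a bias-free dense (MLP) layer, and attention logits all call the same `dot` helper. All function names are hypothetical, and the conv assumes a single-channel input, no padding, and stride 1; it computes cross-correlation, which is what most NN libraries call "convolution". With C input channels the patch and kernel would just be 3x3xC before flattening, still one dot product per output value.

```python
import math

def dot(a, b):
    # Elementwise multiply, then a single sum.
    return sum(x * y for x, y in zip(a, b))

def conv3x3_at(image, kernel, i, j):
    # One conv output value = dot(flattened 3x3 patch, flattened kernel).
    patch = [image[i + di][j + dj] for di in range(3) for dj in range(3)]
    flat_kernel = [kernel[di][dj] for di in range(3) for dj in range(3)]
    return dot(patch, flat_kernel)

def conv3x3(image, kernel):
    # "Valid" conv: slide the kernel over every full 3x3 window.
    h, w = len(image), len(image[0])
    return [[conv3x3_at(image, kernel, i, j) for j in range(w - 2)]
            for i in range(h - 2)]

def linear(weights, x):
    # Dense layer without bias: one dot product per output unit.
    return [dot(row, x) for row in weights]

def attention_scores(q, keys):
    # Pre-softmax attention logits: q . k / sqrt(d) for each key.
    d = len(q)
    return [dot(q, k) / math.sqrt(d) for k in keys]

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]
print(conv3x3(image, kernel))                                   # [[0, -6], [-11, -19]]
print(linear([[1, 0, -1], [2, 1, 0]], [3, 4, 5]))               # [-2, 10]
print(attention_scores([1, 0, 1, 0], [[1, 1, 1, 1], [0, 1, 0, 1]]))  # [1.0, 0.0]
```

Real implementations batch these into matrix multiplies (e.g. im2col for convs), but each output element is still one of these dot products.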

The overlap can be a bit tricky to deal with in terms of implications, but I've found it a very helpful formulation for squeezing out some performance boosts in neural network implementations I've worked on before.

Hope this helps, feel free to let me know if you have any more questions/thoughts/etc, love! <3 :)) :D :fireworks:

extasia · on May 24, 2023

Interesting!

I'm working through implementing each of these w/ only numpy, so the "it's just a dot product" view should come in handy :)

Case in point: I just finished implementing the skip-gram algorithm... in possibly the least efficient way you could imagine xD
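Skip-gram's scoring step is the same dot-product idea. Below is a minimal plain-Python sketch of the basic full-softmax formulation: the probability of a context word given a center word is a softmax over dot products of embedding vectors. It has no negative sampling and no training loop, and the toy sentence, window size, embedding size, and helper names are all arbitrary assumptions for illustration.

```python
import math
import random

def skipgram_pairs(tokens, window=1):
    # Every (center, context) pair where context is within `window` positions.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def context_probs(center, in_emb, out_emb):
    # p(context | center): dot the center's input vector with every word's
    # output vector, then take a softmax over the whole vocabulary.
    v = in_emb[center]
    scores = [dot(u, v) for u in out_emb]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}
tokens = [word_to_id[w] for w in "the cat sat on the mat".split()]

rng = random.Random(0)
dim = 3
in_emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
out_emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

pairs = skipgram_pairs(tokens, window=1)
# Average negative log-likelihood of the observed context words.
loss = -sum(math.log(context_probs(c, in_emb, out_emb)[o]) for c, o in pairs) / len(pairs)
print(len(pairs), loss)
```

Training would then nudge `in_emb` and `out_emb` by gradient descent on `loss`; word2vec in practice replaces the full softmax with negative sampling or hierarchical softmax so it doesn't have to score the entire vocabulary for every pair.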