Any operation in a neural network that multiplies two vectors element-wise and then sums the products into a single number is a dot product.
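So a 3x3 conv is technically a dot product too, even though it has a spatial dimension: each output value is the dot product of the kernel with the 3x3 patch of input underneath it (flattened across the input channels). Same for the query-key score computation at the start of most transformers' attention layers, MLP layers (each output unit is a weight row dotted with the input), etc.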
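To make the 3x3 case concrete, here's a minimal sketch in plain Python. It assumes a single input channel, stride 1 and no padding (simplifications on my part), and the function names are hypothetical, just for illustration. It computes one conv output the direct way, then again as a single dot product of the flattened patch with the flattened kernel, and shows an unscaled attention score as the same `dot` call. With C_in input channels the vectors just get longer (9 * C_in entries), but it's still one dot product per output value.

```python
def dot(a, b):
    """Multiply two equal-length vectors element-wise and sum the products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def conv3x3_direct(image, kernel, row, col):
    """Naive 3x3 'conv' at output position (row, col).

    Hypothetical helper: single channel, stride 1, no padding. Like most
    deep learning frameworks, this is cross-correlation (no kernel flip).
    """
    total = 0
    for i in range(3):
        for j in range(3):
            total += image[row + i][col + j] * kernel[i][j]
    return total


def conv3x3_as_dot(image, kernel, row, col):
    """Same output, computed as ONE dot product of the flattened
    3x3 input patch with the flattened kernel."""
    patch = [image[row + i][col + j] for i in range(3) for j in range(3)]
    flat_kernel = [kernel[i][j] for i in range(3) for j in range(3)]
    return dot(patch, flat_kernel)


def attention_score(query, key):
    """Unscaled query-key attention score: a plain dot product.
    (Transformers typically divide this by sqrt(d) before the softmax.)"""
    return dot(query, key)


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    sobel_x = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

    # A 4x4 input with a 3x3 kernel gives a 2x2 output; compare both ways.
    for r in range(2):
        for c in range(2):
            print(r, c, conv3x3_direct(image, sobel_x, r, c),
                  conv3x3_as_dot(image, sobel_x, r, c))

    print(attention_score([1, 0, 2], [3, 4, 5]))
```

Both conv functions give -8 at every position for this image and kernel: flattening only changes how the same nine products are indexed, not which ones get summed.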
The overlap can be a bit tricky to reason about in terms of its implications, but I've found it a very helpful formulation for squeezing some extra performance out of neural network implementations I've worked on in the past.
Hope this helps, feel free to let me know if you have any more questions/thoughts/etc, love! <3 :)) :D :fireworks:
Could you clarify your second statement about Mad and 3x3s? Afraid I don't follow.