My intuition would be that a larger model has more parameters, so you get more directions that are orthogonal to the gradients of previous samples.
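A rough way to sanity-check this intuition is to look at how close to orthogonal random vectors become as the dimension grows. The sketch below is only an illustration under a strong assumption: independent Gaussian vectors stand in for per-sample gradients, whereas real gradients are correlated. The function name `mean_abs_cosine` and the chosen dimensions are made up for this example.

```python
import math
import random

def mean_abs_cosine(dim, n_pairs=200, seed=0):
    """Average |cosine similarity| between pairs of random Gaussian vectors in R^dim.

    Assumption: random vectors are a stand-in for per-sample gradients.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        total += abs(dot / (norm_a * norm_b))
    return total / n_pairs

# "dim" plays the role of the parameter count (hypothetical sizes).
small = mean_abs_cosine(dim=10)
large = mean_abs_cosine(dim=2000)
print(f"dim=10:   mean |cos| = {small:.3f}")
print(f"dim=2000: mean |cos| = {large:.3f}")
```

The average |cos| should shrink as `dim` grows, meaning a new random direction overlaps less with an earlier one. Whether actual gradients in a larger network behave this way is a separate empirical question.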