Obviously I'm familiar with the definition. If you didn't get that, you should probably read my comment again. It seems like you've somehow decided that people in this thread are arguing with you, but they're not. And anyway, it's a bit silly to get mad at people for not having studied information theory.
Define the random variables M for message content and U for the identity of the user. The interpretation of "bits of information" that most people will have is H(M). The correct interpretation in this context is H(U). You seem to be confused about why people are talking about H(M) instead of H(U). But I think people correctly intuit that those aren't independent, so the mutual information I(U;M) = H(U) - H(U|M) is positive. And obviously if you change P(M), you will also change the amount of mutual information. That's why talking about sending fewer headers makes sense.
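To make that concrete, here's a toy sketch of I(U;M) = H(U) - H(U|M). The joint distribution below is entirely made up (two hypothetical users, two hypothetical header values), just to show that a weakly user-correlated M yields a positive mutual information, i.e. observing the message content leaks some bits about the user's identity:

```python
import math

# Hypothetical joint distribution P(U, M): two users, two header values,
# chosen so that message content M is weakly informative about user U.
joint = {
    ("alice", "header_a"): 0.4,
    ("alice", "header_b"): 0.1,
    ("bob",   "header_a"): 0.1,
    ("bob",   "header_b"): 0.4,
}

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals P(U) and P(M)
p_u, p_m = {}, {}
for (u, m), p in joint.items():
    p_u[u] = p_u.get(u, 0.0) + p
    p_m[m] = p_m.get(m, 0.0) + p

# Conditional entropy H(U|M) = sum_m P(m) * H(U | M=m)
h_u_given_m = 0.0
for m, pm in p_m.items():
    cond = {u: joint[(u, m)] / pm for u in p_u}
    h_u_given_m += pm * H(cond)

mi = H(p_u) - h_u_given_m  # I(U;M) = H(U) - H(U|M)
print(round(H(p_u), 3), round(h_u_given_m, 3), round(mi, 3))
# → 1.0 0.722 0.278
```

If you change P(M) so that everyone sends identical headers, M becomes constant, H(U|M) collapses to H(U), and the mutual information drops to zero, which is exactly the argument for sending fewer headers.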