Going by the stats on Wikipedia, the T4 and B300 both do about one teraflop of half-precision math per watt. Where are the efficiency gains?

Edit: It looks like they replaced INT8 and INT4 with FP8 and FP4, with the same speedups of 2x and 4x relative to FP16. That's an improvement, but not a big one.
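
For anyone who wants to redo the arithmetic with their own numbers, here is a rough sketch of the per-watt comparison. The 2x and 4x multipliers are the ones from my edit above. The function name and the example card (100 TFLOPS of FP16 at 100 W) are made up for illustration and are not real specs, so plug in figures from a datasheet you trust.

```python
# Speedups relative to FP16, as described in the edit above (2x for FP8, 4x for FP4).
SPEEDUP_VS_FP16 = {"FP16": 1, "FP8": 2, "FP4": 4}


def tflops_per_watt(fp16_tflops, watts):
    """Return throughput per watt for each format, scaled from the FP16 figure.

    Hypothetical helper: it assumes the lower-precision formats are exact
    multiples of FP16 throughput at the same power draw.
    """
    return {
        fmt: fp16_tflops * mult / watts
        for fmt, mult in SPEEDUP_VS_FP16.items()
    }


# Made-up card that matches the "about one teraflop per watt" observation.
example = tflops_per_watt(fp16_tflops=100.0, watts=100.0)
for fmt, value in example.items():
    print(f"{fmt}: {value:.1f} TFLOPS/W")
```

Under those assumptions the per-watt figure only moves if you also drop precision, which is the point of the edit: the format changed, the multipliers did not.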