Average entropy is low because the expensive cases (very small exponents) are rare. In the worst case, though, you need a number of bits linear in the number of states the exponent can take, plus the number of bits of precision you need. Subnormals don't make that much worse, because they all share the same exponent (the exponent is what consumes most of the worst-case bits), and because their relative frequencies are exactly what you'd get by filling the significand with uniformly random bits.
It's been a while since I've thought about this problem, so I'll reserve the right to be off by a small constant additive or multiplicative factor. You need around 24+256 bits for 32-bit IEEE-754 floats (roughly half that if you're sticking to a range like 0 to 1), and around 53+2048 for 64-bit floats. The average number of bits required is closer to 25 and 54, respectively.
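To make the bit accounting concrete, here's a rough Python sketch for 64-bit floats in [0, 1). It's one way to do it under my own assumptions, not a reference implementation: the function name `random_unit_double` and the `getrandbits` parameter are made up for illustration, and it rounds the ideal real number down to the nearest double. It flips one bit per exponent step until it sees a 1, then fills the 52-bit significand with uniform bits; if it runs out of normal exponents, it falls into the subnormal range, where the exponent is fixed and only the significand is random.

```python
import math
import random

MANT_BITS = 52    # explicit significand bits in an IEEE-754 binary64
MIN_EXP = -1022   # smallest exponent of a normal binary64

def random_unit_double(getrandbits=random.getrandbits):
    """Return (value, bits_used) for a double in [0, 1).

    Hypothetical helper: conceptually draws a uniform real number one
    bit at a time and rounds it down to the nearest double.
    """
    bits_used = 0
    exponent = -1  # a leading 1 bit means the value lies in [0.5, 1)
    while exponent >= MIN_EXP:
        bits_used += 1
        if getrandbits(1):
            # Normal number: implicit leading 1 plus 52 random bits.
            mant = getrandbits(MANT_BITS)
            bits_used += MANT_BITS
            significand = (1 << MANT_BITS) | mant
            return math.ldexp(significand, exponent - MANT_BITS), bits_used
        exponent -= 1  # each 0 bit halves the range
    # 1022 zero bits in a row: subnormal range. The exponent is fixed,
    # so only the significand needs random bits (no implicit leading 1).
    mant = getrandbits(MANT_BITS)
    bits_used += MANT_BITS
    return math.ldexp(mant, MIN_EXP - MANT_BITS), bits_used
```

With this particular scheme the worst case for [0, 1) is 1022 + 52 = 1074 bits, while the expected cost is about 2 bits for the exponent plus 52 for the significand, i.e. roughly 54. The 32-bit version would follow the same pattern with a 23-bit significand and a minimum normal exponent of -126.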