I don't understand. At ANALYZE time, all other parameters being equal and adequate (costs, GEQO at its maximum, effective_cache_size, ...), isn't the probability of obtaining a representative sample of column values better with a larger number of randomly selected values? And at planning time, isn't the devised plan then of better quality?
Adding sampled values may hurt performance in some cases, for example if the planner cannot take everything into account because of some margin/interval effect and therefore produces the same plan from a larger set of values, or if the random selection happens to pick less representative data in the larger sample. But how could a larger sample never lead to the best plans (the ones a smaller sample might produce), or lead to worse plans on average?
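For reference, the sample size under discussion is controlled per column by the statistics target. A minimal sketch of raising it and inspecting the result, using hypothetical table and column names (`orders`, `customer_id`):

```sql
-- Hypothetical names, shown only to illustrate the knob being discussed.
-- Raise the per-column statistics target from the default (100) to the maximum (10000):
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 10000;
ANALYZE orders;

-- Inspect what the planner now sees for that column:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';
```

A larger target means more rows are sampled and more entries are kept in the MCV list and histogram, at the cost of longer ANALYZE runs and more planning-time work.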
First, your point about planning time is important, thanks for adding that.
Regarding my point: it's possible that the planner produces a better (on average, faster-executing) plan for a given key precisely when that key is not found in the stats, and that the keys for which this holds fit a pattern within the middle of the stats distribution. It all depends on the database schema and the stats distributions.
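One way to see which side of the stats a given key falls on (hypothetical names and key value again): if the key appears in `most_common_vals`, the planner uses its recorded frequency; otherwise it falls back to an estimate derived from `n_distinct` and the histogram, which is where the "not found in stats" case above arises.

```sql
-- Hypothetical names; checks whether key 42 is covered by the MCV list.
-- most_common_vals is anyarray, so cast through text to a concrete array type.
SELECT 42 = ANY (most_common_vals::text::int[]) AS key_in_mcv_list
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';
```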