OK, so maybe we're agreed: you can bet on his abilities in one election. Let's say he's been right 95% of the time (I don't know if that's true) and we believe that's likely to continue, knowing nothing except "this is an election and Silver's predicted the result."
Then if he says "Hillary is likely to win" we can have 95% confidence he's right.
If he says "Hillary has an 80% chance of winning" we ignore the 80, and just observe that it's more than 50.
It's a bit more flexible than that, though. Rather than collapsing 80% vs. 70% into a bare up-or-down call, you can let him predict a few different events in a row, add up the errors, and see how far off he is overall (rough sketch below).
Or, if you do want to review the past, you can look at the error for a category of elections, or for a single year, rather than for his whole prediction career.
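One standard way to "add up the errors" is a quadratic scoring rule, the Brier score. A minimal sketch, with invented forecasts:

```python
# Brier score: mean squared error between stated probabilities and outcomes.
# All numbers below are made up for illustration.
def brier_score(forecasts, outcomes):
    return sum((p - won) ** 2 for p, won in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.80, 0.70, 0.55, 0.90]  # hypothetical probabilities for each event
outcomes  = [1, 0, 1, 1]              # what actually happened (1 = occurred)

print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
# 0 is perfect; always saying 50% scores 0.25, so below that beats a coin flip.
```

Unlike raw accuracy, this rewards saying 90% rather than 60% when you turn out to be right, and punishes it when you're wrong.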
> One could note the number of times that a 25% probability was quoted, over a long period, and compare this with the actual proportion of times that rain fell.
It still depends on many samples, or "over a long period" as your doc puts it.
You can't escape the fact that there are only one or two samples, no matter how much math you throw around.
> You can't escape the fact that there are only one or two samples, no matter how much math you throw around.
That depends on what question you're asking. "How well calibrated are the electoral predictions that FiveThirtyEight makes?" is a sensible question with a lot of data points. It speaks directly to the crowing about the one bad call, and it's well suited to a scoring rule, which lets you compare people making predictions about the same events.
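A rough calibration check, in the spirit of the rain example quoted above: group forecasts by the probability quoted, then see how often the event actually occurred in each group. The data here is invented.

```python
from collections import defaultdict

# (quoted probability, did it happen?) for many forecasts; made-up data.
forecasts = [(0.25, 0), (0.25, 1), (0.25, 0), (0.25, 0),
             (0.80, 1), (0.80, 1), (0.80, 0), (0.80, 1)]

buckets = defaultdict(list)
for p, happened in forecasts:
    buckets[p].append(happened)

for p, results in sorted(buckets.items()):
    actual = sum(results) / len(results)
    print(f"quoted {p:.0%}: happened {actual:.0%} of the time (n={len(results)})")
# Well-calibrated forecasts: quoted and actual frequencies converge as n grows.
```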