And yet the mathematics in the article seems bogus to me, too. Can anyone figure out what calculation Colmez is doing?
I'm not a statistician, but if you toss a fair coin 20 times, there is about a 0.1% chance of getting exactly 17 heads. To figure out the probability that the coin is fair given this data, it seems you need Bayes' theorem, which requires a prior probability on the coin being fair.
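To make that concrete, here's a minimal sketch of the inversion. The 0.1% figure is just the binomial term; to get a posterior you have to invent both a prior and an alternative hypothesis. The 99% prior and the biased coin with P(heads) = 0.8 below are entirely made-up numbers, chosen only to show the mechanics:

```python
from math import comb

# P(exactly 17 heads in 20 flips | fair coin): the binomial term
p_data_fair = comb(20, 17) * 0.5**20  # ~0.0011, i.e. the "0.1%" above

# Bayes' theorem needs ASSUMPTIONS the data alone can't supply:
# a prior on fairness (here 99%) and an alternative hypothesis
# (here a single biased coin with P(heads) = 0.8). Both are made up.
prior_fair = 0.99
p_data_biased = comb(20, 17) * 0.8**17 * 0.2**3

posterior_fair = p_data_fair * prior_fair / (
    p_data_fair * prior_fair + p_data_biased * (1 - prior_fair)
)
# Even starting 99% sure the coin is fair, 17 heads drags the
# posterior down to roughly 34% under these assumed hypotheses.
```

Different (equally defensible) choices of prior or alternative give very different posteriors, which is exactly why the prior can't be swept under the rug.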
Confusing "the odds of this occurring with a fair coin" with "the odds that the coin is fair" is by far the most frequent statistical error I see, and by far the least often corrected. It's mildly terrifying.
Reading some other comments in this thread, I feel that I really ought to have included some more stuff here. There isn't actually such a thing as "the odds that the coin is fair." Either it's fair or it's not. What we can talk about is what probability we should ascribe to it being fair given what we know. Even a single coin flip will have a single result, uniquely determined by the way in which it is launched into the air and caught. Probabilities only exist in the presence of our ignorance of the actual facts, and some people consider probabilities to be themselves a measure of our ignorance of the world.
A normal significance test tells you P(observation|H0). To apply Bayes' rule you also need P(observation|!H0) and, crucially, a prior P(H0); nothing in the test itself tells you what P(H0) is.
You are exactly right. Saying "there is only an 8% chance of this happening with a fair coin" is something completely different from saying "there is a 92% chance the coin is biased". The author is utterly clueless when it comes to probability.
This is how frequentist statistics works. You ask the wrong question ("the chance of the data occurring given the assumption") and use clean, rigorous, impeccable math to get an answer. Bayesian statistics is (usually) the opposite: you ask the correct question ("the chance of the assumption being correct") but find there is no way to get to the answer without making some big assumptions.
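A quick sketch of why those assumptions matter: hold the data fixed (17 heads in 20 flips, with a hypothetical biased alternative of P(heads) = 0.8) and vary only the prior. The priors below are arbitrary illustrations:

```python
from math import comb

# Same data, same likelihoods; only the (assumed) prior on "fair" changes.
lik_fair = comb(20, 17) * 0.5**20             # P(17 heads | fair)
lik_biased = comb(20, 17) * 0.8**17 * 0.2**3  # P(17 heads | biased at 0.8)

posteriors = {}
for prior in (0.5, 0.9, 0.999):
    posteriors[prior] = lik_fair * prior / (
        lik_fair * prior + lik_biased * (1 - prior)
    )
# The posterior that the coin is fair ranges from well under 1%
# (prior 0.5) to over 80% (prior 0.999) on identical data.
```

The frequentist answer ("0.1% chance under H0") never moves; the Bayesian answer swings wildly with the prior, which is the "big assumption" in question.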
Here's some good (possibly more "fair" than I've been above) discussion if anyone wants to read/think about this more:
You know, I didn't understand that comic when it was posted (despite feeling like I have an understanding of Bayesian vs. frequentist statistics) and I still don't. So, I looked it up and apparently I'm not the only one.
It seems to me, and to the commenters on stats.stackexchange [1], that this comic both misinterprets frequentist statistics and misrepresents Bayesian statistics. I realize that XKCD is a nerdy comic meant to be entertaining; I just wanted to leave this discussion here in case anyone else is confused. I think this is an important distinction, and one most people interested in statistics should spend some time thinking about.
I don't know the particular DNA test used in this case, but let's assume it gave a certainty of 92% that the DNA isolated was from AK. This means that the particular sequences of DNA identified could have come from another person with a probability of 0.08 (i.e. a one-in-12.5 chance, which is not particularly low in a case like this). It does not mean that the DNA is correctly characterized with a probability of 92%.
For a repeated test to give a different probability, the identity of one or more of the sequences isolated from the sample would have to have been incorrect in one of the assays, i.e. there is a procedural error.
It is not at all like tossing coins. An analogy would be getting someone's eye color as blue the first time and brown the second.
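The same frequentist-vs-posterior trap from the coin discussion shows up here. A sketch with entirely hypothetical numbers: the 1-in-12.5 random-match figure is P(match | other person), not the probability the identification is wrong, and the posterior depends heavily on how many plausible alternative sources you assume:

```python
# All numbers hypothetical, for illustration only.
p_match_given_other = 0.08   # the 1-in-12.5 random-match chance above
p_match_given_source = 1.0   # assume the test always matches the true source

pool = 1000                    # assumed count of alternative candidates
prior_source = 1 / (pool + 1)  # flat prior over the pool plus the suspect

posterior_source = p_match_given_source * prior_source / (
    p_match_given_source * prior_source
    + p_match_given_other * (1 - prior_source)
)
# With a pool of 1000 the posterior is about 1.2%: roughly 80 of the
# 1000 alternatives would also "match", so a match alone proves little.
```

Shrink the assumed pool (say, to people with access to the scene) and the posterior climbs sharply; the test statistic itself never changes, only the prior does.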