
And yet the mathematics in the article seems bogus to me, too. Can anyone figure out what calculation Colmez is doing?

I'm not a statistician, but if you toss a fair coin 20 times, there is about 0.1% chance of getting 17 heads, but to figure out the probability that the coin is fair given this data, it seems you need Bayes' theorem, which requires a prior probability on the coin being fair.
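For anyone who wants to check the 0.1% figure, here is a minimal Python sketch of the binomial calculation. The helper name `binom_pmf` is just something I made up for illustration; "at least 17" is included because significance tests usually look at the tail rather than the exact count.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n tosses of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 17 heads in 20 tosses of a fair coin.
exactly_17 = binom_pmf(17, 20)

# 17 or more heads in 20 tosses of a fair coin (upper tail).
at_least_17 = sum(binom_pmf(k, 20) for k in range(17, 21))

print(f"P(exactly 17 heads)  = {exactly_17:.4%}")
print(f"P(at least 17 heads) = {at_least_17:.4%}")
```

Both come out to roughly 0.11% and 0.13%. Note that these are still P(data | fair coin), not P(fair coin | data).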



Confusing "the odds of this occurring with a fair coin" with "the odds that the coin is fair" is by far the most frequent statistical error I see, and by far the least often corrected. It's mildly terrifying.


Is there a formula you can use to convert between the two?


Yes, but you need to estimate a probability distribution across various types of coin biases first.

http://en.wikipedia.org/wiki/Bayes%27_theorem


Reading some other comments in this thread, I feel that I really ought to have included some more stuff here. There isn't actually such a thing as "the odds that the coin is fair." Either it's fair or it's not. What we can talk about is what probability we should ascribe to it being fair given what we know. Even a single coin flip will have a single result, uniquely determined by the way in which it is launched into the air and caught. Probabilities only exist in the presence of our ignorance of the actual facts, and some people consider probabilities to be themselves a measure of our ignorance of the world.


Oh I get it

P(H0|observation) = P(observation|H0) * P(H0) / P(observation)

P(observation) = P(observation|H0) * P(H0) + P(observation|!H0) * P(!H0)

A normal significance test tells you P(observation|H0). [Though I'm not sure about P(observation|!H0)]. To apply Bayes' Rule you need P(H0), where P(H0)=???
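To make the role of P(H0) and P(observation|!H0) concrete, here is a rough Python sketch with made-up numbers. Everything specific in it is an assumption for illustration: the alternative hypothesis is a single coin that lands heads 85% of the time (`p_alt`), and the priors are arbitrary. The function names are hypothetical too.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_h0(k, n, prior_h0, p_alt):
    """P(H0 | observation) via Bayes' rule, where H0 = fair coin and
    !H0 = a coin with P(heads) = p_alt (an assumed, single alternative)."""
    like_h0 = binom_pmf(k, n, 0.5)       # P(observation | H0)
    like_alt = binom_pmf(k, n, p_alt)    # P(observation | !H0)
    evidence = like_h0 * prior_h0 + like_alt * (1 - prior_h0)  # P(observation)
    return like_h0 * prior_h0 / evidence

# Same data (17 heads in 20 tosses), different assumed priors on "fair".
for prior in (0.5, 0.99, 0.9999):
    print(f"P(H0) = {prior}: P(H0 | 17 of 20 heads) = "
          f"{posterior_h0(17, 20, prior, p_alt=0.85):.4f}")
```

The same observation gives very different posteriors depending on the prior, which is exactly why the significance-test number alone can't be read as "the chance the coin is fair".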


Bayes' theorem. The problem is that it requires a prior, which is usually unknowable.


You are exactly right. Saying "there is only an 8% chance of this happening with a fair coin" is something completely different from saying "there is a 92% chance the coin is biased". The author is utterly clueless when it comes to probability.


This is how frequentist statistics works. You ask the wrong question ("the chance of the data occurring given the assumption") and use clean, rigorous, impeccable math to get an answer. Bayesian statistics is (usually) the opposite - you ask the correct question ("the chance of the assumption being correct") but find that there is no way without making some big assumptions to get to the answer.

Here's some good (possibly more "fair" than I've been above) discussion if anyone wants to read/think about this more:

http://stats.stackexchange.com/questions/22/bayesian-and-fre...

http://www.quora.com/What-is-the-difference-between-Bayesian...


A concise explanation of the difference between Bayesian and Frequentist techniques in statistics:

http://xkcd.com/1132/


You know, I didn't understand that comic when it was posted (despite feeling like I have an understanding of Bayesian vs. frequentist statistics) and I still don't. So, I looked it up and apparently I'm not the only one.

It seems to me, and to the commenters on stats.stackexchange [1], that this comic both misinterprets frequentist statistics and misrepresents Bayesian statistics. I realize that XKCD is a nerdy comic meant to be entertaining - I just wanted to leave this discussion here in case anyone else is confused; I think this is an important distinction and one most people interested in statistics should spend some time thinking about.

[1] http://stats.stackexchange.com/questions/43339/whats-wrong-w...

Edit: I can't edit my first comment now, but gweinberg's post (sibling to the grandparent of this) words the problem perfectly.


You are correct. But the reality is even worse.

I don't know the particular DNA test used in this case, but let's assume it gave a certainty of 92% that the DNA isolated was from AK. This means that the particular sequences of DNA identified could have come from another person with a probability of 0.08 (i.e., a one-in-12.5 chance, which is not particularly low in a case like this). It does not mean that the DNA is correctly characterized with a probability of 92%.

For a repeated test to give a different probability, the identity of one or more of the sequences isolated from the sample would have to have been incorrect in one of the assays, i.e. there is a procedural error.

It is not at all like tossing coins. An analogy would be getting someone's eye color as blue the first time and brown the second.


I'm not sure Bayes' theorem is relevant here (maybe behind the scenes), but you would probably go for http://en.wikipedia.org/wiki/Statistical_hypothesis_testing


Yeah, if I flipped a coin 10,000 times and 55% of the flips were heads, then the coin is definitely biased toward heads; it's not that there's a 55% chance that it's biased.
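A quick sanity check on how extreme 5,500 heads in 10,000 flips would be for a fair coin, using the normal approximation to the binomial. This is only a sketch, and the variable names are mine:

```python
from math import sqrt, erfc

n, heads = 10_000, 5_500

# Under a fair coin, the number of heads has mean n/2 and
# standard deviation sqrt(n * 0.5 * 0.5).
mean = n * 0.5
sd = sqrt(n * 0.5 * 0.5)

# How many standard deviations above the mean the observed count is.
z = (heads - mean) / sd

# One-sided tail probability under the normal approximation:
# P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
p_one_sided = 0.5 * erfc(z / sqrt(2))

print(f"z = {z:.1f}, P(at least this many heads | fair coin) ~ {p_one_sided:.3g}")
```

That is 10 standard deviations out. Strictly speaking it is still P(data | fair coin), but the number is so small that almost any reasonable prior on the coin being fair gets overwhelmed.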



