
The number of samples you want to take depends on the variance of the population and the precision with which you want to estimate the population mean (the greater either of these is, the larger the sample you want). 200 is normally sufficient, but it's possible to calculate confidence intervals as a robustness check.
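As a rough sketch of how variance and precision trade off against sample size: if you want a 95% interval of x_bar +- margin, you need roughly n >= z^2 * variance / margin^2 with z = 1.96. The function name and parameters below are made up for illustration, and the sketch assumes you already have a reasonable guess at the population variance (e.g. from a pilot sample).

```python
import math


def required_sample_size(variance, margin, z=1.96):
    """Smallest n such that z * sqrt(variance / n) <= margin.

    variance: assumed (or pilot-estimated) population variance
    margin:   desired half-width of the confidence interval
    z:        normal critical value (1.96 for a 95% interval)
    """
    if variance < 0 or margin <= 0:
        raise ValueError("variance must be >= 0 and margin must be > 0")
    return max(1, math.ceil(z ** 2 * variance / margin ** 2))


# A variance of 50 and a target of +-1 around the mean needs a bit under 200 samples.
print(required_sample_size(50, 1.0))
# Halving the margin roughly quadruples the sample size.
print(required_sample_size(50, 0.5))
```

This relies on the normal approximation, so treat the result as a ballpark figure rather than an exact requirement.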

The Central Limit Theorem states that even if your underlying distribution is not normally distributed, square_root(n) * (sample_mean - population_mean) will converge to a normal distribution with mean 0 and variance = population variance, as long as the samples are independent and the population variance is finite. Sample variance also gives you an unbiased estimate of population variance.

This means that if n is large, you can compute confidence intervals from n and the observations [x1, x2 .. xi .. xn] by:

1) computing the sample mean and sample variance.

sample_mean = x_bar = (1/n) * sum(xi) and

sample_variance = (1/(n-1)) * sum((xi - x_bar)^2)

2) computing x_bar +- z * square_root(sample_variance/n), where z = 1.96 for a 95% confidence interval

E.g. if the sample mean is 102, the sample variance is 50 and n = 200, your 95% confidence interval would be:

102 +- 1.96 * square_root(50/200) = 102 +- 0.98 = approx [101, 103]
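The two steps can be sketched in Python like this. It's a minimal sketch using only the standard library; the function names are my own, and it assumes n is large enough for the normal approximation (z = 1.96 for 95%) to be reasonable.

```python
import math


def interval_from_summary(x_bar, sample_variance, n, z=1.96):
    """Confidence interval from a sample mean, sample variance and n."""
    half_width = z * math.sqrt(sample_variance / n)
    return x_bar - half_width, x_bar + half_width


def mean_confidence_interval(xs, z=1.96):
    """Confidence interval for the population mean from raw observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # Step 1: sample mean and (unbiased, n - 1) sample variance.
    x_bar = sum(xs) / n
    sample_variance = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    # Step 2: x_bar +- z * sqrt(sample_variance / n).
    return interval_from_summary(x_bar, sample_variance, n, z)


# The worked example: mean 102, variance 50, n = 200.
low, high = interval_from_summary(102, 50, 200)
print(round(low, 2), round(high, 2))
```

For small n you'd want a t critical value instead of 1.96, which this sketch doesn't handle.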

http://mathworld.wolfram.com/SampleVariance.html

https://en.wikipedia.org/wiki/Central_limit_theorem#Classica...

https://en.wikipedia.org/wiki/Confidence_interval#Definition


