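The standard deviation is as applicable here as anywhere else: it gives useful information about the dispersion of the data. In particular, the SD divided by the square root of the sample size is one standard error: it estimates the dispersion of the sampling distribution of the mean. Let's compute: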
$$3.2\% / \sqrt{10000} = 0.032\% = 0.00032.$$

That is far smaller than the $\pm 0.50\%$ precision being sought.
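Although the data are not Normally distributed, the sample size is so large that the sample mean is extremely close to Normally distributed. Here, for instance, is a histogram of a sample with the same characteristics as yours and, at its right, the histogram of the means of a thousand additional samples from the same population.

It looks very close to Normal, doesn't it?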
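A symmetric $(100 - \alpha)\%$ confidence interval for the mean is therefore obtained in the usual way: multiply the standard error by the appropriate percentile of the standard Normal distribution, $Z_{1-\alpha/200}$, and move that distance to either side of the sample mean. Here $Z_{1-\alpha/200} = 2.5758$, so the $99\%$ confidence interval is

$$\left(0.977 - 2.5758(0.032)/\sqrt{10000},\ 0.977 + 2.5758(0.032)/\sqrt{10000}\right) = (97.62\%, 97.78\%).$$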
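The interval arithmetic can be reproduced directly from the summary statistics. The following is a minimal sketch, assuming only the figures quoted in the text (mean 0.977, SD 0.032, sample size 10000); the variable names are illustrative and not part of the original analysis.

```r
# Summary statistics quoted in the text, as proportions rather than percent.
x.bar <- 0.977   # sample mean
s     <- 0.032   # sample standard deviation
n     <- 10000   # sample size
conf  <- 0.99    # confidence level

se <- s / sqrt(n)                 # standard error of the mean
z  <- qnorm(1 - (1 - conf) / 2)   # two-sided Normal multiplier, about 2.5758
ci <- x.bar + c(-1, 1) * z * se   # lower and upper confidence limits

print(round(100 * ci, 2))         # limits expressed in percent
```

Changing `conf` or `n` shows how the interval width scales with the Normal multiplier and with $1/\sqrt{n}$.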
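A sufficient sample size can be found by inverting this relationship and solving for the sample size. Here it tells us you need a sample size of about

$$\left(3.2\% / (0.5\% / Z_{1-\alpha/200})\right)^2 \approx 272.$$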
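To check that the Normal approximation still holds at this smaller size, a sample of $272$ was drawn from the same population and its mean was bootstrapped for $9999$ iterations. The two resulting $99\%$ confidence intervals for this sample, one from the bootstrap and one from Normal theory, are nearly identical: $(97.16\%, 98.21\%)$ and $(97.19\%, 98.24\%)$.

By comparison, a sample of $10000$ is more than $36$ times as large as a sample of $272$.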
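The R code that performs these analyses and draws the plots follows. It samples from a population with a Beta distribution having a mean of $0.977$ and an SD of $0.032$.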
set.seed(17)
#
# Study a sample of 10,000.
#
Sample <- rbeta(10^4, 20.4626, 0.4817)
hist(Sample)
hist(replicate(10^3, mean(rbeta(10^4, 20.4626, 0.4817))),xlab="%",main="1000 Sample Means")
#
# Analyze a sample designed to achieve a CI of width 1%.
#
(n.sample <- ceiling((0.032 / (0.005 / qnorm(1-0.005)))^2))
Sample <- rbeta(n.sample, 20.4626, 0.4817)
cat(round(mean(Sample), 3), round(sd(Sample), 3)) # Sample statistics
se.mean <- sd(Sample) / sqrt(length(Sample)) # Standard error of the mean
cat("CL: ", round(mean(Sample) + qnorm(0.005)*c(1,-1)*se.mean, 5)) # Normal CI
#
# Compare the bootstrapped CI of this sample.
#
Bootstrapped.means <- replicate(9999, mean(sample(Sample, length(Sample), replace=TRUE)))
hist(Bootstrapped.means)
cat("Bootstrap CL:", round(quantile(Bootstrapped.means, c(0.005, 1-0.005)), 5))
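
The shape parameters `20.4626` and `0.4817` passed to `rbeta` are not derived in the text. They are consistent with a method-of-moments match to the stated mean and SD; the sketch below assumes that is how they were chosen, and the helper name `beta.shapes` is hypothetical.

```r
# Method-of-moments shape parameters for a Beta distribution
# with a given mean (mu) and standard deviation (sigma).
beta.shapes <- function(mu, sigma) {
  nu <- mu * (1 - mu) / sigma^2 - 1        # equals shape1 + shape2
  c(shape1 = mu * nu, shape2 = (1 - mu) * nu)
}

shapes <- beta.shapes(0.977, 0.032)
print(round(shapes, 4))

# Recover the mean and SD from the shapes as a consistency check.
a <- shapes[["shape1"]]
b <- shapes[["shape2"]]
beta.mean <- a / (a + b)
beta.sd   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
print(round(c(mean = beta.mean, sd = beta.sd), 4))
```

The match requires `sigma^2 < mu * (1 - mu)`; otherwise no Beta distribution has those moments and the computed shapes are not positive.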