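First off, I'm not asking this:
Why does zero correlation not imply independence?
That is addressed (rather nicely) here: https://math.stackexchange.com/questions/444408/why-does-zero-correlation-not-imply-independence
What I'm asking is the opposite... say two variables are completely independent of each other.
Couldn't they have a tiny bit of correlation by accident?
Shouldn't it be... independence implies VERY LITTLE correlation?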
Answers:
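By the definition of the correlation coefficient, if two variables are independent, their correlation is zero. So it cannot happen that they have any correlation by accident!
If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$. Hence the numerator of $\rho_{X,Y}$, which is the covariance $E[XY] - E[X]E[Y]$, is zero in this case.
So, unless you change the meaning of correlation as defined here, it is not possible. If you mean something else, please clarify your definition of correlation.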
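The argument in this answer can be written out in a few lines. This is only a sketch under assumptions not stated in the answer: $X$ and $Y$ have finite second moments, nonzero variances, and a joint density $f_{X,Y}$ (the discrete case is the same with sums in place of integrals).

$$
\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}
           = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}}
$$

$$
\text{Independence: } f_{X,Y}(x,y) = f_X(x)\,f_Y(y)
\;\Longrightarrow\;
E[XY] = \iint xy\, f_X(x) f_Y(y)\,dx\,dy
      = \Big(\int x f_X(x)\,dx\Big)\Big(\int y f_Y(y)\,dy\Big)
      = E[X]\,E[Y]
$$

$$
\Longrightarrow\; \operatorname{Cov}(X,Y) = 0 \;\Longrightarrow\; \rho_{X,Y} = 0
$$

Note that this is a statement about the population quantity $\rho_{X,Y}$, not about the correlation computed from a finite sample.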
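Comment on sample correlation. In comparing two small independent samples of the same size, the sample correlation is often noticeably different from $r = 0$. [Nothing here contradicts @OmG's answer (+1) about the population correlation $\rho$.]
Consider the correlations between a million pairs of independent samples of size $n = 5$ from the exponential distribution with rate 1.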
set.seed(616)
r = replicate( 10^6, cor(rexp(5), rexp(5)) )
mean(abs(r) > .5)
[1] 0.386212
mean(r)
[1] -0.0005904455
hist(r, prob=T, br=40, col="skyblue2")
abline(v=c(-.5,.5), col="red", lwd=2)
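For example, here is the scatterplot of the first of the million pairs of samples of size $n = 5$. [scatterplot]
There is nothing special about the exponential distribution in this regard. Changing the parent distribution to standard normal gives the following results.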
set.seed(2019)
...
mean(abs(r) > .5)
[1] 0.391061
mean(r)
[1] 1.43269e-05
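By contrast, here is the histogram of correlations for pairs of normal samples. [histogram]
Note: Other pages on this site discuss the sample correlation in more detail; one of them is this Q&A.
Simple answer: if two variables are independent, then the population correlation is zero, whereas the sample correlation is typically small but nonzero.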
That is because the sample is not a perfect representation of the population.
The larger the sample, the better it represents the population, so the closer to zero the sample correlation will tend to be. In the limit of an infinitely large sample, the correlation would be zero.
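A quick simulation can make the effect of sample size concrete. The following is a minimal sketch, not taken from any of the answers: the sample sizes, the number of replications, and the choice of standard normal data are all illustrative assumptions.

```r
# Sketch: spread of the sample correlation between two independent samples
# as the sample size n grows. Sizes and replication count are illustrative.
set.seed(1)
sizes <- c(5, 20, 100, 1000)
reps  <- 2000

# For each n, simulate `reps` correlations between two independent
# standard normal samples of size n, then summarize them.
spread <- sapply(sizes, function(n) {
  r <- replicate(reps, cor(rnorm(n), rnorm(n)))
  c(n = n, mean_r = mean(r), sd_r = sd(r), frac_big = mean(abs(r) > 0.5))
})

# One row per sample size: mean of r, standard deviation of r,
# and the share of simulated pairs with |r| > 0.5.
print(round(t(spread), 4))
```

The mean of the simulated correlations should stay near zero for every n, while their standard deviation and the share of large values shrink as n increases.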
Maybe this is helpful for some people sharing the same intuitive understanding. We've all seen something like this:
These data are presumably independent but clearly exhibit correlation. "I thought independence implies zero correlation!" the student says.
As others have already pointed out, the sample values are correlated, but that does not mean the population has nonzero correlation.
Of course, these two should be independent: if Nicolas Cage appears in a record-setting 10 films this year, we shouldn't be closing the local pool for the summer for safety purposes.
But when we check how many people drowned this year, there is a small chance that it turns out to be a record-setting 1000.
Getting such a correlation is unlikely, maybe one in a thousand. But it's possible, even though the two are independent. And this is just one case. Consider that there are millions of possible events out there to measure, and you can see that the chance of some two of them happening to give a high correlation is quite high (hence the existence of graphs such as the one above).
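The "many possible events" point can be illustrated with a small simulation. This is only a sketch under assumed numbers: 200 mutually independent series with 10 observations each (say, 10 yearly values), both chosen arbitrarily for illustration.

```r
# Sketch: among many mutually independent series, some pair will look
# strongly correlated purely by chance. n and k are hypothetical choices.
set.seed(42)
n <- 10     # observations per series (e.g. 10 yearly values)
k <- 200    # number of independent series
x <- matrix(rnorm(n * k), nrow = n, ncol = k)

r <- cor(x)                   # k x k matrix of pairwise sample correlations
pairs_r <- r[upper.tri(r)]    # keep each distinct pair once

length(pairs_r)               # choose(200, 2) = 19900 distinct pairs
max(abs(pairs_r))             # largest spurious correlation among them
mean(abs(pairs_r) > 0.8)      # share of pairs with |r| > 0.8
```

Any single pair is unlikely to show a strong correlation, but with thousands of pairs to choose from, the strongest one found will typically be large.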
Another way to look at it is that guaranteeing that two independent events will always give uncorrelated values is itself restrictive. Given two independent dice and the results of the first, there is a certain (sizable) set of results for the second die that will give some nonzero correlation. Restricting the second die's results to give zero correlation with the first would be a clear violation of independence, as the first die's rolls would then be affecting the distribution of the second's results.
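The dice argument can be checked by brute force. The sketch below assumes, purely for illustration, that each of two fair dice is rolled 3 times, and enumerates every equally likely outcome. It works with the numerator of the sample covariance rather than the correlation itself, because the correlation is undefined when one die shows the same face three times.

```r
# Sketch: two independent fair dice, each rolled 3 times (n = 3 is an
# illustrative choice). Enumerate all 6^6 equally likely outcomes.
g <- expand.grid(rep(list(1:6), 6))
x <- as.matrix(g[, 1:3])   # rolls of the first die
y <- as.matrix(g[, 4:6])   # rolls of the second die

# Numerator of the sample covariance: n * sum(x*y) - sum(x) * sum(y).
# It is zero exactly when the sample covariance is zero.
s <- 3 * rowSums(x * y) - rowSums(x) * rowSums(y)

mean(s != 0)   # share of outcomes with nonzero sample covariance
sum(s)         # 0: positive and negative cases cancel across all outcomes
```

Individual outcomes mostly have nonzero sample covariance, yet the total over all outcomes is exactly zero, which is the population-level statement that independence actually guarantees.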