This is probably a simple question for many people, but here it is:
Why isn't variance defined as the difference between every value following each other instead of the difference to the average of the values?
To me this would seem the more logical choice, so I guess I'm obviously overlooking some disadvantages. Thanks
EDIT:
Let me rephrase this as clearly as possible. This is what I mean:
- Assume you have a series of numbers, ordered: 1, 2, 3, 4, 5.
- Compute and sum up the (absolute) differences between consecutive values (between each value and the next, not between all pairs), without using the average.
- Divide by the number of differences.
- (Follow-up: if the numbers are unordered, the answer will be different.)
 
-> What are the disadvantages of this approach compared with the standard formula for variance?
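(To make the proposal concrete, here is a minimal sketch in Python of the calculation described above, next to the ordinary variance; the function name is just illustrative. Note already that reordering the same numbers changes the result.)

```python
import numpy as np

def mean_successive_abs_diff(values):
    """Proposed measure: average absolute difference between consecutive values."""
    diffs = np.abs(np.diff(values))   # |x2 - x1|, |x3 - x2|, ...
    return diffs.sum() / len(diffs)   # divide by the number of differences

x = np.array([1, 2, 3, 4, 5])
print(mean_successive_abs_diff(x))                            # 1.0
print(np.var(x))                                              # 2.0 (standard population variance)
print(mean_successive_abs_diff(np.array([1, 3, 5, 2, 4])))    # 2.25 -- same numbers, different order
```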
Answers:
The most obvious reason is that the values often have no temporal ordering, so shuffling the data makes no difference to the information the data convey. If we followed your method, then every time you shuffled the data you would get a different sample variance.
Theoretically, the sample variance estimates the true variance of a random variable. The true variance of a random variable $X$ is $E\left[(X - EX)^2\right]$.
Here $E$ denotes the expectation, or "average". So the definition of the variance is the average squared distance of a variable from its mean. When you look at this definition, there is no "time ordering" involved, since there are no data; it is simply an attribute of the random variable.
When you collect iid data from this distribution, you have realizations $x_1, x_2, \ldots, x_n$. The best way to estimate the expectation is to take the sample average. The key point is that the data are iid, so there is no ordering to them: the sample $x_1, x_2, \ldots, x_n$ is the same as the sample $x_2, x_5, x_1, \ldots, x_n$.
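A quick numerical illustration of this point (a Python sketch; the seed, sample size, and "height" setting are arbitrary choices): shuffling the sample leaves the sample variance untouched but changes the successive-difference measure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=170, scale=10, size=20)   # 20 iid "heights" as a toy sample

def mean_successive_abs_diff(values):
    return np.abs(np.diff(values)).mean()

for _ in range(3):
    rng.shuffle(x)                                  # reorder the same sample
    print(round(x.var(ddof=1), 3),                  # sample variance: identical every time
          round(mean_successive_abs_diff(x), 3))    # successive-difference measure: changes
```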
EDIT
The sample variance measures one particular kind of dispersion of a sample, namely the average distance from the mean. There are other kinds of dispersion, such as the range of the data and the interquartile range.
Even if you sort your values in ascending order, that does not change the characteristics of the sample. The sample (data) you obtain are realizations of a variable; computing the sample variance is a way of understanding how much dispersion there is in that variable. For example, if you sample 20 people and measure their heights, you obtain 20 "realizations" of the random variable height. The sample variance is then supposed to measure the overall variability of individual heights. Sorting the data as $100, 110, 123, 124, \ldots$ does not change the information in the sample.
Let's look at one more example. Say you have 100 observations of a random variable, ordered like this: $1, 2, 3, \ldots, 100$. The average distance between successive values is then 1 unit, so by your method the variance would be 1.
The way to interpret "variance" or "dispersion" is as an indication of what range of values the data are likely to span. Here the data span a range of 99 units, which a variance of 1 obviously does not represent well.
If instead of taking the average you simply summed the successive differences, your variance would be 99. Of course that does not represent the variability in the sample either, because 99 gives you the range of the data, not a sense of its variability.
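A short check of this example in Python (assuming the ordered sequence 1, 2, ..., 100 discussed above):

```python
import numpy as np

x = np.arange(1, 101)            # 1, 2, ..., 100 in increasing order
succ = np.abs(np.diff(x))        # 99 successive differences, all equal to 1

print(succ.mean())               # 1.0    -> the proposed "variance"
print(succ.sum())                # 99     -> the sum, which here is just the range
print(x.max() - x.min())         # 99     -> the range of the data
print(x.var())                   # 833.25 -> the usual (population) variance
```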
It is defined that way!
Here is the algebra. Let the values be $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. Denote by $F$ the empirical distribution function of these values (which means each $x_i$ contributes a probability mass of $1/n$ at the value $x_i$), and let $X$ and $Y$ be independent random variables with distribution $F$. By the basic properties of the variance (namely, that it is a quadratic form), together with the definition of $F$ and the fact that $X$ and $Y$ have the same mean,
$$\operatorname{Var}(\mathbf{x}) = \operatorname{Var}(X) = \tfrac{1}{2}\left(\operatorname{Var}(X) + \operatorname{Var}(Y)\right) = \tfrac{1}{2}\operatorname{Var}(X - Y) = \mathbb{E}\left[\tfrac{1}{2}(X - Y)^2\right] = \frac{1}{n^2}\sum_{i,j}\tfrac{1}{2}\left(x_i - x_j\right)^2.$$
This formula does not depend on how $\mathbf{x}$ is ordered: it uses all possible pairs of components, comparing them by half their squared differences. It can, however, be related to an average over all possible orderings (the group $\mathfrak{S}(n)$ of all $n!$ permutations of the indices $1, 2, \ldots, n$). Namely,
$$\operatorname{Var}(\mathbf{x}) = \frac{1}{n!}\sum_{\sigma \in \mathfrak{S}(n)} \frac{1}{n}\sum_{i=1}^{n-1}\tfrac{1}{2}\left(x_{\sigma(i+1)} - x_{\sigma(i)}\right)^2.$$
The inner sum takes the reordered values $x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}$ and sums the (half) squared differences between all $n-1$ successive pairs. Dividing by $n$ essentially averages these successive squared differences; it computes what is known as the lag-1 semivariance. The outer summation does this for every possible ordering.
These two equivalent algebraic views of the standard variance formula give new insight into what the variance means. The semivariance is an inverse measure of the serial covariance of a sequence: the covariance is high (and the numbers are positively correlated) when the semivariance is low, and conversely. The variance of an unordered data set is therefore the average of all the possible semivariances obtainable under arbitrary reorderings.
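A brute-force numerical check of the identity above, as reconstructed here (a Python sketch; the small data vector is arbitrary, and all $n!$ orderings are enumerated explicitly):

```python
import numpy as np
from itertools import permutations

x = np.array([1.0, 2.0, 4.0, 8.0])
n = len(x)

# Variance of the empirical distribution (divides by n).
var = x.var()

# Lag-1 semivariance of each ordering: (1/n) * sum of half squared successive differences.
semivariances = []
for perm in permutations(x):
    d = np.diff(perm)
    semivariances.append((d ** 2 / 2).sum() / n)

print(var, np.mean(semivariances))   # both give the same number
```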
Just a complement to the other answers: the variance can be computed from the squared differences between terms,
$$\operatorname{Var}(X) = \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(x_i - x_j\right)^2.$$
I think this is the closest to the OP proposition. Remember the variance is a measure of dispersion of every observation at once, not only between "neighboring" numbers in the set.
Using your example, $X = \{1, 2, 3, 4, 5\}$, we know the variance is $\operatorname{Var}(X) = 2$.
With your proposed method, $\frac{1}{4}\left(|2-1| + |3-2| + |4-3| + |5-4|\right) = 1 \neq 2$, so we know beforehand that taking the differences between neighbors as the variance doesn't add up. What I meant was taking every possible difference, squaring, and then summing:
$$\operatorname{Var}(X) = \frac{1}{2 \cdot 5^2}\sum_{i=1}^{5}\sum_{j=1}^{5}\left(x_i - x_j\right)^2 = \frac{100}{50} = 2.$$
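A small check of these numbers (a Python sketch using the OP's example):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
n = len(x)

# All pairwise squared differences, summed and normalised by 2 * n^2.
pairwise = ((x[:, None] - x[None, :]) ** 2).sum() / (2 * n ** 2)

# The proposal in the question: mean absolute successive difference.
neighbor = np.abs(np.diff(x)).mean()

print(pairwise)    # 2.0 -- matches np.var(x)
print(neighbor)    # 1.0 -- does not
```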
Others have answered about the usefulness of variance defined as usual. Anyway, we just have two legitimate definitions of different things: the usual definition of variance, and your definition.
Then, the main question is why the first one is called variance and not yours. That is just a matter of convention. Until 1918 you could have invented anything you wanted and called it "variance", but in 1918 Fisher used that name for what is still called variance, and if you want to define anything else you will need to find another name for it.
The other question is whether the thing you defined might be useful for anything. Others have pointed out its problems as a measure of dispersion, but it's up to you to find applications for it. Maybe you will find applications so useful that in a century your quantity is more famous than the variance.
@GreenParker's answer is more complete, but an intuitive example might be useful to illustrate the drawback of your approach.
In your question, you seem to assume that the order in which realisations of a random variable appear matters. However, it is easy to think of examples in which it doesn't.
Consider the example of the height of individuals in a population. The order in which individuals are measured is irrelevant to both the mean height in the population and the variance (how spread out those values are around the mean).
Your method would seem odd applied to such a case.
Although there are many good answers to this question, I believe some important points were left behind, and since this question raised a really interesting point I would like to provide yet another point of view.
Why isn't variance defined as the difference between every value following    
each other instead of the difference to the average of the values?
The first thing to have in mind is that the variance is a particular kind of parameter, not a certain type of calculation. There is a rigorous mathematical definition of what a parameter is, but for the time being we can think of parameters as mathematical operations on the distribution of a random variable. For example, if $X$ is a random variable with distribution function $F_X$, then its mean $\mu_X$, which is also a parameter, is
$$\mu_X = \mathbb{E}[X] = \int_{-\infty}^{\infty} x \, dF_X(x),$$
and the variance of $X$, $\sigma_X^2$, is
$$\sigma_X^2 = \operatorname{Var}(X) = \mathbb{E}\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 \, dF_X(x).$$
The role of estimation in statistics is to provide, from a set of realizations of a r.v., a good approximation for the parameters of interest.
What I wanted to show is that there is a big difference between the concept of a parameter (the variance, for this particular question) and the statistic we use to estimate it.
Why isn't the variance calculated this way?
So we want to estimate the variance of a random variable from a set of independent realizations of it, say $x_1, \ldots, x_n$. The way you propose doing it is by computing the absolute values of the successive differences, summing them, and taking the mean:
$$\hat{\theta} = \frac{1}{n-1}\sum_{i=1}^{n-1}\left|x_{i+1} - x_i\right|,$$
and the usual statistic is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,$$
where $\bar{x}$ is the sample mean.
When comparing two estimators of a parameter, the usual criterion for choosing the better one is minimal mean squared error (MSE), and an important property of the MSE is that it can be decomposed into two components:
$$\operatorname{MSE} = \text{bias}^2 + \text{variance}.$$
Using this criterion, the usual statistic, $s^2$, has some advantages over the one you suggest.
First, it is an unbiased estimator of the variance, while your statistic is not.
Another important point is that, if we are working with the normal distribution, then $s^2$ is the best unbiased estimator of $\sigma^2$, in the sense that it has the smallest variance among all unbiased estimators and thus the smallest MSE among them.
When normality is assumed, as is the case in many applications, $s^2$ is the natural choice when you want to estimate the variance.
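A small simulation sketch of the bias claim (Python; the true variance, sample size, and number of replications are arbitrary choices). For normal data the mean absolute successive difference concentrates around $2\sigma/\sqrt{\pi}$ rather than around $\sigma^2$, while $s^2$ stays unbiased:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0                      # true variance of the normal population
n, reps = 30, 20_000              # sample size and number of Monte Carlo replications

s2_vals, succ_vals = [], []
for _ in range(reps):
    x = rng.normal(0.0, np.sqrt(sigma2), size=n)
    s2_vals.append(x.var(ddof=1))                  # usual unbiased sample variance
    succ_vals.append(np.abs(np.diff(x)).mean())    # mean absolute successive difference

print(np.mean(s2_vals))    # ~4.0 : close to sigma^2
print(np.mean(succ_vals))  # ~2.26: close to 2*sigma/sqrt(pi), not sigma^2
```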
The time-stepped difference is indeed used in one form, the Allan Variance. http://www.allanstime.com/AllanVariance/
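For completeness, here is a minimal sketch of the non-overlapping Allan variance at a single averaging time (Python; the white-noise input and the function name are just for illustration, and the linked site gives the full definitions):

```python
import numpy as np

def allan_variance(y, m=1):
    """Non-overlapping Allan variance of frequency readings `y`,
    averaged in blocks of `m` samples (m = 1 gives the basic estimate)."""
    nblocks = len(y) // m
    block_means = y[:nblocks * m].reshape(nblocks, m).mean(axis=1)
    d = np.diff(block_means)         # successive differences of block averages
    return 0.5 * np.mean(d ** 2)     # half the mean squared successive difference

rng = np.random.default_rng(2)
y = rng.normal(size=10_000)          # white frequency noise as a toy input
print(allan_variance(y, m=1))        # ~1.0 for unit-variance white noise
```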
Lots of good answers here, but I'll add a few.
Nonetheless, as @Pere said, your metric might prove itself very useful in the future.