What is the difference between standardization and studentization?

21

Is it that in standardization the variance is known, whereas in studentization it is unknown and therefore estimated? Thank you.

standardization

— 58485362
source

2

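You may need to clarify what you are asking. What kind of standardization, and what kind of studentization? What are the values being used for?

— russellpierce 2014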

3

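If you are asking about residuals, the terminology is not standardized. Different authors use different names for the same thing and sometimes, unhappily and most confusingly, the same name for different things. There are what I would call (i) scaled residuals ( $(y_i-\hat{y}_i)/s$ , called standardized residuals by some authors); (ii) internally studentized residuals (called standardized by some authors/packages and studentized by others); (iii) externally studentized / studentized deleted residuals.

— Glen_b -Reinstate Monica 2014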

20

A short recap. Given a model $y=X\beta+\varepsilon$ , where $X$ is $n\times p$ , the least-squares estimate is $\hat\beta=(X'X)^{-1}X'y$ and the fitted values are $\hat y=X\hat\beta=X(X'X)^{-1}X'y=Hy$ , where $H=X(X'X)^{-1}X'$ is the "hat matrix". The residuals are

$e=y-\hat y=y-Hy=(I-H)y$

The population variance $\sigma^2$ is unknown and can be estimated by $MSE$ , the mean square error.

Semistudentized residuals are defined as

$e_i^*=\frac{e_i}{\sqrt{MSE}}$

but, since the variance of the residuals depends on both $\sigma^2$ and $X$ , their estimated variance is

$\widehat V(e_i)=MSE(1-h_{ii})$

where $h_{ii}$ is the $i$ th diagonal element of the hat matrix.

Standardized residuals, also called internally studentized residuals, are:

$r_i=\frac{e_i}{\sqrt{MSE(1-h_{ii})}}$
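The quantities defined so far can be computed directly with NumPy. The following is a minimal sketch on simulated data, not the answerer's own code. The seed, sample size, and coefficients are arbitrary illustrative choices, and $MSE$ is assumed to be $SSE/(n-p)$, the usual convention in Kutner et al.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only

n, p = 20, 3
# n x p design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)  # leverages h_ii

e = y - H @ y          # residuals e = (I - H) y
mse = e @ e / (n - p)  # MSE as the estimate of sigma^2 (assumed SSE / (n - p))

e_star = e / np.sqrt(mse)         # semistudentized residuals
r = e / np.sqrt(mse * (1.0 - h))  # internally studentized (standardized) residuals
```

Because $0 \le h_{ii} \le 1$, each $|r_i|$ is at least as large as the corresponding $|e_i^*|$, and the gap grows with the leverage of the observation.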

However, the individual $e_i$ and $MSE$ are not independent, so $r_i$ cannot have a $t$ distribution. The procedure is then to delete the $i$ th observation, fit the regression function to the remaining $n-1$ observations, and obtain a new $\hat y$ , which can be denoted by $\hat y_{i(i)}$ . The difference

$d_i=y_i-\hat y_{i(i)}$

is called the deleted residual. An equivalent expression that does not require refitting is:

$d_i=\frac{e_i}{1-h_{ii}}$

Denoting the new $X$ and $MSE$ by $X_{(i)}$ and $MSE_{(i)}$ , which do not depend on the $i$ th observation, we get:

$t_i=\frac{d_i}{\sqrt{\frac{MSE_{(i)}}{1-h_{ii}}}} =\frac{e_i}{\sqrt{MSE_{(i)}(1-h_{ii})}}\sim t_{n-p-1}$

The $t_i$ 's are called studentized (deleted) residuals, or externally studentized residuals.
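The deletion procedure can be checked numerically against the shortcut formulas. The sketch below uses simulated data and is only an illustration under stated assumptions. The data are arbitrary, $MSE_{(i)}$ is assumed to be the error sum of squares of the refit divided by $n-p-1$, and the no-refit expression for $MSE_{(i)}$ uses the identity $SSE_{(i)} = SSE - e_i^2/(1-h_{ii})$, which is a standard result not stated in the answer.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, illustrative data only
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.5 + 1.5 * X[:, 1] + rng.normal(size=n)  # hypothetical coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = y - H @ y
sse = e @ e

# Shortcut formulas: no refitting needed
d = e / (1.0 - h)                                 # deleted residuals
mse_del = (sse - e**2 / (1.0 - h)) / (n - p - 1)  # MSE_(i) via the SSE identity
t = e / np.sqrt(mse_del * (1.0 - h))              # externally studentized residuals

# Brute force: drop observation i, refit on the other n - 1, predict y_i
d_loo = np.empty(n)
t_loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    b_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    resid_i = yi - Xi @ b_i
    mse_i = resid_i @ resid_i / (n - 1 - p)
    d_loo[i] = y[i] - X[i] @ b_i
    t_loo[i] = d_loo[i] / np.sqrt(mse_i / (1.0 - h[i]))

# Largest disagreement between the two routes (floating-point noise only)
max_gap = max(np.max(np.abs(d - d_loo)), np.max(np.abs(t - t_loo)))
```

The loop refits the model $n$ times, whereas the shortcut needs only the single full fit, which is why the closed forms are normally used in practice.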

See Kutner et al., Applied Linear Statistical Models, Chapter 10.

Edit: I must say that the answer by rpierce is perfect. I thought that the OP was about standardized and studentized residuals (and dividing by the population standard deviation to get standardized residuals looked odd to me, of course), but I was wrong. I hope that my answer can help someone even if OT.

— Sergio
source

2

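...and this answer is right in defining studentized residuals in terms of a regression equation. There is no definition of the corresponding standardized residuals. The regression framework does not seem to apply to the question as asked. But this is still a valuable contribution. +1

— russellpierce 2014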

2

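@rpierce, you are right: as soon as I read "studentization" I also read "residuals", but they were only in my head ;-) Sorry. I noticed my oversight only after the final click.

— Sergio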

9

In the social sciences it is typically said that studentized scores use Student's/Gosset's calculation for estimating the population variance/standard deviation from the sample variance/standard deviation ( $s$ ). In contrast, standardized scores (a noun, a particular type of statistic, the Z score) are said to use the population standard deviation ( $\sigma$ ).

However, it appears there are some terminological differences across fields (please see the comments on this answer). Therefore, one ought to proceed with caution in making these distinctions. Moreover, studentized scores are rarely called such, and one typically sees 'studentized' values in the context of regression. @Sergio provides details about those types of studentized deleted residuals in his answer.

— russellpierce
source

2

Wikipedia adds, "The term is also used for the standardisation of a higher-degree statistic by another statistic of the same degree: for example, an estimate of the third central moment would be standardised by dividing by the cube of the sample standard deviation."

— Nick Stauner

2

I think it would be safer to say that Studentization is the form of standardization available if the population variance is unknown. This takes the form of a technical, terminological point of distinction rather than a misleading statement about the more general, broadly-used term.

— Nick Stauner

2

@whuber: The context of the question was basic, so I gave a basic answer. Standard scores (Z) are computed in introductory stats and $\sigma$ is given to them. Sometimes you do actually have the population standard deviation (e.g. a non-missing data census of 10 people).

— russellpierce

2

@Nick That sounds like a good resolution, given that various authorities do use "standardization" broadly but none (AFAIK) ever use "studentize" in such a broad sense.

— whuber

2

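@rpierce The second book (Freedman, Pisani, and Purves) has been out for about 40 years in five editions (largely unchanged) and began as the text for the introductory statistics course at UC Berkeley. It covers just about every possible field, not just public health. On the other hand, one of its strengths is that it avoids emphasizing overly fine, meaningless, or excessively technical distinctions, so although it is generally a good guide to statistics, it cannot be relied on to settle arcane questions.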

— ub

3

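I am very late in answering this question! But I could not find an answer in very simple language, so this is my humble attempt.

Why do we standardize? Suppose you have two models: one predicts insanity from the time spent studying statistics, while the other predicts log(insanity) from the time spent on statistics.

The residuals are hard to compare because they are in different units. So we standardize them. (The idea is similar to that of Z scores.)

Standardized residuals: the residual divided by an estimate of its standard deviation. As a rule of thumb, an absolute value > 3 is a cause for concern.

We use them to study outliers in the model.

Studentized residuals: we use them to study the stability of the model.

The process is simple. We delete a single case from the model and find the new predicted value. The difference between the new predicted value and the original observation can be standardized by dividing by its standard error. This value is the studentized residual.

For more information, see Discovering Statistics Using R: http://www.statisticshell.com/html/dsur.html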

— 霍亚尔
source

1

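Wikipedia has a good overview at https://en.wikipedia.org/wiki/Normalization_(statistics):

Standard score $\frac{X - \mu}{\sigma}$ : normalizing errors when the population parameters are known. Works well for populations that are normally distributed.

Student's t-statistic $\frac{X - \overline{X}}{s}$ : normalizing residuals when the population parameters are unknown (estimated).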
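The two formulas differ only in where the centre and the scale come from. Here is a small sketch that makes the contrast concrete. The function names and the data are hypothetical, and the "known" parameters are obtained by treating the eight values as an entire population, in the spirit of the census example in the comments above.

```python
import numpy as np

def standard_scores(x, mu, sigma):
    """(x - mu) / sigma, with the population parameters supplied as known."""
    return (np.asarray(x, dtype=float) - mu) / sigma

def studentized_scores(x):
    """(x - mean) / s, with the mean and s estimated from the sample (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values

# Treated as a full population, these values have mu = 5 and sigma = 2
z = standard_scores(x, mu=5.0, sigma=2.0)

# Treated as a sample, the scale is estimated by s = sqrt(32 / 7)
t = studentized_scores(x)
```

Here `s` exceeds `sigma` because of the $n-1$ divisor, so every studentized score is slightly smaller in magnitude than the corresponding standard score.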

— 阿斯迈尔
source