Manually computed $R^2$ does not match the $R^2$ reported by randomForest on test data

I know this is a fairly specific R question, but I may be thinking about the proportion of variance explained, $R^2$, incorrectly. Here goes.

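I am trying to use the R package randomForest. I have some training data and some test data. When I fit a random forest model, the randomForest function lets you pass in new test data to evaluate. It then reports the percentage of variance explained in this new data. When I look at this, I get one number.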

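When I use the predict() function to predict the outcome values of the test data from the model fit on the training data, and then take the squared correlation coefficient between these predictions and the actual outcome values of the test data, I get a different number. The two values do not match.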

Here is some R code to demonstrate the problem.

# use the built in iris data
data(iris)

#load the randomForest library
library(randomForest)

# split the data into training and testing sets
# (seed set here so that the random split itself is reproducible)
set.seed(1)
index <- 1:nrow(iris)
trainindex <- sample(index, trunc(length(index)/2))
trainset <- iris[trainindex, ]
testset <- iris[-trainindex, ]

# fit a model to the training set (column 1, Sepal.Length, will be the outcome)
set.seed(42)
model <- randomForest(x=trainset[ ,-1],y=trainset[ ,1])

# predict values for the testing set (the first column is the outcome, leave it out)
predicted <- predict(model, testset[ ,-1])

# what's the squared correlation coefficient between predicted and actual values?
cor(predicted, testset[, 1])^2

# now, refit the model using the built-in xtest and ytest arguments
set.seed(42)
randomForest(x=trainset[ ,-1], y=trainset[ ,1], xtest=testset[ ,-1], ytest=testset[ ,1])

— Stephen Turner

The $R^2$ that randomForest reports is not the squared correlation. It follows the general definition of $R^2$:

$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} .$

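That is, we compute the mean squared error, divide it by the variance of the original observations, and subtract the result from one. (Note that this value can be negative if your predictions are really bad.)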
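As a small illustration of the parenthetical remark, here is a minimal base-R sketch with made-up numbers. The helper `r_squared` is a hypothetical name introduced only for this example; it is not part of randomForest or any other package.

```r
# Hypothetical helper (not from any package): R^2 as 1 - SSE/SST.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Toy outcome values, invented for illustration only.
y <- c(1, 2, 3, 4, 5)

# Predicting the mean of y for every observation gives exactly 0.
r2_mean <- r_squared(y, rep(mean(y), length(y)))

# Predictions that are worse than the mean give a negative value,
# even though they are perfectly (negatively) correlated with y.
y_bad <- rev(y)
r2_bad <- r_squared(y, y_bad)
cor_sq_bad <- cor(y, y_bad)^2

print(c(r2_mean = r2_mean, r2_bad = r2_bad, cor_sq_bad = cor_sq_bad))
```

In this toy case the general definition gives -3, while the squared correlation is 1, so the two quantities can disagree as much as one likes.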

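In linear regression with an intercept term, the mean of the fitted values $\hat{y}_i$ equals $\bar{y}$, and the residual vector $y - \hat{y}$ is orthogonal to the vector of fitted values $\hat{y}$. Put together, these two facts reduce the definition to the more commonly encountered one:

$R^2_{\mathrm{LR}} = \mathrm{Corr}(y,\hat{y})^2 .$

(The subscript $\mathrm{LR}$ in $R^2_{\mathrm{LR}}$ stands for linear regression.)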

The randomForest call uses the first definition, so if you run

   > y <- testset[,1]
   > 1 - sum((y-predicted)^2)/sum((y-mean(y))^2)

you will see that the answers match.
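To see when the two definitions agree and when they part ways without needing randomForest at all, here is a rough base-R sketch. It assumes an ordinary `lm` fit on the full iris data, which is not the model from the question, and `r_squared` is again a hypothetical helper defined only for this example.

```r
# Built-in data set; Sepal.Length is the outcome, as in the question.
data(iris)

# Hypothetical helper: the general definition, 1 - SSE/SST.
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

y <- iris$Sepal.Length

# Linear regression with an intercept, evaluated on its own training data:
# here the two definitions coincide.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
y_hat <- fitted(fit)
r2_general <- r_squared(y, y_hat)
r2_cor <- cor(y, y_hat)^2

# Shift every prediction by a constant: the correlation is unchanged,
# but the general definition penalises the bias.
y_shifted <- y_hat + 1
r2_general_shifted <- r_squared(y, y_shifted)
r2_cor_shifted <- cor(y, y_shifted)^2

print(c(general = r2_general, correlation = r2_cor))
print(c(general = r2_general_shifted, correlation = r2_cor_shifted))
```

The squared correlation ignores any constant offset (or rescaling) of the predictions, which is one reason it can look better than 1 - SSE/SST for predictions on held-out data.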

— cardinal

(+1) Very elegant indeed.

— chl

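@mpiktas, @chl, I will try to expand on this a bit more later today. Basically, there is a close (but perhaps somewhat hidden) connection to hypothesis testing in the background. Even in a linear regression setting, the "correlation" definition fails if the constant vector is not in the column space of the design matrix.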

— cardinal

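If you have a reference other than the Seber/Lee textbook (which I cannot access), I would love to see a good explanation of how variation explained (i.e. 1 - SSerr/SStot) differs from the squared correlation coefficient, or variance explained. Thanks again for the tip.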

— Stephen Turner

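If the R-squared from an instrumental variables regression is negative, is there a way to suppress the negative value and convert it into a positive one for reporting? Please see this link: stata.com/support/faqs/statistics/two-stage-least-squares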

— Eric