我有点困惑:经过插入符号训练的模型的结果与原始包装中的模型有什么不同?我阅读了使用带有插入符号包的RandomForest的FinalModel进行预测之前是否需要进行预处理?但我在这里不使用任何预处理。
我通过使用插入符号包并调整了不同的mtry值来训练了不同的随机森林。
> cvCtrl = trainControl(method = "repeatedcv",number = 10, repeats = 3, classProbs = TRUE, summaryFunction = twoClassSummary)
> newGrid = expand.grid(mtry = c(2,4,8,15))
> classifierRandomForest = train(case_success ~ ., data = train_data, trControl = cvCtrl, method = "rf", metric="ROC", tuneGrid = newGrid)
> curClassifier = classifierRandomForest
我发现mtry = 15是training_data上的最佳参数:
> curClassifier
...
Resampling results across tuning parameters:
mtry ROC Sens Spec ROC SD Sens SD Spec SD
4 0.950 0.768 0.957 0.00413 0.0170 0.00285
5 0.951 0.778 0.957 0.00364 0.0148 0.00306
8 0.953 0.792 0.956 0.00395 0.0152 0.00389
10 0.954 0.797 0.955 0.00384 0.0146 0.00369
15 0.956 0.803 0.951 0.00369 0.0155 0.00472
ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 15.
我用ROC曲线和混淆矩阵评估了模型:
##ROC-Curve
predRoc = predict(curClassifier, test_data, type = "prob")
myroc = pROC::roc(test_data$case_success, as.vector(predRoc[,2]))
plot(myroc, print.thres = "best")
##adjust optimal cut-off threshold for class probabilities
threshold = coords(myroc,x="best",best.method = "closest.topleft")[[1]] #get optimal cutoff threshold
predCut = factor( ifelse(predRoc[, "Yes"] > threshold, "Yes", "No") )
##Confusion Matrix (Accuracy, Spec, Sens etc.)
curConfusionMatrix = confusionMatrix(predCut, test_data$case_success, positive = "Yes")
产生的混淆矩阵和准确性:
Confusion Matrix and Statistics
Reference
Prediction No Yes
No 2757 693
Yes 375 6684
Accuracy : 0.8984
....
现在,我使用基本的randomForest软件包训练了具有相同参数和相同training_data的Random Rorest:
randomForestManual <- randomForest(case_success ~ ., data=train_data, mtry = 15, ntree=500,keep.forest=TRUE)
curClassifier = randomForestManual
再次,我为与上述完全相同的test_data创建了预测,并使用与上述相同的代码评估了混淆矩阵。但是现在我有了不同的衡量标准:
Confusion Matrix and Statistics
Reference
Prediction No Yes
No 2702 897
Yes 430 6480
Accuracy : 0.8737
....
是什么原因?我想念什么?
seeds
trainControl