R中的Dunnett检验每次都返回不同的值

13

我正在使用R'multcomp'库（http://cran.r-project.org/web/packages/multcomp/）计算Dunnett的测试。我正在使用以下脚本：

Group <- factor(c("A","A","B","B","B","C","C","C","D","D","D","E","E","F","F","F"))
Value <- c(5,5.09901951359278,4.69041575982343,4.58257569495584,4.79583152331272,5,5.09901951359278,4.24264068711928,5.09901951359278,5.19615242270663,4.58257569495584,6.16441400296898,6.85565460040104,7.68114574786861,7.07106781186548,6.48074069840786)
data <- data.frame(Group, Value)
aov <- aov(Value ~ Group, data)
summary(glht(aov, linfct=mcp(Group="Dunnett")))

现在，如果我多次在R Console中运行此脚本，则每次得到的结果都会略有不同。这是一个例子：

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = Value ~ Group, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
B - A == 0 -0.35990    0.37009  -0.972  0.76545   
C - A == 0 -0.26896    0.37009  -0.727  0.90019   
D - A == 0 -0.09026    0.37009  -0.244  0.99894   
E - A == 0  1.46052    0.40541   3.603  0.01710 * 
F - A == 0  2.02814    0.37009   5.480  0.00104 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

这是另一个：

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = Value ~ Group, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)    
B - A == 0 -0.35990    0.37009  -0.972   0.7654    
C - A == 0 -0.26896    0.37009  -0.727   0.9001    
D - A == 0 -0.09026    0.37009  -0.244   0.9989    
E - A == 0  1.46052    0.40541   3.603   0.0173 *  
F - A == 0  2.02814    0.37009   5.480   <0.001 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

如您所见，以上两个结果略有不同，但是足以将最后一组（F）从两颗星移到三颗星，这让我感到担忧。

我对此有几个问题：

为什么会这样？当然，如果您每次都输入相同的数据，则应该得到相同的数据。
Dunnett的计算中是否使用了某种随机数？
每次这种微小的变化实际上是一个问题吗？

r multiple-comparisons

— 用户名
source

7

我通过示例一起回答了您的前两个问题。

library(multcomp)

Group <- factor(c("A","A","B","B","B","C","C","C","D","D","D","E","E","F","F","F"))
Value <- c(5,5.09901951359278,4.69041575982343,4.58257569495584,4.79583152331272,5,5.09901951359278,4.24264068711928,5.09901951359278,5.19615242270663,4.58257569495584,6.16441400296898,6.85565460040104,7.68114574786861,7.07106781186548,6.48074069840786)
data <- data.frame(Group, Value)

fit <- aov(Value ~ Group, data)

set.seed(20140123)
Dunnet <- glht(fit, linfct=mcp(Group="Dunnett"))
summary(Dunnet)

结果：

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = Value ~ Group, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
B - A == 0 -0.35990    0.37009  -0.972  0.76536   
C - A == 0 -0.26896    0.37009  -0.727  0.90012   
D - A == 0 -0.09026    0.37009  -0.244  0.99895   
E - A == 0  1.46052    0.40541   3.603  0.01794 * 
F - A == 0  2.02814    0.37009   5.480  0.00112 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

再次运行（不设置种子）：

summary(Dunnet)

不同的结果：

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = Value ~ Group, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
B - A == 0 -0.35990    0.37009  -0.972  0.76535   
C - A == 0 -0.26896    0.37009  -0.727  0.90020   
D - A == 0 -0.09026    0.37009  -0.244  0.99895   
E - A == 0  1.46052    0.40541   3.603  0.01767 * 
F - A == 0  2.02814    0.37009   5.480  0.00105 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

再次运行（设置种子）：

set.seed(20140123)
Dunnet <- glht(fit, linfct=mcp(Group="Dunnett"))
summary(Dunnet)

结果相同：

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = Value ~ Group, data = data)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
B - A == 0 -0.35990    0.37009  -0.972  0.76536   
C - A == 0 -0.26896    0.37009  -0.727  0.90012   
D - A == 0 -0.09026    0.37009  -0.244  0.99895   
E - A == 0  1.46052    0.40541   3.603  0.01794 * 
F - A == 0  2.02814    0.37009   5.480  0.00112 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

通过在每次运行之前设置种子，您可以获得一致的结果。因此，似乎在计算p值时使用了随机数。

我认为这种微小的变化是否有问题？我不太喜欢它，但我会忍受。使用设定的种子将使结果可重复。我建议不要考虑旁边的恒星是p值，而是选择对您有意义且有用的。除非我正在从事的项目确实要求达到这样的精度水平，否则我尽量不要被5或6个小数点所困扰。在这种情况下，我认为大多数人都会同意，即使p值计算略有变化，结果的解释也将是相同的。 $alpha$

— 埃利斯·瓦伦特（Ellis Valentiner）
source

非常感谢您的回答。我认为您没有考虑那里有多少颗星是正确的-无论如何，人们应该关注P值。我想我必须将种子设置为一个已知值，因为为了验证我的程序，结果必须精确可复制。只是一个问题-您知道为什么要使用随机种子吗？

— user1578653 2014年

1

请参阅@Aniko编写的答案，其中提供了更详细的说明。注意，我使用今天的日期作为种子。

— Ellis Valentiner

10

您是正确的，其中涉及一个随机数生成，它使计算因运行而异。罪魁祸首实际上不是邓尼特的过程，而是单步调整所需的多元t分布。

以下代码显示了使用具有可交换相关性的多元分布的5维向量计算的示例： $P(X<0)$ $X$ $T_5$

> library(mvtnorm)
> cr2 <- matrix(rep(0.3, 25), nr=5); diag(cr2) <- 1
> cr2
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0  0.3  0.3  0.3  0.3
[2,]  0.3  1.0  0.3  0.3  0.3
[3,]  0.3  0.3  1.0  0.3  0.3
[4,]  0.3  0.3  0.3  1.0  0.3
[5,]  0.3  0.3  0.3  0.3  1.0
> b <- pmvt(lower=rep(-Inf,5), upper=rep(0,5), delta=rep(0,5), df=5, corr=cr2)
> a <- pmvt(lower=rep(-Inf,5), upper=rep(0,5), delta=rep(0,5), df=5, corr=cr2)
> all.equal(a,b)
[1] "Attributes: < Component 1: Mean relative difference: 0.1527122 >"
[2] "Mean relative difference: 0.0003698006"

如果这是令人担心的，则只需set.seed在计算之前使用任何参数进行调用即可使其完全可重复。

顺便说一句，在以下输出中对错误的确认和量化glht：

> ss <- summary(glht(aov, linfct=mcp(Group="Dunnett")))
> attr(ss$test$pvalues, "error")
[1] 0.0006597562

— 安妮子
source