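The Wald-Wolfowitz runs test seems to be a possible candidate, where a "run" is what you called a "streak". It requires dichotomous data, so you would have to label each solve as "good" or "bad" according to some threshold, such as the median time you suggested. The null hypothesis is that "good" and "bad" solves alternate randomly. A one-sided alternative hypothesis corresponding to your intuition is that "good" solves clump together in long streaks, which implies that there are fewer runs than expected with random data. The test statistic is the number of runs. In R: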
> N <- 200 # number of solves
> DV <- round(runif(N, 15, 30), 1) # simulate some uniform data
> thresh <- median(DV) # threshold for binary classification
# do the binary classification
> DVfac <- cut(DV, breaks=c(-Inf, thresh, Inf), labels=c("good", "bad"))
> Nj <- table(DVfac) # number of "good" and "bad" solves
> n1 <- Nj[1] # number of "good" solves
> n2 <- Nj[2] # number of "bad" solves
> (runs <- rle(as.character(DVfac))) # analysis of runs
Run Length Encoding
lengths: int [1:92] 2 1 2 4 1 4 3 4 2 5 ...
values : chr [1:92] "bad" "good" "bad" "good" "bad" "good" "bad" ...
> (nRuns <- length(runs$lengths)) # test statistic: observed number of runs
[1] 92
# theoretical maximum of runs for given n1, n2
> (rMax <- ifelse(n1 == n2, N, 2*min(n1, n2) + 1))
199
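When you only have a few observations, you can calculate the exact probability of each possible number of runs under the null hypothesis. Otherwise, the distribution of the number of runs can be approximated by a normal distribution, so the standardized number of runs is compared against the standard normal distribution.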
> (muR <- 1 + ((2*n1*n2) / N)) # expected value
100.99
> varR <- (2*n1*n2*(2*n1*n2 - N)) / (N^2 * (N-1)) # theoretical variance
> rZ <- (nRuns-muR) / sqrt(varR) # z-score
> (pVal <- pnorm(rZ, mean=0, sd=1)) # one-sided p-value
0.1012055
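The p-value is for the one-sided alternative hypothesis that "good" solves come in streaks, i.e., that there are fewer runs than expected under randomness.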
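For the small-sample case mentioned above, the exact null distribution of the number of runs can be written down from the standard combinatorial formulas. The following is only a minimal sketch: `runsPMF` is a hypothetical helper name that does not appear in the code above, and it assumes both groups contain at least one observation.

```r
# Exact null distribution of the number of runs, given n1 "good" and
# n2 "bad" solves (runsPMF is a hypothetical helper, assumes n1, n2 >= 1)
runsPMF <- function(n1, n2) {
    N     <- n1 + n2
    total <- choose(N, n1)                   # number of distinct arrangements
    rMax  <- if (n1 == n2) N else 2*min(n1, n2) + 1
    r     <- 2:rMax                          # possible numbers of runs
    k     <- r %/% 2
    # even number of runs r = 2k: both groups are split into k runs each
    even  <- 2 * choose(n1-1, k-1) * choose(n2-1, k-1)
    # odd number of runs r = 2k+1: one group has k+1 runs, the other k
    odd   <- choose(n1-1, k)   * choose(n2-1, k-1) +
             choose(n1-1, k-1) * choose(n2-1, k)
    p     <- ifelse(r %% 2 == 0, even, odd) / total
    setNames(p, r)
}

# Example with made-up numbers: 5 "good", 5 "bad" solves, 3 observed runs
pmf    <- runsPMF(5, 5)
# exact one-sided p-value: probability of 3 or fewer runs
pExact <- sum(pmf[as.integer(names(pmf)) <= 3])
```

As with `pnorm(rZ)` above, the one-sided p-value sums the lower tail because streaks of "good" solves show up as too few runs. For large `N` the binomial coefficients become very large, which is one reason to fall back on the normal approximation.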