Does a median-unbiased estimator minimize the mean absolute deviation?


14

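This is a follow-up to, but a different question from, my previous one.

I read on Wikipedia that "Laplace observed that a median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function." However, my Monte Carlo simulation results do not support this claim.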

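I assume a sample $X_1, X_2, \ldots, X_N$ from a lognormal population $\mathrm{LN}(\mu, \sigma^2)$, where $\mu$ and $\sigma$ are the log-mean and log-sd, and $\beta = \exp(\mu) = 50$.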

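The geometric mean estimator is a median-unbiased estimator of the population median $\exp(\mu)$:

$\hat\beta_{GM} = \exp(\hat\mu) = \exp\left(\sum_i \log(X_i)/N\right) \sim \mathrm{LN}(\mu, \sigma^2/N)$, where $\mu$ and $\sigma$ are the log-mean and log-sd, and $\hat\mu$ and $\hat\sigma$ are the maximum likelihood estimates of $\mu$ and $\sigma$.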
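The median-unbiasedness claim can be checked numerically. The following is a minimal sketch, not part of the original script: it assumes the simulation settings used later in the question (samples of size 5, log-mean log(50), log-sd sqrt(log(1 + 2^2))), and the seed and the number of replications are arbitrary choices.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
n     <- 5                         # sample size used in the question
reps  <- 20000                     # number of replications (an assumption)
mu    <- log(50)                   # log-mean, so the population median is 50
sigma <- sqrt(log(1 + 2^2))        # log-sd used in the question

# Geometric mean of each simulated sample
gm <- replicate(reps, exp(mean(log(rlnorm(n, mu, sigma)))))

# Compare the median and the mean of the estimator's sampling distribution
c(median_of_GM = median(gm), mean_of_GM = mean(gm))
```

If the geometric mean is median-unbiased, `median(gm)` should be close to 50, while `mean(gm)` should sit above 50 because the sampling distribution is right-skewed.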

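The corrected geometric mean estimator is a mean-unbiased estimator of the population median:

$\hat\beta_{CG} = \exp\left(\hat\mu - \hat\sigma^2/(2N)\right)$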

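I repeatedly generate samples of size 5 from a lognormal distribution with log-mean $\log(50)$ and log-sd $\sqrt{\log(1+2^2)}$. The number of replications is 10,000. The mean absolute deviation I get is 25.14 for the geometric mean estimator and 22.92 for the corrected geometric mean. Why?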
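One way to sanity-check the simulated figure for the geometric mean is to compare it with a closed form. This is a sketch under the assumption, stated in the question, that the geometric mean is lognormal with log-mean $\mu$ and log-variance $\sigma^2/N$. Writing the estimator as $\beta e^{sZ}$ with $Z \sim N(0,1)$ and $s = \sigma/\sqrt{N}$ gives $E|\hat\beta_{GM} - \beta| = \beta\, e^{s^2/2}\,(2\Phi(s) - 1)$. The variable names below are illustrative and do not come from the original script.

```r
# Closed-form E|GM - beta| when GM ~ LN(log(beta), s^2), with s = sigma / sqrt(N).
# With GM = beta * exp(s * Z) and Z ~ N(0, 1):
#   E|GM - beta| = beta * exp(s^2 / 2) * (2 * pnorm(s) - 1)
beta  <- 50                     # population median
N     <- 5                      # sample size used in the question
sigma <- sqrt(log(1 + 2^2))     # log-sd used in the question
s     <- sigma / sqrt(N)        # log-sd of the geometric mean

mad_gm_exact <- beta * exp(s^2 / 2) * (2 * pnorm(s) - 1)
mad_gm_exact
```

This evaluates to roughly 25.2, which agrees with the simulated 25.14 up to Monte Carlo error, so the simulated value for the geometric mean looks plausible rather than a coding artifact.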

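By the way, the estimated median absolute deviation is 18.18 for the geometric mean and 18.58 for the corrected geometric mean estimator.

The R script I used is here: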

#```{r stackexchange}
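require(plyr)  # loaded by the original script; nothing shown below calls it
#' Calculate the geometric mean to estimate the lognormal median.
#'
#' This function calculates the geometric mean as an estimate of the
#' lognormal median.
#'
#' @param x a numeric vector of positive values.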
GM <- function(x){
    exp(mean(log(x)))
}
#' Calculate the bias-corrected geometric mean to estimate the
#' lognormal median.
#'
#' This function calculates the bias-corrected geometric mean using the
#' sample variance of the logged data, i.e.,
#' $\hat\sigma^2 = 1/(n-1) \sum_i (\log(X_i) - \hat\mu)^2$.
#'
#' @param x a numeric vector of positive values.
BCGM <- function(x){
y <- log(x)
exp(mean(y)-var(y)/(2*length(y)))
}
#' Calculate the bias-corrected geometric mean to estimate the
#' lognormal median.
#'
#' This function calculates the bias-corrected geometric mean using the
#' maximum likelihood variance of the logged data, i.e.,
#' $\hat\sigma^2 = (1/n) \sum_i (\log(X_i) - \hat\mu)^2$.
#'
#' @param x a numeric vector of positive values.
CG <- function(x){
y <- log(x)
exp(mean(y)-var(y)/(2*length(y))*(length(y)-1)/length(y))
}

############################

simln <- function(n,mu,sigma,CI=FALSE)
{
    X <- rlnorm(n,mu,sigma)
    Y <- 1/X
    gm <- GM(X)
    cg <- CG(X)
    ##gmk <- log(2)/GM(log(2)*Y) #the same as GM(X)
    ##cgk <- log(2)/CG(log(2)*Y)
    cgk <- 1/CG(Y)
    sm <- median(X)
    ## calCI() is not defined in this script; it is only needed when CI=TRUE
    if(CI==TRUE) ci <- calCI(X)
    ##bcgm <- BCGM(X)
    ##return(c(gm,cg,bcgm))
    if(CI==FALSE) return(c(GM=gm,CG=cg,CGK=cgk,SM=sm)) else return(c(GM=gm,CG=cg,CGK=cgk,CI=ci[3],SM=sm))
}
cv <-2
mcN <-10000
res <- sapply(1:mcN,function(i){simln(n=5,mu=log(50),sigma=sqrt(log(1+cv^2)), CI=FALSE)})
sumres.mad <- apply(res,1,function(x) mean(abs(x-50)))
sumres.medad <- apply(res,1,function(x) median(abs(x-50)))
sumres.mse <- apply(res,1,function(x) mean((x-50)^2))
#```

#```{r eval=FALSE}
#> sumres.mad
#      GM       CG      CGK       SM 
#25.14202 22.91564 29.65724 31.49275 
#> sumres.mse
#      GM       CG      CGK       SM 
#1368.209 1031.478 2051.540 2407.218 
#```

1
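1) "10,000" is too small for your problem; try "250,000" (or more). 2) If you run a Monte Carlo simulation and get seemingly odd results, try changing the seed with set.seed. 3) Don't always trust Wikipedia: note how the text you quoted (from the "Median" article) differs from this other Wikipedia article. 4) Your R code is a mess; have a look at Google's R Style Guide for some good style guidelines.
2014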
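The first two suggestions in this comment can be sketched as follows. This is an illustrative, vectorized rewrite rather than the asker's script: the seed is arbitrary, the object names are made up for the example, and the two estimators are recomputed inline to match `GM()` and `CG()` from the question.

```r
set.seed(1)                         # arbitrary seed; change it to check stability
n     <- 5                          # sample size used in the question
reps  <- 250000                     # replication count suggested in the comment
mu    <- log(50)
sigma <- sqrt(log(1 + 2^2))

# Each row holds the logs of one lognormal sample of size n
logx <- matrix(rnorm(reps * n, mu, sigma), nrow = reps)
m    <- rowMeans(logx)              # log-mean estimate per sample
v    <- rowSums((logx - m)^2) / n   # variance with divisor n, as in CG()

gm <- exp(m)                        # geometric mean
cg <- exp(m - v / (2 * n))          # corrected geometric mean

# Mean absolute deviation from the true median, 50
mad <- c(GM = mean(abs(gm - 50)), CG = mean(abs(cg - 50)))
mad
```

Rerunning with a different seed shows how much of the gap between the two values is Monte Carlo noise.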

Answers:


4

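Suppose we choose an estimate $\alpha^+$ by the criterion that it minimizes the expected absolute error from the true value $\alpha$, where $f$ is the density of $\alpha$:

$$E = \langle |\alpha^+ - \alpha| \rangle = \int_{-\infty}^{\alpha^+} (\alpha^+ - \alpha)\, f(\alpha)\, d\alpha + \int_{\alpha^+}^{\infty} (\alpha - \alpha^+)\, f(\alpha)\, d\alpha$$

We require

$$\frac{dE}{d\alpha^+} = \int_{-\infty}^{\alpha^+} f(\alpha)\, d\alpha - \int_{\alpha^+}^{\infty} f(\alpha)\, d\alpha = 0,$$

which is equivalent to $P(\alpha > \alpha^+) = 1/2$. So $\alpha^+$ is shown to be the median, following Laplace in 1774.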

If you are having problems with R, please ask in a separate question on Stack Overflow.
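The result of this derivation can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the answer: it takes a lognormal density for $f$ (the log-sd of 0.5 and the seed are arbitrary), approximates the expected absolute error by a sample average, and minimizes it over the candidate value.

```r
set.seed(42)                                        # arbitrary seed
x <- rlnorm(1e5, meanlog = log(50), sdlog = 0.5)    # draws standing in for f

# Sample version of E|a - alpha| as a function of the candidate value a
expected_abs_error <- function(a) mean(abs(x - a))

# One-dimensional minimization over the observed range of the draws
opt <- optimize(expected_abs_error, interval = range(x))

c(minimizer = opt$minimum, sample_median = median(x))
```

The two printed values should nearly coincide. Note that this concerns the single number that minimizes the expected absolute error for a given distribution, which is the statement the derivation proves.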


Theoretically, I think it is correct. However, I am confused by the R simulation results, which do not back up this statement as expected.
Zhenglei

2
I am a Data Scientist/Physicist so have never seen a line of R. As I suggested in the question, if it is a code issue you should ask it in Stack Overflow and you will get much more attention. However, the above answer is correct unless you would like to elaborate on how it generalizes to a median-unbiased estimator. For more details see page 172 of E.T. Jaynes book Probability theory ISBN 978-0-521-59271-0.
Keith

Thank you a lot for your answer. It is not a coding issue. I just want to do simulations to show that a median-unbiased estimator will minimize the expected absolute deviation. I haven't accepted the answer because I am mainly confused about the simulation step. I implemented it in R but simulations could be done in Matlab or Python or any other languages.
Zhenglei

2
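I suspect the issue is that you are dealing with an approximation which works as N -> ∞, but you have 10,000 and 5, which are both small numbers. Perhaps you would be better off asking three questions: why it is theoretically correct, when N is practically big enough, and whether there is a problem with your R code. I answered the first; the second is largely computational, though there may be a good rule of thumb for this specific case; and the third belongs on Stack Overflow.
Keith 2014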

@Keith Sorry about my poor math, but could you show more detail on how the expectation is derived?
AdamO '17
Licensed under cc by-sa 3.0 with attribution required.