Estimating parameters of a uniform distribution: improper prior?


10

We have $N$ samples, $X_i$, drawn from the uniform distribution on $[0,\theta]$, where $\theta$ is unknown. Estimate $\theta$ from the data.

So, by Bayes' rule...

$$f(\theta|X_i) = \frac{f(X_i|\theta)\,f(\theta)}{f(X_i)}$$

The likelihood is:

$$f(X_i|\theta) = \prod_{i=1}^N \frac{1}{\theta} = \theta^{-N}$$ (edit: when $0 \leq X_i \leq \theta$ for all $i$, and $0$ otherwise; thanks, whuber)

But with no other information about $\theta$, it seems like the prior should be proportional to $1$ (i.e. uniform) or to $1/L$ (Jeffreys prior?) on $[0,\infty)$, but then my integrals don't converge and I'm not sure how to proceed. Any ideas?
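For concreteness, a minimal numerical sketch of this setup (the data, true $\theta$, and evaluation grid are illustrative assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, N = 5.0, 10                 # illustrative values
x = rng.uniform(0.0, theta_true, size=N)

def likelihood(theta, x):
    """Uniform[0, theta] likelihood: theta^(-N) for theta >= max(x), else 0."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta >= x.max(), theta ** (-len(x)), 0.0)

grid = np.linspace(0.5, 10.0, 5)
print(x.max())                          # the likelihood is zero below this point
print(likelihood(grid, x))
```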


2
Your likelihood is incorrect: it is zero whenever $\theta$ is less than the maximum $X_i$.
whuber

Can you show what integrals you are taking?

Yea, so, I guess I just don't know how to deal with the improper prior. E.g., I want to write $f[X_i]=\int_\Theta f(X_i|\theta)\,f(\theta)\,d\theta$
Will

1
For the improper prior, $f[X_i]=\int_\Theta f(X_i|\theta)\,f(\theta)\,d\theta = \int_{\max(X_i)}^{\infty} \theta^{-N}\,d\theta = \max(X_i)^{1-N}/(N-1)$, and for the prior $f(\theta)\propto 1/\theta$ you similarly obtain $\max(X_i)^{-N}/N$. Because $\max X_i > 0$ almost surely, it is certain the integrals will converge.
whuber
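A quick symbolic check of these two integrals (a sketch assuming SymPy is available; $m$ stands for $\max(X_i)$):

```python
import sympy as sp

theta, m, N = sp.symbols('theta m N', positive=True)

# Flat prior: integrate theta^(-N) from max(X_i) to infinity (converges for N > 1)
flat = sp.integrate(theta ** (-N), (theta, m, sp.oo), conds='none')
print(sp.simplify(flat))    # m**(1 - N)/(N - 1)

# Prior f(theta) proportional to 1/theta: integrand theta^(-(N+1)), converges for N >= 1
scale = sp.integrate(theta ** (-(N + 1)), (theta, m, sp.oo), conds='none')
print(sp.simplify(scale))   # m**(-N)/N
```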

1
The Bernardo reference posterior is Pareto - see the catalog of noninformative priors.
Stéphane Laurent

Answers:


4

This has generated some interesting debate, but note that it really doesn't make much difference to the question of interest. Personally I think that because $\theta$ is a scale parameter, the transformation group argument is appropriate, leading to a prior of

$$p(\theta|I)=\frac{\theta^{-1}}{\log\left(\frac{U}{L}\right)}\propto\theta^{-1}\qquad L<\theta<U$$

This distribution has the same form under rescaling of the problem (the likelihood also remains "invariant" under rescaling). The kernel of this prior, $f(y)=y^{-1}$, can be derived by solving the functional equation $af(ay)=f(y)$. The values $L,U$ depend on the problem, and really only matter if the sample size is very small (like 1 or 2). The posterior is a truncated Pareto, given by:

$$p(\theta|DI)=\frac{N\theta^{-N-1}}{(L^{*})^{-N}-U^{-N}}\qquad L^{*}<\theta<U\quad\text{where}\quad L^{*}=\max(L,X_{(N)})$$
Where $X_{(N)}$ is the $N$th order statistic, or the maximum value of the sample. We get the posterior mean of
$$E(\theta|DI)=\frac{N\left((L^{*})^{1-N}-U^{1-N}\right)}{(N-1)\left((L^{*})^{-N}-U^{-N}\right)}=\frac{N}{N-1}L^{*}\left(\frac{1-\left[\frac{L^{*}}{U}\right]^{N-1}}{1-\left[\frac{L^{*}}{U}\right]^{N}}\right)$$
If we set $U\rightarrow\infty$ and $L\rightarrow 0$ then we get the simpler expression $E(\theta|DI)=\frac{N}{N-1}X_{(N)}$.
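A quick numerical check of this truncated-Pareto posterior mean; the values of $N$, $L$, $U$, and the sample maximum below are illustrative assumptions:

```python
from scipy.integrate import quad

N, L, U, x_max = 5, 0.01, 100.0, 3.2     # illustrative values
L_star = max(L, x_max)

# Truncated Pareto posterior density on (L*, U)
Z = L_star ** (-N) - U ** (-N)
post = lambda t: N * t ** (-N - 1) / Z

norm, _ = quad(post, L_star, U)                   # ~1.0
mean, _ = quad(lambda t: t * post(t), L_star, U)

closed = N / (N - 1) * L_star * (1 - (L_star / U) ** (N - 1)) / (1 - (L_star / U) ** N)
print(norm, mean, closed)   # mean matches the closed form, both near N/(N-1)*x_max = 4.0
```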

But now suppose we use a more general prior, given by $p(\theta|cI)\propto\theta^{-c-1}$ (note that we keep the limits $L,U$ to ensure everything is proper - no singular maths then). The posterior is then the same as above, but with $N$ replaced by $c+N$ - provided that $c+N\geq 0$. Repeating the above calculations, we get the simplified posterior mean of

$$E(\theta|DI)=\frac{N+c}{N+c-1}X_{(N)}$$

So the uniform prior ($c=-1$) will give an estimate of $\frac{N-1}{N-2}X_{(N)}$ provided that $N\geq 2$ (the mean is infinite for $N=2$). This shows that the debate here is a bit like whether or not to use $N$ or $N-1$ as the divisor in the variance estimate.
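A small simulation comparing these estimators in the $U\rightarrow\infty$, $L\rightarrow 0$ limit; the true $\theta$ and sample size below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, N, reps = 5.0, 10, 100_000

# Sampling distribution of X_(N), the sample maximum
x_max = rng.uniform(0.0, theta_true, size=(reps, N)).max(axis=1)

for c in (-1, 0):   # c = -1: uniform prior; c = 0: the 1/theta scale prior
    est = (N + c) / (N + c - 1) * x_max
    print(f"c = {c}: average estimate {est.mean():.3f} (true theta = {theta_true})")
```

Both estimators land close to the true $\theta$ on average; the choice of $c$ only matters much for small $N$, as the answer notes.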

One argument against the use of the improper uniform prior in this case is that the posterior is improper when $N=1$, as it is proportional to $\theta^{-1}$. But this only matters if $N=1$ or $N$ is very small.


1

Since the purpose here is presumably to obtain some valid and useful estimate of $\theta$, the prior distribution should be consistent with the specification of the distribution of the population from which the sample comes. This does NOT in any way mean that we "calculate" the prior using the sample itself - this would nullify the validity of the whole procedure. We do know that the population from which the sample comes is a population of i.i.d. uniform random variables, each ranging in $[0,\theta]$. This is a maintained assumption and is part of the prior information that we possess (and it has nothing to do with the sample, i.e. with a specific realization of a subset of these random variables).

Now assume that this population consists of $m$ random variables (while our sample consists of $n<m$ realizations of $n$ random variables). The maintained assumption tells us that
$$\max_{i=1,...,n}\{X_i\}\leq \max_{j=1,...,m}\{X_j\}\leq\theta$$

Denote for compactness $\max_{i=1,...,n}\{X_i\}\equiv X^{*}$. Then we have $\theta\geq X^{*}$, which can also be written
$$\theta=cX^{*}\qquad c\geq 1$$

The density function of the max of $N$ i.i.d. Uniform r.v.'s ranging in $[0,\theta]$ is
$$f_{X^{*}}(x^{*})=\frac{N(x^{*})^{N-1}}{\theta^{N}}$$

for the support $[0,\theta]$, and zero elsewhere. Then by using $\theta=cX^{*}$ and applying the change-of-variable formula, we obtain a prior distribution for $\theta$ that is consistent with the maintained assumption:
$$f_{p}(\theta)=\frac{N\left(\frac{\theta}{c}\right)^{N-1}}{\theta^{N}}\cdot\frac{1}{c}=Nc^{-N}\theta^{-1}\qquad \theta\in[x^{*},\infty)$$

which may be improper if we don't specify the constant $c$ suitably. But our interest lies in having a proper posterior for $\theta$, and also, we do not want to restrict the possible values of $\theta$ (beyond the restriction implied by the maintained assumption). So we leave $c$ undetermined.
Then, writing $\mathbf{X}=\{x_1,..,x_n\}$, the posterior is

$$f(\theta\mid\mathbf{X})\propto\theta^{-N}\cdot Nc^{-N}\theta^{-1}\implies f(\theta\mid\mathbf{X})=A\,Nc^{-N}\theta^{-(N+1)}$$

for some normalizing constant $A$. We want
$$\int_{S_{\theta}}f(\theta\mid\mathbf{X})\,d\theta=1\implies\int_{x^{*}}^{\infty}A\,Nc^{-N}\theta^{-(N+1)}\,d\theta=1$$

$$\implies A\,Nc^{-N}\left[-\frac{1}{N}\theta^{-N}\right]_{x^{*}}^{\infty}=1\implies A=(cx^{*})^{N}$$

Inserting into the posterior,
$$f(\theta\mid\mathbf{X})=(cx^{*})^{N}\,Nc^{-N}\theta^{-(N+1)}=N(x^{*})^{N}\theta^{-(N+1)}$$

Note that the undetermined constant $c$ of the prior distribution has conveniently cancelled out.

The posterior summarizes all the information that the specific sample can give us regarding the value of $\theta$. If we want to obtain a specific value for $\theta$, we can easily calculate the expected value of the posterior,
$$E(\theta\mid\mathbf{X})=\int_{x^{*}}^{\infty}\theta\,N(x^{*})^{N}\theta^{-(N+1)}\,d\theta=\left[-\frac{N}{N-1}(x^{*})^{N}\theta^{-N+1}\right]_{x^{*}}^{\infty}=\frac{N}{N-1}x^{*}$$

Is there any intuition in this result? Well, as the number of $X$'s increases, the more likely it is that the maximum realization among them will be closer and closer to their upper bound, $\theta$ - which is exactly what the posterior mean value of $\theta$ reflects: if, say, $N=2$ then $E(\theta\mid\mathbf{X})=2x^{*}$, but if $N=10$ then $E(\theta\mid\mathbf{X})=\frac{10}{9}x^{*}$. This shows that our tactic regarding the selection of the prior was reasonable and consistent with the problem at hand, but not necessarily "optimal" in some sense.
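A quick numerical check that this posterior integrates to one and that its mean matches $\frac{N}{N-1}x^{*}$ for the two values of $N$ mentioned above; the realized maximum $x^{*}$ is an arbitrary illustrative value:

```python
from scipy.integrate import quad

x_star = 3.0                        # illustrative realized maximum

for N in (2, 10):
    post = lambda t: N * x_star ** N * t ** (-(N + 1))   # Pareto posterior on [x*, inf)
    norm, _ = quad(post, x_star, float('inf'))
    mean, _ = quad(lambda t: t * post(t), x_star, float('inf'))
    print(N, round(norm, 6), round(mean, 4), N / (N - 1) * x_star)
```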


1
Basing the prior on the data sounds fishy to me. How do you justify this approach?
whuber

2
I have nothing against the fact that your prior is not "the best". Where did I say something like that? I'm just trying to understand your approach. I don't understand this equality yet. If $c$ is constant in the equality $\theta=cX^{*}$, does that mean that both $X^{*}$ and $\theta$ are nonrandom? By the way, you don't use the fact that $c\geq 1$ in the derivation of the prior, do you? (cc @whuber)
Stéphane Laurent

1
And the support of your prior depends on the data? ($\theta\in[x^{*},\infty)$)
Stéphane Laurent

3
A prior depending (even if this is only through the support) on the data sounds wrong: you cannot know the maximum of the sample before the sample has been generated. Moreover, you claim that $\theta=cX^{*}$ is an almost sure equality, with both $\theta$ and $X^{*}$ random (thus there is correlation $1$). But this implies that the posterior distribution of $\theta$ (which is the conditional distribution of $\theta$ given the sample) is the Dirac mass at $cx^{*}$. And this contradicts your derivation of the posterior distribution. ... (no characters left...)
Stéphane Laurent

1
The posterior distribution of $\theta$ is Dirac at $cx^{*}$ means that $\theta$ is $cx^{*}$. Bayes' theorem is not the cause. You destroy everything by assuming $\theta=cX^{*}$. This implies $X^{*}=\theta/c$, thus the conditional distribution of $X^{*}$ given $\theta$ is the Dirac mass at $\theta/c$, whereas the original assumption is that this distribution is the uniform distribution on $(0,\theta)$.
Stéphane Laurent

0

Uniform Prior Distribution Theorem (interval case):

"If the totality of Your information about θ external to the data D is captured by the single proposition B={{Possible values for θ}={the interval (a,b)},a<b}

then Your only possible logically-internally-consistent prior specification is $f(\theta)=\text{Uniform}(a,b)$

Thus, your prior specification should correspond to the Jeffreys prior if you truly believe in the above theorem."

Not part of the Uniform Prior Distribution Theorem:

Alternatively, you could specify your prior distribution $f(\theta)$ as a Pareto distribution, which is the conjugate prior for the uniform, knowing that your posterior distribution will then be another Pareto distribution by conjugacy. However, if you use the Pareto distribution, then you will need to specify its parameters in some way.
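A minimal sketch of that conjugate update, assuming the usual Pareto parameterization with shape $k$ and scale $m$; the prior hyperparameters and data below are illustrative:

```python
import numpy as np

def pareto_update(k, m, x):
    """Pareto(k, m) prior on theta + Uniform[0, theta] data of size n
    gives a Pareto(k + n, max(m, max(x))) posterior."""
    return k + len(x), max(m, float(np.max(x)))

k0, m0 = 2.0, 1.0                       # illustrative prior hyperparameters
x = np.array([2.1, 0.7, 3.4, 1.9])      # illustrative data

k1, m1 = pareto_update(k0, m0, x)
post_mean = k1 * m1 / (k1 - 1)          # Pareto mean, valid for k1 > 1
print(k1, m1, post_mean)                # 6.0, 3.4, 4.08
```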


4
First you say the "only possible logically internally consistent" answer is a uniform distribution and then you proceed to propose an alternative. That sounds illogical and inconsistent to me :-).
whuber

2
I can't agree. For instance, $B$ is also the set $\{\theta\mid\theta^3\in(a^3,b^3)\}$. When $\Theta\sim\text{Uniform}(a,b)$, the PDF of $\Psi=\Theta^3$ is $1/(3\psi^{2/3}(b-a))$ for $a^3<\psi<b^3$. But according to the "theorem," $\Psi\sim\text{Uniform}(a^3,b^3)$, whose pdf is $1/(b^3-a^3)$ in that interval. In short, although the proposition does not depend on how the problem is parameterized, the "theorem"'s conclusion does depend on the parameterization, whence it is ambiguous.
whuber
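A quick Monte Carlo check of this change of variables, with an arbitrary interval $(a,b)$ and test point (both assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 1.0, 2.0                          # arbitrary interval
psi = rng.uniform(a, b, size=1_000_000) ** 3

# Empirical density of Psi near a test point p in (a^3, b^3) = (1, 8)
p, eps = 1.5, 0.01
emp = np.mean(np.abs(psi - p) < eps) / (2 * eps)
print(emp)                               # ~0.254
print(1 / (3 * p ** (2 / 3) * (b - a)))  # 0.254, the change-of-variables pdf
print(1 / (b ** 3 - a ** 3))             # 0.143, what the "theorem" would imply
```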

2
@BabakP: How could one say this is a theorem? A theorem is a mathematical claim with a mathematical proof. This "theorem" would be more appropriately termed a "principle", but it is not sensible because it is contradictory, as shown by @whuber.
Stéphane Laurent

2
Thanks for the reference, BabakP. I would like to point out that the "proof sketch" is bogus. Draper divides the interval into a finite number of equally spaced values and "passes to the limit." Anybody can divide the interval into values spaced to approximate any density they like and similarly pass to the limit, producing perfectly arbitrary "only possible logically-internally-consistent prior specifications." This kind of stuff--namely, using bad mathematics in an effort to show that non-Bayesians are illogical--gives Bayesian analysis an (undeservedly) bad name. (cc @Stéphane.)
whuber

1
@Stéphane Please forgive my insensitivity (insensibilité)--I admire your skill at interacting here in a second language and do not knowingly use obscure terms! Bogus is an adjective that comes from a 200-year old US slang term referring to a machine for counterfeiting money. In this case it's a mathematical machine for counterfeiting theorems :-).
whuber
Licensed under cc by-sa 3.0 with attribution required.