Answers:
Many frequentist confidence intervals (CIs) are based on the likelihood function. If the prior distribution is truly noninformative, the Bayesian posterior carries essentially the same information as the likelihood function. Consequently, in practice, a Bayesian probability interval (or credible interval) can be numerically very similar to a frequentist confidence interval. [Of course, even when the two are numerically similar, there remain philosophical differences in interpretation between frequentist and Bayesian interval estimates.]
Here is a simple example, estimating a binomial success probability $\theta$. Suppose we have $n = 100$ observations (trials) with $x = 73$ successes.
Frequentist: The traditional Wald interval uses the point estimate $\hat\theta = x/n = 73/100 = 0.73$, and a 95% CI is of the form $\hat\theta \pm 1.96\sqrt{\hat\theta(1-\hat\theta)/n}$.
n = 100; x = 73; th.w = x/n; pm = c(-1,1)
ci.w = th.w + pm*1.96*sqrt(th.w*(1-th.w)/n); ci.w
[1] 0.6429839 0.8170161
This form of CI assumes that the relevant binomial distributions can be approximated by normal ones and that the margin of error is well approximated by $1.96\sqrt{\hat\theta(1-\hat\theta)/n}$. Particularly for small $n$, these assumptions need not be true. [The cases where $x = 0$ or $x = n$ are especially problematic.]
Agresti-Coull: The Agresti-Coull interval uses the point estimate $\tilde\theta = (x+2)/\tilde n$, where $\tilde n = n + 4$. Then a 95% CI is of the form $\tilde\theta \pm 1.96\sqrt{\tilde\theta(1-\tilde\theta)/\tilde n}$.
n.a = n + 4; th.a = (x + 2)/n.a
ci.a = th.a + pm*1.96*sqrt(th.a*(1-th.a)/n.a); ci.a
[1] 0.6349681 0.8073396
Bayesian: With a flat $\mathsf{Unif}(0,1) = \mathsf{Beta}(1,1)$ prior, the likelihood function is proportional to $\theta^x(1-\theta)^{n-x} = \theta^{73}(1-\theta)^{27}$. Multiplying the kernels of the prior and the likelihood, we have the kernel of the posterior distribution, which is $\mathsf{Beta}(74, 28)$.
Then a 95% Bayesian interval estimate uses quantiles 0.025 and 0.975 of the posterior distribution to get $(0.6354, 0.8072)$. When the prior distribution is 'flat' or 'noninformative', the numerical difference between the Bayesian probability interval and the Agresti-Coull confidence interval is slight.
qbeta(c(.025, .975), 74, 28)
[1] 0.6353758 0.8072313
Notes: (a) In this situation, some Bayesians prefer the noninformative Jeffreys prior $\mathsf{Beta}(\tfrac12, \tfrac12)$. (b) For confidence levels other than 95%, the Agresti-Coull CI uses a slightly different point estimate. (c) For data other than binomial, there may be no available 'flat' prior, but one can choose a prior with a huge variance (small precision) that carries very little information. (d) For more discussion of Agresti-Coull CIs, graphs of coverage probabilities, and some references, perhaps also see this Q & A.
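For note (a), a quick sketch (mine, not from the answer above) shows that the Jeffreys-prior interval is also numerically close in this example; the posterior under $\mathsf{Beta}(\tfrac12, \tfrac12)$ is $\mathsf{Beta}(x + \tfrac12,\ n - x + \tfrac12)$:
# posterior under the Jeffreys prior is Beta(73.5, 27.5) for x = 73, n = 100
qbeta(c(.025, .975), 73.5, 27.5)   # close to both intervals above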
BruceET's answer is excellent but pretty long, so here's a quick practical summary:
While one can solve for a prior that yields a credible interval that equals the frequentist confidence interval, it is important to realize how narrow the scope of application is. The entire discussion is assuming that the sample size was fixed and is not a random variable. It assumes that there was only one look at the data, and that sequential inference was not done. It assumes there was only one dependent variable and no other parameters were of interest. Where there are multiplicities, the Bayesian and frequentist intervals diverge (Bayesian posterior probabilities are in forward-time predictive mode and don't need to consider "how we got here", thus have no way or need to adjust for multiple looks). In addition, in the frequentist world the interpretation of confidence intervals is extremely strange and has confused many a student and caused some frequentist statisticians to become Bayesian.
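As a quick illustration of the multiple-looks point (a simulation sketch of my own, not part of the summary above): under a frequentist analysis, testing repeatedly as data accumulate inflates the type I error far beyond the nominal 5%, which is why frequentist intervals must be adjusted for sequential inference while posterior probabilities need no such adjustment.
# simulate 10000 null experiments, each tested after every observation
# from n = 20 up to n = 100; reject if any interim z-statistic exceeds 1.96
set.seed(123)
inflated = replicate(10000, {
  z = cumsum(rnorm(100)) / sqrt(1:100)   # running z-statistics under H0
  any(abs(z[20:100]) > 1.96)             # did any look cross the 5% boundary?
})
mean(inflated)   # far above 0.05, even though each single look is a valid 5% test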
The likelihood function, and the associated confidence interval, are not the same (concept) as a Bayesian posterior probability constructed with a prior that specifies a uniform distribution.
In parts 1 and 2 of this answer it is argued why likelihood should not be viewed as a Bayesian posterior probability based on a flat prior.
In part 3 an example is given where the confidence interval and the credible interval differ widely. It is also pointed out how this discrepancy arises.
Probabilities transform in a particular way. If we know the probability distribution $f_X(x)$, then we also know the distribution $f_Y(y)$ of the variable $Y$ defined by $y = g(x)$ for any monotonic function $g$, according to the transformation rule: $$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\left|\frac{d\,g^{-1}(y)}{dy}\right|$$
If you transform a variable, then the mean and the mode may shift due to this change of the distribution function. That means, in general, $E[Y] \neq g(E[X])$ and $\operatorname{mode}(Y) \neq g(\operatorname{mode}(X))$.
The likelihood function does not transform in this way. This is the contrast between the likelihood function and the posterior probability: the (maximum of the) likelihood function remains at the same point when you transform the variable.
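A small R sketch (my own, reusing the binomial example from the first answer: $x = 73$ successes in $n = 100$ trials) makes the contrast concrete: transforming the parameter moves the posterior mode, but not the maximum of the likelihood.
# posterior of theta under a flat prior is Beta(74, 28), with mode 73/100 = 0.73
# transform phi = theta^2; the density of phi picks up a Jacobian 1/(2*sqrt(phi))
post.phi = function(phi) dbeta(sqrt(phi), 74, 28) / (2*sqrt(phi))
optimize(post.phi, c(0.01, 0.99), maximum = TRUE)$maximum  # not equal to 0.73^2
# the likelihood of phi needs no Jacobian; its maximum simply transforms along:
lik.phi = function(phi) dbinom(73, 100, sqrt(phi))
optimize(lik.phi, c(0.01, 0.99), maximum = TRUE)$maximum   # = 0.73^2 = 0.5329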
Related:
The flat prior is ambiguous. It depends on the form of the particular statistic.
For instance, if $\theta$ is uniformly distributed (e.g. $\theta \sim \mathsf{U}(0,1)$), then $\theta^2$ is not a uniformly distributed variable.
There is no single flat prior that you can relate the likelihood function to. It is different when you define the flat prior for $\theta$ or for some transformed variable like $\theta^2$. For the likelihood this dependency does not exist (a short sketch after the next point illustrates the ambiguity).
The boundaries of probability intervals (credible intervals) will be different when you transform the variable (for likelihood functions this is not the case). E.g. for some parameter $a$ and a monotonic transformation $f(a)$ (e.g. the logarithm) you get equivalent likelihood intervals: $$a_{\min} < a < a_{\max} \quad\Longleftrightarrow\quad f(a_{\min}) < f(a) < f(a_{\max})$$
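A one-line sketch (mine) of the flat-prior ambiguity: draws that are 'flat' on one scale are not flat on a transformed scale.
theta = runif(1e5)    # a 'flat' prior on theta
hist(theta^2)         # the implied prior on theta^2 is far from flat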
Suppose you sample a variable $x$ from a population with (unknown) parameter $\theta$, which itself (the population with parameter $\theta$) is sampled from a super-population (with possibly varying values for $\theta$).
One can make an inverse statement, trying to infer what the original $\theta$ may have been, based on observing some values $x$ for the variable.
The confidence interval does not use information from a prior like the credible interval does (confidence is not a probability).
Regardless of the prior distribution (uniform or not), the x%-confidence interval will contain the true parameter in $x\%$ of the cases (confidence intervals refer to the success rate, the type I error, of the method, not of a particular case).
In the case of the credible interval this concept ($x\%$ of the time that the interval contains the true parameter) is not even applicable, but we may interpret it in a frequentist sense, and then we observe that the credible interval will contain the true parameter $x\%$ of the time only when the (uniform) prior correctly describes the super-population of parameters that we may encounter. The interval may effectively be performing higher or lower than the x% (not that this matters, since the Bayesian approach answers different questions, but it is just to note the difference).
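The contrast can be checked by simulation. Below is a small sketch (my own construction, using exponential data so that both intervals have closed forms; the Gamma(2, 2) super-population is an arbitrary choice that deliberately differs from the flat prior):
set.seed(1)
n = 10
covers = replicate(10000, {
  lambda = rgamma(1, shape = 2, rate = 2)          # draw the true parameter
  xbar = mean(rexp(n, rate = lambda))              # observe a sample mean
  ci = qchisq(c(.025, .975), 2*n) / (2*n*xbar)     # exact 95% confidence interval
  cred = qgamma(c(.025, .975), shape = n+1, rate = n*xbar)  # flat-prior credible interval
  c(ci[1] < lambda & lambda < ci[2],
    cred[1] < lambda & lambda < cred[2])
})
rowMeans(covers)  # the CI covers ~95% regardless of the super-population;
                  # the flat-prior credible interval need not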
In the example below we examine the likelihood function for the exponential distribution as a function of the rate parameter $\lambda$, the sample mean $\bar{x}$, and sample size $n$: $$\mathcal{L}(\lambda, \bar{x}, n) = \frac{(n\lambda)^n}{(n-1)!}\,\bar{x}^{\,n-1}\, e^{-\lambda n \bar{x}}$$
this function expresses the probability density to observe (for a given $\lambda$ and $n$) a sample mean between $\bar{x}$ and $\bar{x}+d\bar{x}$.
note: the rate parameter $\lambda$ goes from $0$ to $\infty$ (unlike the OP's 'request' for a parameter that goes from $0$ to $1$). The prior in this case will be an improper prior. The principles, however, do not change. I am using this perspective for easier illustration: distributions with parameters between $0$ and $1$ are often discrete distributions (difficult to draw as continuous lines) or a beta distribution (difficult to calculate).
The image below illustrates this likelihood function (the blue colored map) for a given sample size, and also draws the boundaries for the 95% intervals (both confidence and credible).
The boundaries are created by obtaining the (one-dimensional) cumulative distribution function. But this integration/cumulation can be done in two directions.
The difference between the intervals occurs because the 5% areas are made in different ways.
The 95% confidence interval contains values of $\lambda$ for which the observed $\bar{x}$ would occur in at least 95% of the cases. In this way, whatever the true value of $\lambda$, we would make a wrong judgement in at most 5% of the cases.
For any $\lambda$, 2.5% of the weight of the likelihood function lies north and 2.5% south of the boundaries (moving in the $\bar{x}$ direction).
The 95% credible interval contains values of $\lambda$ which are most likely to have caused the observed $\bar{x}$ (given a flat prior).
Even when the observed result $\bar{x}$ is less than 5% likely for a given $\lambda$, that particular $\lambda$ may be inside the credible interval. In this particular example, higher values of $\lambda$ are 'preferred' by the credible interval.
For any $\bar{x}$, 2.5% of the weight of the likelihood function lies west and 2.5% east of the boundaries (moving in the $\lambda$ direction).
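To make the two directions concrete, here is a small numeric sketch (my own; the values n = 5 and observed mean xbar = 1 are arbitrary). The confidence bounds come from integrating in the $\bar{x}$-direction via the exact pivot $2n\lambda\bar{x} \sim \chi^2_{2n}$; the credible bounds come from integrating the flat-prior posterior, which is $\mathsf{Gamma}(n+1,\ n\bar{x})$, in the $\lambda$-direction:
n = 5; xbar = 1
qchisq(c(.025, .975), df = 2*n) / (2*n*xbar)       # 95% confidence interval for lambda
qgamma(c(.025, .975), shape = n+1, rate = n*xbar)  # 95% credible interval (flat prior)
# the credible interval sits at higher lambda values, as described above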
A case where the confidence interval and the credible interval (based on an improper prior) coincide is the estimation of the mean of a Gaussian distributed variable (the distribution is illustrated here: https://stats.stackexchange.com/a/351333/164061 ).
An obvious case where the confidence interval and the credible interval do not coincide is illustrated here (https://stats.stackexchange.com/a/369909/164061). The confidence interval for that case may have one or even both of its (upper/lower) bounds at infinity.
This is not generally true, but it may seem so because of the most frequently considered special cases.
Consider $X_1, X_2 \overset{iid}{\sim} \mathsf{U}(\theta - \tfrac12,\ \theta + \tfrac12)$. The interval $\big(\min(X_1, X_2),\ \max(X_1, X_2)\big)$ is a 50% confidence interval for $\theta$, albeit not one that anyone with any common sense would use. It does not coincide with the 50% credible interval from the posterior from a flat prior.
Fisher's technique of conditioning on an ancillary statistic does in this case yield a confidence interval that coincides with that credible interval.
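A quick simulation (a sketch of mine, under the uniform example as written above) confirms the 50% coverage of that interval:
set.seed(1)
theta = 0   # arbitrary true value
hits = replicate(100000, {
  x = runif(2, theta - 0.5, theta + 0.5)
  min(x) < theta & theta < max(x)
})
mean(hits)   # approximately 0.5: a valid, if useless, 50% confidence interval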
From my reading, I thought this statement is true asymptotically, i.e. for large sample size, and if one uses an uninformative prior.
A simple numerical example would seem to confirm this: the 90% profile maximum likelihood intervals and 90% credible intervals of an ML binomial GLM and a Bayesian binomial GLM are indeed virtually identical for n=1000, though the discrepancy would become larger for small n:
# simulate some data
set.seed(123)
n = 1000 # sample size
x1 = rnorm(n) # two continuous covariates
x2 = rnorm(n)
z = 0.1 + 2*x1 + 3*x2 # predicted values on logit scale
y = rbinom(n,1,plogis(z)) # bernoulli response variable
d = data.frame(y=y, x1=x1, x2=x2)
# fit a regular GLM and calculate 90% confidence intervals
glmfit = glm(y ~ x1 + x2, family = "binomial", data = d)
library(MASS)
# coefficients and 90% profile confidence intervals :
round(cbind(coef(glmfit), confint(glmfit, level=0.9)), 2)
# 5 % 95 %
# (Intercept) 0.00 -0.18 0.17
# x1 2.04 1.77 2.34
# x2 3.42 3.05 3.81
# fit a Bayesian GLM using rstanarm
library(rstanarm)
t_prior = student_t(df = 3, location = 0, scale = 100) # we set scale to large value to specify an uninformative prior
bfit1 = stan_glm(y ~ x1 + x2, data = d,
family = binomial(link = "logit"),
prior = t_prior, prior_intercept = t_prior,
chains = 1, cores = 4, seed = 123, iter = 10000)
# coefficients and 90% credible intervals :
round(cbind(coef(bfit1), posterior_interval(bfit1, prob = 0.9)), 2)
# 5% 95%
# (Intercept) -0.01 -0.18 0.17
# x1 2.06 1.79 2.37
# x2 3.45 3.07 3.85
# fit a Bayesian GLM using brms
library(brms)
priors = c(
prior(student_t(3, 0, 100), class = "Intercept"),
prior(student_t(3, 0, 100), class = "b")
)
bfit2 = brm(
y ~ x1 + x2,
data = d,
prior = priors,
family = "bernoulli",
seed = 123
)
# coefficients and 90% credible intervals :
summary(bfit2, prob=0.9)
# Population-Level Effects:
# Estimate Est.Error l-90% CI u-90% CI Eff.Sample Rhat
# Intercept -0.01 0.11 -0.18 0.18 2595 1.00
# x1 2.06 0.17 1.79 2.35 2492 1.00
# x2 3.45 0.23 3.07 3.83 2594 1.00
# fit a Bayesian GLM using arm
library(arm)
# we set prior.scale to Inf to specify an uninformative prior
bfit3 = bayesglm(y ~ x1 + x2, family = "binomial", data = d, prior.scale = Inf)
sims = coef(sim(bfit3, n.sims=1000000))
# coefficients and 90% credible intervals :
round(cbind(coef(bfit3), t(apply(sims, 2, function (col) quantile(col,c(.05, .95))))),2)
# 5% 95%
# (Intercept) 0.00 -0.18 0.17
# x1 2.04 1.76 2.33
# x2 3.42 3.03 3.80
As you can see in the example above, for n=1000 the 90% profile confidence intervals of a binomial GLM are virtually identical to the 90% credible intervals of a Bayesian binomial GLM (the differences are also within the bounds of using different seeds and different numbers of iterations in the Bayesian fits; an exact equivalence can also not be obtained, since specifying a 100% uninformative prior is not possible with rstanarm or brms).