



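This is obviously right, but I would like to know how to show it mathematically... I think a linear mixed model could be used. However, I don't know much about the math used to estimate them (I just run lme4 for LMMs and brms for GLMMs :) Could you show me a worked example? I would prefer an answer with some formulas rather than just some code in R. Feel free to assume a simple setting, such as a linear mixed model with normally distributed random intercepts and slopes.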



"Optimal" in the sense of minimizing the variance of the sample mean of the $N$ data points acquired.



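Maybe something like this: the variance of the per-subject sample means should be $\sigma_a^2 + \sigma^2/n$, where the first term is the subject variance and the second is the variance of the estimate of each subject's mean (with $n$ measurements per subject). Then the variance of the over-subject mean (i.e. the grand mean) over $m$ subjects will be $(\sigma_a^2 + \sigma^2/n)/m = \sigma_a^2/m + \sigma^2/(nm) = \sigma_a^2/m + \sigma^2/N$, which is minimized when $m = N$.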






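Suppose we have $n$ subjects, from each of whom we take $m$ measurements. Then a simple random-effects model of the $j$th measurement from the $i$th subject might be

$$y_{ij} = \beta + u_i + e_{ij},$$

where $\beta$ is the fixed intercept, $u_i$ is the random subject effect (with variance $\sigma_u^2$), and $e_{ij}$ is the observation-level error term (with variance $\sigma_e^2$), and the two random terms are independent.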


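In this model, $\beta$ represents the population mean, and with a balanced dataset (i.e., an equal number of measurements from each subject), our best estimate of it is simply the sample mean. So if we take "more information" to mean a smaller variance for this estimate, then basically we want to know how the variance of the sample mean depends on $n$ and $m$. With a bit of algebra we can work out that

$$
\begin{aligned}
\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big)
&= \operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j (\beta + u_i + e_{ij})\Big) \\
&= \frac{1}{n^2m^2}\operatorname{var}\Big(\sum_i\sum_j u_i + \sum_i\sum_j e_{ij}\Big) \\
&= \frac{1}{n^2m^2}\Big(m^2\sum_i \operatorname{var}(u_i) + \sum_i\sum_j \operatorname{var}(e_{ij})\Big) \\
&= \frac{1}{n^2m^2}\big(nm^2\sigma_u^2 + nm\sigma_e^2\big) \\
&= \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm}.
\end{aligned}
$$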
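As a quick numerical check of that algebra, here is a minimal sketch in Python (standard library only). The function names, the parameter values, and the choice $\beta = 0$ are illustrative assumptions, not part of the original answer. It compares the closed-form variance $\sigma_u^2/n + \sigma_e^2/(nm)$ with a Monte Carlo estimate from data simulated under $y_{ij} = \beta + u_i + e_{ij}$.

```python
import random
import statistics


def var_sample_mean(n, m, sigma_u2, sigma_e2):
    """Closed-form variance of the grand mean: sigma_u^2/n + sigma_e^2/(n*m)."""
    return sigma_u2 / n + sigma_e2 / (n * m)


def simulate_var_sample_mean(n, m, sigma_u2, sigma_e2, reps=4000, seed=0):
    """Monte Carlo estimate of the same variance under y_ij = beta + u_i + e_ij."""
    rng = random.Random(seed)
    beta = 0.0  # assumed value; the intercept shifts the mean but not its variance
    sd_u, sd_e = sigma_u2 ** 0.5, sigma_e2 ** 0.5
    means = []
    for rep in range(reps):
        total = 0.0
        for i in range(n):
            u_i = rng.gauss(0.0, sd_u)  # one subject effect shared by m measurements
            for j in range(m):
                total += beta + u_i + rng.gauss(0.0, sd_e)
        means.append(total / (n * m))
    return statistics.pvariance(means)


if __name__ == "__main__":
    exact = var_sample_mean(8, 4, 1.0, 2.0)
    approx = simulate_var_sample_mean(8, 4, 1.0, 2.0)
    print("closed form:", exact, "simulated:", approx)
```

The simulated value should land close to the closed-form one, with the gap shrinking as `reps` grows.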

Examining this expression, we can see that whenever there is any subject variance (i.e., $\sigma_u^2 > 0$), increasing the number of subjects ($n$) will make both of these terms smaller, while increasing the number of measurements per subject ($m$) will only make the second term smaller. (For a practical implication of this for designing multi-site replication projects, see this blog post I wrote a while ago.)

Now you wanted to know what happens when we increase or decrease $m$ or $n$ while holding constant the total number of observations. So for that we consider $nm$ to be a constant, so that the whole variance expression just looks like

$$\frac{\sigma_u^2}{n} + \text{constant},$$

which is as small as possible when $n$ is as large as possible (up to a maximum of $n = nm$, in which case $m = 1$, meaning we take a single measurement from each subject).
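To see the fixed-budget argument in numbers, the sketch below lists every way of splitting a total of $nm$ observations into $n$ subjects with $m$ measurements each and evaluates the variance for each split. The helper names and the example values (24 observations, $\sigma_u^2 = 1$, $\sigma_e^2 = 2$) are assumptions chosen for illustration.

```python
def var_sample_mean(n, m, sigma_u2, sigma_e2):
    """Variance of the grand mean under the random-effects model."""
    return sigma_u2 / n + sigma_e2 / (n * m)


def allocations(total):
    """All (n, m) pairs of positive integers with n * m == total."""
    return [(n, total // n) for n in range(1, total + 1) if total % n == 0]


def best_allocation(total, sigma_u2, sigma_e2):
    """The (n, m) split with the smallest variance (first one wins on ties)."""
    return min(
        allocations(total),
        key=lambda nm: var_sample_mean(nm[0], nm[1], sigma_u2, sigma_e2),
    )


if __name__ == "__main__":
    for n, m in allocations(24):
        print(n, m, var_sample_mean(n, m, 1.0, 2.0))
    print("best:", best_allocation(24, 1.0, 2.0))
```

With any positive subject variance the second term is the same for every split, so the printed variances fall as $n$ rises and the best split is one measurement per subject.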

My short answer referred to the intra-class correlation, so where does that fit in? In this simple random-effects model the intra-class correlation is

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$

(sketch of a derivation here). So we can write the variance equation above as

$$\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big) = \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm} = \Big(\frac{\rho}{n} + \frac{1-\rho}{nm}\Big)(\sigma_u^2 + \sigma_e^2).$$

This doesn't really add any insight to what we already saw above, but it does make us wonder: since the intra-class correlation is a bona fide correlation coefficient, and correlation coefficients can be negative, what would happen (and what would it mean) if the intra-class correlation were negative?
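The rewrite in terms of $\rho$ is easy to verify numerically. The following sketch (function names and example values are assumptions) computes the intra-class correlation from the two variance components and checks that both forms of the variance expression agree.

```python
def icc(sigma_u2, sigma_e2):
    """Intra-class correlation: share of total variance due to subjects."""
    return sigma_u2 / (sigma_u2 + sigma_e2)


def var_from_components(n, m, sigma_u2, sigma_e2):
    """Variance of the grand mean written with the variance components."""
    return sigma_u2 / n + sigma_e2 / (n * m)


def var_from_icc(n, m, rho, sigma2):
    """Same variance written with rho and the total variance sigma2."""
    return (rho / n + (1 - rho) / (n * m)) * sigma2


if __name__ == "__main__":
    sigma_u2, sigma_e2 = 1.0, 3.0
    rho = icc(sigma_u2, sigma_e2)
    print(rho)
    print(var_from_components(5, 4, sigma_u2, sigma_e2))
    print(var_from_icc(5, 4, rho, sigma_u2 + sigma_e2))
```

The last two printed values should match up to floating-point rounding.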

In the context of the random-effects model, a negative intra-class correlation doesn't really make sense, because it implies that the subject variance $\sigma_u^2$ is somehow negative (as we can see from the $\rho$ equation above, and as explained here and here)... but variances can't be negative! But this doesn't mean that the concept of a negative intra-class correlation doesn't make sense; it just means that the random-effects model doesn't have any way to express this concept, which is a failure of the model, not of the concept. To express this concept adequately we need to consider the marginal model.

Marginal model

For this same dataset we could consider a so-called marginal model of $y_{ij}$,

$$y_{ij} = \beta + e^*_{ij},$$

where basically we've pushed the random subject effect $u_i$ from before into the error term $e_{ij}$ so that we have $e^*_{ij} = u_i + e_{ij}$. In the random-effects model we considered the two random terms $u_i$ and $e_{ij}$ to be i.i.d., but in the marginal model we instead consider $e^*_{ij}$ to follow a block-diagonal covariance matrix $C$ like

$$
C = \sigma^2
\begin{bmatrix}
R & 0 & \cdots & 0 \\
0 & R & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & R
\end{bmatrix},
\qquad
R =
\begin{bmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{bmatrix},
$$

with one $m \times m$ block $R$ per subject. In words, this means that under the marginal model we simply consider $\rho$ to be the expected correlation between two $e^*$s from the same subject (we assume the correlation across subjects is 0). When $\rho$ is positive, two observations drawn from the same subject tend to be more similar (closer together), on average, than two observations drawn randomly from the dataset while ignoring the clustering due to subjects. When $\rho$ is negative, two observations drawn from the same subject tend to be less similar (further apart), on average, than two observations drawn completely at random. (More information about this interpretation in the question/answers here.)

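So now when we look at the equation for the variance of the sample mean under the marginal model, we have

$$
\begin{aligned}
\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big)
&= \operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j (\beta + e^*_{ij})\Big) \\
&= \frac{1}{n^2m^2}\operatorname{var}\Big(\sum_i\sum_j e^*_{ij}\Big) \\
&= \frac{1}{n^2m^2}\Big(n\big(m\sigma^2 + (m^2 - m)\rho\sigma^2\big)\Big) \\
&= \frac{\sigma^2\big(1 + (m-1)\rho\big)}{nm} \\
&= \Big(\frac{\rho}{n} + \frac{1-\rho}{nm}\Big)\sigma^2,
\end{aligned}
$$

which is the same variance expression we derived above for the random-effects model, just with $\sigma_e^2 + \sigma_u^2 = \sigma^2$, which is consistent with our note above that $e^*_{ij} = u_i + e_{ij}$. The advantage of this (statistically equivalent) perspective is that here we can think about a negative intra-class correlation without needing to invoke any weird concepts like a negative subject variance. Negative intra-class correlations just fit naturally in this framework.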
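To make the block-diagonal structure concrete, here is a small sketch that builds the covariance matrix of the $nm$ error terms as a plain list of lists and computes the variance of the unweighted sample mean as the sum of all entries divided by $(nm)^2$. The function names and the example values ($n = 3$, $m = 2$, $\rho = 0.5$, $\sigma^2 = 4$) are assumptions for illustration, and observations are assumed to be ordered subject by subject.

```python
def marginal_covariance(n, m, rho, sigma2):
    """Block-diagonal covariance: n blocks of size m, sigma2 on the diagonal,
    rho * sigma2 between two observations of the same subject, 0 elsewhere."""
    size = n * m
    cov = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            if a == b:
                cov[a][b] = sigma2
            elif a // m == b // m:  # a and b belong to the same subject
                cov[a][b] = rho * sigma2
    return cov


def var_of_mean_from_cov(cov):
    """Variance of the unweighted mean: sum of all covariances / size^2."""
    size = len(cov)
    return sum(sum(row) for row in cov) / size ** 2


def var_marginal(n, m, rho, sigma2):
    """Closed form: sigma2 * (1 + (m - 1) * rho) / (n * m)."""
    return sigma2 * (1 + (m - 1) * rho) / (n * m)


if __name__ == "__main__":
    cov = marginal_covariance(3, 2, 0.5, 4.0)
    print(var_of_mean_from_cov(cov), var_marginal(3, 2, 0.5, 4.0))
```

Both numbers should agree, and the same check goes through with a negative `rho`, which is the case the random-effects parameterization cannot represent.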

(BTW, just a quick aside to point out that the second-to-last line of the derivation above implies that we must have $\rho \ge -1/(m-1)$, or else the whole equation is negative, but variances can't be negative! So there is a lower bound on the intra-class correlation that depends on how many measurements we have per cluster. For $m = 2$ (i.e., we measure each subject twice), the intra-class correlation can go all the way down to $\rho = -1$; for $m = 3$ it can only go down to $\rho = -1/2$; and so on. Fun fact!)
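The bound is simple enough to tabulate. In this sketch (names are assumptions), `min_icc` returns $-1/(m-1)$ and `block_sum_variance` evaluates the variance of the sum of one subject's $m$ measurements, $m\sigma^2(1 + (m-1)\rho)$, which hits zero exactly at the bound and would go negative below it.

```python
def min_icc(m):
    """Smallest admissible intra-class correlation for m measurements per subject."""
    if m < 2:
        raise ValueError("the bound needs at least two measurements per subject")
    return -1.0 / (m - 1)


def block_sum_variance(m, rho, sigma2=1.0):
    """Variance of the sum of m equicorrelated measurements from one subject."""
    return m * sigma2 * (1 + (m - 1) * rho)


if __name__ == "__main__":
    for m in (2, 3, 5, 10):
        bound = min_icc(m)
        print(m, bound, block_sum_variance(m, bound))
```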

So finally, once again considering the total number of observations $nm$ to be a constant, we see that the second-to-last line of the derivation above just looks like

$$\big(1 + (m-1)\rho\big) \times \text{positive constant}.$$

So when $\rho > 0$, having $m$ as small as possible (so that we take fewer measurements of more subjects; in the limit, 1 measurement of each subject) makes the variance of the estimate as small as possible. But when $\rho < 0$, we actually want $m$ to be as large as possible (so that, in the limit, we take all $nm$ measurements from a single subject) in order to make the variance as small as possible. And when $\rho = 0$, the variance of the estimate is just a constant, so our allocation of $m$ and $n$ doesn't matter.
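The three cases can be checked with a few lines of code. This sketch (function names and the example values are assumptions) scans every $m$ that divides a fixed total, skips any $m$ for which $1 + (m-1)\rho$ would be negative, and returns the $m$ that minimizes the variance.

```python
def var_marginal(n, m, rho, sigma2=1.0):
    """Variance of the sample mean under the marginal model."""
    return sigma2 * (1 + (m - 1) * rho) / (n * m)


def best_m(total, rho, sigma2=1.0):
    """Measurements per subject that minimize the variance for a fixed total.
    Only m dividing total with 1 + (m - 1) * rho >= 0 are considered;
    ties go to the smallest m."""
    feasible = [
        m for m in range(1, total + 1)
        if total % m == 0 and 1 + (m - 1) * rho >= 0
    ]
    return min(feasible, key=lambda m: var_marginal(total // m, m, rho, sigma2))


if __name__ == "__main__":
    for rho in (0.3, 0.0, -0.05):
        print(rho, best_m(12, rho))
```

For a positive `rho` the scan picks `m = 1`, and for a mildly negative `rho` it picks the largest `m`. For `rho = 0` every split gives the same variance, so the tie-break simply returns the smallest `m`.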

+1. Great answer. I have to admit that the second part, about $\rho < 0$, is quite unintuitive: even with a huge (or infinite) total number $nm$ of observations the best we can do is to allocate all observations to one single subject, meaning that the standard error of the mean will be $\sigma_u$ and it's not possible in principle to reduce it any further. This is just so weird! True $\beta$ remains unknowable, whatever resources one puts into measuring it. Is this interpretation correct?
amoeba says Reinstate Monica

Ah, no. The above is not correct because as $m$ increases to infinity, $\rho$ cannot stay negative and has to approach zero (corresponding to zero subject variance). Hmm. This negative correlation is a funny thing: it's not really a parameter of the generative model because it's constrained by the sample size (whereas one would normally expect a generative model to be able to generate any number of observations, whatever the parameters are). I am not quite sure what is the proper way to think about it.
amoeba says Reinstate Monica

@DeltaIV What is "the covariance matrix of the random effects" in this case? In the mixed model written by Jake above, there is only one random effect and so there is no "covariance matrix" really, but just one number: $\sigma_u^2$. What $\Sigma$ are you referring to?
amoeba says Reinstate Monica

@DeltaIV Well, the general principle is en.wikipedia.org/wiki/Inverse-variance_weighting, and the variance of each subject's sample mean is given by $\sigma_u^2 + \sigma_e^2/m_i$ (that's why Jake wrote above that the weights have to depend on the estimate of between-subject variance). The estimate of within-subject variance is given by the variance of the pooled within-subject deviations, the estimate of between-subject variance is the variance of subjects' means, and using all that one can compute the weights. (But I am not sure if this is 100% equivalent to what lmer will do.)
amoeba says Reinstate Monica
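A minimal sketch of the inverse-variance weighting described in the comment above, assuming the variance components are already known rather than estimated from the data (a simplification), and not claiming to reproduce what lmer does. The function names and example values are hypothetical.

```python
def inverse_variance_weights(ms, sigma_u2, sigma_e2):
    """Normalized weights proportional to 1 / (sigma_u^2 + sigma_e^2 / m_i),
    where ms holds the number of measurements m_i for each subject."""
    raw = [1.0 / (sigma_u2 + sigma_e2 / m_i) for m_i in ms]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_grand_mean(subject_means, ms, sigma_u2, sigma_e2):
    """Inverse-variance weighted average of the per-subject sample means."""
    weights = inverse_variance_weights(ms, sigma_u2, sigma_e2)
    return sum(w * ybar for w, ybar in zip(weights, subject_means))


if __name__ == "__main__":
    # Unbalanced design: one subject measured once, another measured four times.
    print(inverse_variance_weights([1, 4], sigma_u2=0.0, sigma_e2=1.0))
    print(inverse_variance_weights([1, 4], sigma_u2=10.0, sigma_e2=1.0))
```

With no subject variance the weights are proportional to $m_i$. As the subject variance grows, the weights move toward equal weighting of subjects, and in a balanced design they are equal regardless.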

Jake, yes, it's exactly this hard-coding of $m$ that was bothering me. If this is "sample size" then it cannot be a parameter of the underlying system. My current thinking is that negative $\rho$ should actually indicate that there is another within-subject factor that is ignored/unknown to us. E.g. it could be pre & post of some intervention and the difference between them is so large that the measurements are negatively correlated. But this would mean that $m$ is not really a sample size, but the number of levels of this unknown factor, and that can certainly be hard coded...
amoeba says Reinstate Monica
Licensed under cc by-sa 3.0 with attribution required.