Show that 100 measurements for 5 subjects provide much less information than 5 measurements for 100 subjects


21

At a conference I overheard the following statement:

100 measurements for 5 subjects provide much less information than 5 measurements for 100 subjects.

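It seems fairly obvious that this is true, but I was wondering how one could prove it mathematically... I think a linear mixed model could be used. However, I don't know much about the maths used to estimate them (I just run lmer4 for LMMs and bmrs for GLMMs :) Could you show me an example where this is true? I'd prefer an answer with some formulas rather than just some code in R. Feel free to assume a simple setting, for example a linear mixed model with normally distributed random intercepts and slopes.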

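P.S. A math-based answer that doesn't involve LMMs would be fine too. I thought of LMMs because they seemed to me the natural tool for explaining why fewer measurements from more subjects are better than more measurements from fewer subjects, but I may well be wrong.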


3
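+1. I guess the simplest setting would be to consider the task of estimating the population mean $\mu$, where each subject has their own mean $a \sim \mathcal N(\mu, \sigma_a^2)$ and each measurement of this subject is distributed as $x \sim \mathcal N(a, \sigma^2)$. If we take $n$ measurements from each of $m$ subjects, then what is the optimal way to set $n$ and $m$ given a constant product $nm = N$?
amoeba says Reinstate Monica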

"Optimal" in the sense of minimizing the variance of the sample mean of the $N$ acquired data points.
amoeba says Reinstate Monica

1
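Yes. But for your question we don't need to care about how to estimate the variances; your question (i.e. the quote in your question) is, I believe, only about estimating the global mean $\mu$, and it seems obvious that the best estimator is given by the grand mean $\bar x$ of all $N = nm$ points in the sample. The question then is: given $\mu$, $\sigma^2$, $\sigma_a^2$, $n$, and $m$, what is the variance of $\bar x$? If we know that, we can minimize it with respect to $n$ under the constraint $nm = N$.
amoeba says Reinstate Monica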

1
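I don't know how to derive any of this, but I agree that it seems obvious: to estimate the error variance, it would be best to take all $N$ measurements from a single subject; and to estimate the subject variance, it would (probably?) be best to have $N$ different subjects with 1 measurement each. It's not so clear for the mean, but my intuition tells me that having $N$ subjects with 1 measurement each would be best there too. I wonder if that's true...
amoeba says Reinstate Monica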

2
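Maybe something like this: the variance of the per-subject sample means should be $\sigma_a^2 + \sigma^2/n$, where the first term is the subject variance and the second is the variance of the estimate of each subject's mean. Then the variance of the mean over subjects (i.e. the grand mean) will be
$$(\sigma_a^2 + \sigma^2/n)/m = \sigma_a^2/m + \sigma^2/(nm) = \sigma_a^2/m + \sigma^2/N = \sigma_a^2/m + \text{const},$$
which is minimized when $m = N$.
amoeba says Reinstate Monica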

Answers:


25

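The short answer is that your conjecture is true if and only if there is a positive intra-class correlation in the data. Empirically speaking, most clustered datasets show a positive intra-class correlation most of the time, which means that in practice your conjecture is usually true. But if the intra-class correlation is 0, then the two cases you mentioned are equally informative. And if the intra-class correlation is negative, then it is actually less informative to take fewer measurements on more subjects; we would actually prefer (as far as reducing the variance of the parameter estimate is concerned) to take all our measurements on a single subject.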

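Statistically, there are two perspectives from which we can think about this: a random-effects (or mixed) model, which you mention in your question, or a marginal model, which ends up being a bit more informative here.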

Random-effects (mixed) model

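Let's say we have a set of $n$ subjects from whom we've taken $m$ measurements each. Then a simple random-effects model of the $j$th measurement from the $i$th subject might be
$$y_{ij} = \beta + u_i + e_{ij},$$
where $\beta$ is the fixed intercept, $u_i$ is the random subject effect (with variance $\sigma_u^2$), $e_{ij}$ is the observation-level error term (with variance $\sigma_e^2$), and the latter two random terms are independent.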

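In this model $\beta$ represents the population mean, and with a balanced dataset (i.e., an equal number of measurements from each subject), our best estimate is simply the sample mean. So if we take "more information" to mean a smaller variance for this estimate, then basically we want to know how the variance of the sample mean depends on $n$ and $m$. With a bit of algebra we can work out that

$$\begin{aligned}
\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big)
&= \operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j (\beta + u_i + e_{ij})\Big) \\
&= \frac{1}{n^2m^2}\operatorname{var}\Big(\sum_i\sum_j u_i + \sum_i\sum_j e_{ij}\Big) \\
&= \frac{1}{n^2m^2}\Big(m^2\sum_i \operatorname{var}(u_i) + \sum_i\sum_j \operatorname{var}(e_{ij})\Big) \\
&= \frac{1}{n^2m^2}\big(nm^2\sigma_u^2 + nm\sigma_e^2\big) \\
&= \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm}.
\end{aligned}$$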
Examining this expression, we can see that whenever there is any subject variance (i.e., $\sigma_u^2 > 0$), increasing the number of subjects ($n$) will make both of these terms smaller, while increasing the number of measurements per subject ($m$) will only make the second term smaller. (For a practical implication of this for designing multi-site replication projects, see this blog post I wrote a while ago.)
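As a rough numerical check of the variance formula above, here is a minimal Monte Carlo sketch in Python (standard library only). The parameter values, the seed, the number of replicates, and all function names are illustrative assumptions, not part of the original answer.

```python
import random
import statistics


def simulate_grand_mean(n, m, beta, sigma_u, sigma_e, rng):
    """Draw one balanced dataset from y_ij = beta + u_i + e_ij; return its grand mean."""
    total = 0.0
    for _ in range(n):
        u_i = rng.gauss(0.0, sigma_u)  # subject effect shared by all m measurements
        for _ in range(m):
            total += beta + u_i + rng.gauss(0.0, sigma_e)  # add observation-level error
    return total / (n * m)


def theoretical_variance(n, m, sigma_u, sigma_e):
    """Closed form from the derivation: sigma_u^2 / n + sigma_e^2 / (n m)."""
    return sigma_u**2 / n + sigma_e**2 / (n * m)


def empirical_variance(n, m, sigma_u, sigma_e, reps=4000, seed=0):
    """Variance of the grand mean across `reps` independently simulated datasets."""
    rng = random.Random(seed)
    means = [simulate_grand_mean(n, m, 0.0, sigma_u, sigma_e, rng) for _ in range(reps)]
    return statistics.variance(means)


if __name__ == "__main__":
    # Assumed illustrative design: 20 subjects, 5 measurements each.
    n, m, sigma_u, sigma_e = 20, 5, 1.0, 2.0
    print("theory:    ", theoretical_variance(n, m, sigma_u, sigma_e))
    print("simulation:", empirical_variance(n, m, sigma_u, sigma_e))
```

With enough replicates the simulated variance should land close to the closed-form value, up to Monte Carlo noise.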

Now you wanted to know what happens when we increase or decrease m or n while holding constant the total number of observations. So for that we consider nm to be a constant, so that the whole variance expression just looks like

$$\frac{\sigma_u^2}{n} + \text{constant},$$
which is as small as possible when $n$ is as large as possible (up to a maximum of $n = nm$, in which case $m = 1$, meaning we take a single measurement from each subject).
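To make this concrete for the two designs in the question, the following short Python sketch evaluates the random-effects variance formula for 500 total observations. The variance components (both set to 1) are an arbitrary assumption chosen only for illustration.

```python
def var_grand_mean(n, m, sigma_u2, sigma_e2):
    """Variance of the sample mean for n subjects with m measurements each."""
    return sigma_u2 / n + sigma_e2 / (n * m)


# Assumed variance components: equal subject and error variance.
sigma_u2, sigma_e2 = 1.0, 1.0

few_subjects = var_grand_mean(n=5, m=100, sigma_u2=sigma_u2, sigma_e2=sigma_e2)
many_subjects = var_grand_mean(n=100, m=5, sigma_u2=sigma_u2, sigma_e2=sigma_e2)
one_each = var_grand_mean(n=500, m=1, sigma_u2=sigma_u2, sigma_e2=sigma_e2)

print(f"5 subjects x 100 measurements: {few_subjects:.4f}")
print(f"100 subjects x 5 measurements: {many_subjects:.4f}")
print(f"500 subjects x 1 measurement:  {one_each:.4f}")
```

Under these assumed values the three variances are 0.202, 0.012 and 0.004: the error term is the same 1/500 in every design, and only the subject term shrinks as the number of subjects grows.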

My short answer referred to the intra-class correlation, so where does that fit in? In this simple random-effects model the intra-class correlation is

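$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$
(sketch of a derivation here). So we can write the variance equation above as
$$\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big) = \frac{\sigma_u^2}{n} + \frac{\sigma_e^2}{nm} = \Big(\frac{\rho}{n} + \frac{1-\rho}{nm}\Big)(\sigma_u^2 + \sigma_e^2).$$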
This doesn't really add any insight to what we already saw above, but it does make us wonder: since the intra-class correlation is a bona fide correlation coefficient, and correlation coefficients can be negative, what would happen (and what would it mean) if the intra-class correlation were negative?

In the context of the random-effects model, a negative intra-class correlation doesn't really make sense, because it implies that the subject variance $\sigma_u^2$ is somehow negative (as we can see from the $\rho$ equation above, and as explained here and here)... but variances can't be negative! But this doesn't mean that the concept of a negative intra-class correlation doesn't make sense; it just means that the random-effects model doesn't have any way to express this concept, which is a failure of the model, not of the concept. To express this concept adequately we need to consider the marginal model.

Marginal model

For this same dataset we could consider a so-called marginal model of $y_{ij}$,

$$y_{ij} = \beta + e^*_{ij},$$
where basically we've pushed the random subject effect $u_i$ from before into the error term $e_{ij}$ so that we have $e^*_{ij} = u_i + e_{ij}$. In the random-effects model we considered the two random terms $u_i$ and $e_{ij}$ to be i.i.d., but in the marginal model we instead consider $e^*_{ij}$ to follow a block-diagonal covariance matrix $\mathbf C$ like
$$\mathbf C = \sigma^2 \begin{bmatrix} \mathbf R & 0 & \cdots & 0 \\ 0 & \mathbf R & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf R \end{bmatrix}, \qquad \mathbf R = \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}.$$
In words, this means that under the marginal model we simply consider $\rho$ to be the expected correlation between two $e^*$s from the same subject (we assume the correlation across subjects is 0). When $\rho$ is positive, two observations drawn from the same subject tend to be more similar (closer together), on average, than two observations drawn randomly from the dataset while ignoring the clustering due to subjects. When $\rho$ is negative, two observations drawn from the same subject tend to be less similar (further apart), on average, than two observations drawn completely at random. (More information about this interpretation in the question/answers here.)

So now when we look at the equation for the variance of the sample mean under the marginal model, we have

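$$\begin{aligned}
\operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j y_{ij}\Big)
&= \operatorname{var}\Big(\frac{1}{nm}\sum_i\sum_j (\beta + e^*_{ij})\Big) \\
&= \frac{1}{n^2m^2}\operatorname{var}\Big(\sum_i\sum_j e^*_{ij}\Big) \\
&= \frac{1}{n^2m^2}\Big(n\big(m\sigma^2 + (m^2 - m)\rho\sigma^2\big)\Big) \\
&= \frac{\sigma^2\big(1 + (m-1)\rho\big)}{nm} \\
&= \Big(\frac{\rho}{n} + \frac{1-\rho}{nm}\Big)\sigma^2,
\end{aligned}$$
which is the same variance expression we derived above for the random-effects model, just with $\sigma_e^2 + \sigma_u^2 = \sigma^2$, which is consistent with our note above that $e^*_{ij} = u_i + e_{ij}$. The advantage of this (statistically equivalent) perspective is that here we can think about a negative intra-class correlation without needing to invoke any weird concepts like a negative subject variance. Negative intra-class correlations just fit naturally in this framework.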
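The marginal-model calculation can also be checked by brute force: build the block-diagonal covariance matrix explicitly and sum its entries. The sketch below does this with plain Python lists; the sizes and parameter values are assumptions picked for illustration, and the helper names are hypothetical.

```python
def exchangeable_block(m, rho):
    """m x m correlation matrix R: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if j == k else rho for k in range(m)] for j in range(m)]


def marginal_covariance(n, m, rho, sigma2):
    """Block-diagonal C = sigma2 * diag(R, ..., R) with n blocks, as nested lists."""
    size = n * m
    block = exchangeable_block(m, rho)
    cov = [[0.0] * size for _ in range(size)]
    for i in range(n):  # subject i occupies rows/columns i*m .. i*m + m - 1
        for j in range(m):
            for k in range(m):
                cov[i * m + j][i * m + k] = sigma2 * block[j][k]
    return cov


def var_of_mean(cov):
    """Variance of the unweighted mean: sum of all covariance entries / size^2."""
    size = len(cov)
    return sum(sum(row) for row in cov) / size**2


# Assumed illustrative values: 4 subjects, 3 measurements each.
n, m, rho, sigma2 = 4, 3, 0.5, 2.0
C = marginal_covariance(n, m, rho, sigma2)
direct = var_of_mean(C)
closed_form = sigma2 * (1 + (m - 1) * rho) / (n * m)
print(direct, closed_form)
```

The same check works for a negative $\rho$, as long as it stays within the admissible range for the chosen $m$.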

(BTW, just a quick aside to point out that the second-to-last line of the derivation above implies that we must have $\rho \ge -1/(m-1)$, or else the whole equation is negative, but variances can't be negative! So there is a lower bound on the intra-class correlation that depends on how many measurements we have per cluster. For $m = 2$ (i.e., we measure each subject twice), the intra-class correlation can go all the way down to $\rho = -1$; for $m = 3$ it can only go down to $\rho = -1/2$; and so on. Fun fact!)
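A small Python sketch of this lower bound, using exact fractions so that the boundary case comes out as exactly zero. The function names are hypothetical and the cluster sizes are arbitrary examples.

```python
from fractions import Fraction


def min_icc(m):
    """Smallest admissible intra-class correlation for clusters of size m >= 2."""
    if m < 2:
        raise ValueError("the bound needs at least two measurements per subject")
    return Fraction(-1, m - 1)


def variance_factor(m, rho):
    """The factor 1 + (m - 1) * rho, which must stay non-negative."""
    return 1 + (m - 1) * rho


for m in (2, 3, 5, 11):
    bound = min_icc(m)
    # At the bound the factor is exactly zero; below it the "variance" would be negative.
    print(m, bound, variance_factor(m, bound))
```

The bound approaches 0 from below as $m$ grows, so only mildly negative correlations remain possible for large clusters.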

So finally, once again considering the total number of observations nm to be a constant, we see that the second-to-last line of the derivation above just looks like

$$\big(1 + (m-1)\rho\big) \times \text{positive constant}.$$
So when ρ>0, having m as small as possible (so that we take fewer measurements of more subjects--in the limit, 1 measurement of each subject) makes the variance of the estimate as small as possible. But when ρ<0, we actually want m to be as large as possible (so that, in the limit, we take all nm measurements from a single subject) in order to make the variance as small as possible. And when ρ=0, the variance of the estimate is just a constant, so our allocation of m and n doesn't matter.
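The three cases can be illustrated by enumerating every way of splitting a fixed total into equal-sized clusters. This is only a sketch: the total of 12 observations and the three $\rho$ values are assumed for illustration, and `best_m` is a hypothetical helper.

```python
def variance_factor(m, rho):
    """Factor 1 + (m - 1) * rho that multiplies the positive constant sigma^2 / (n m)."""
    return 1 + (m - 1) * rho


def best_m(total, rho):
    """Cluster sizes m (divisors of `total`) that minimise the variance of the mean.

    Only m with a non-negative factor, i.e. rho >= -1 / (m - 1), are admissible.
    """
    candidates = [m for m in range(1, total + 1)
                  if total % m == 0 and variance_factor(m, rho) >= 0]
    smallest = min(variance_factor(m, rho) for m in candidates)
    return [m for m in candidates if variance_factor(m, rho) == smallest]


total = 12  # assumed total number of observations nm
for rho in (0.3, 0.0, -0.05):
    print(f"rho = {rho:+.2f}: best m = {best_m(total, rho)}")
```

For the positive value the minimiser is one measurement per subject, for zero every allocation ties, and for the (admissible) negative value the minimiser puts all 12 measurements on a single subject.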

3
+1. Great answer. I have to admit that the second part, about ρ<0, is quite unintuitive: even with a huge (or infinite) total number nm of observations the best we can do is to allocate all observations to one single subject, meaning that the standard error of the mean will be σu and it's not possible in principle to reduce it any further. This is just so weird! True β remains unknowable, whatever resources one puts into measuring it. Is this interpretation correct?
amoeba says Reinstate Monica

3
Ah, no. The above is not correct because as m increases to infinity, ρ cannot stay negative and has to approach zero (corresponding to zero subject variance). Hmm. This negative correlation is a funny thing: it's not really a parameter of the generative model because it's constrained by the sample size (whereas one would normally expect a generative model to be able to generate any number of observations, whatever the parameters are). I am not quite sure what is the proper way to think about it.
amoeba says Reinstate Monica

1
@DeltaIV What is "the covariance matrix of the random effects" in this case? In the mixed model written by Jake above, there is only one random effect and so there is no "covariance matrix" really, but just one number: $\sigma_u^2$. What $\Sigma$ are you referring to?
amoeba says Reinstate Monica

2
@DeltaIV Well, the general principle is en.wikipedia.org/wiki/Inverse-variance_weighting, and the variance of each subject's sample mean is given by $\sigma_u^2 + \sigma_e^2/m_i$ (that's why Jake wrote above that the weights have to depend on the estimate of between-subject variance). The estimate of within-subject variance is given by the variance of the pooled within-subject deviations, the estimate of between-subject variance is the variance of subjects' means, and using all that one can compute the weights. (But I am not sure if this is 100% equivalent to what lmer will do.)
amoeba says Reinstate Monica
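The weighting idea in the comment above can be sketched as follows. This is a minimal illustration that treats the two variance components as known (in practice they would have to be estimated), and it makes no claim about what lmer actually computes; the function name and the numbers are hypothetical.

```python
def inverse_variance_weighted_mean(subject_means, subject_sizes, sigma_u2, sigma_e2):
    """Combine per-subject means with weights 1 / (sigma_u2 + sigma_e2 / m_i)."""
    weights = [1.0 / (sigma_u2 + sigma_e2 / m_i) for m_i in subject_sizes]
    total_weight = sum(weights)
    estimate = sum(w * mean for w, mean in zip(weights, subject_means)) / total_weight
    variance = 1.0 / total_weight  # variance of the weighted estimate
    return estimate, variance


# Assumed unbalanced example: three subjects with 1, 10 and 100 measurements.
estimate, variance = inverse_variance_weighted_mean(
    subject_means=[1.0, 2.0, 3.0],
    subject_sizes=[1, 10, 100],
    sigma_u2=1.0,
    sigma_e2=4.0,
)
print(estimate, variance)
```

In the balanced case all weights are equal, so the estimate reduces to the plain average of the subject means and its variance to $\sigma_u^2/n + \sigma_e^2/(nm)$.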

1
Jake, yes, it's exactly this hard-coding of m that was bothering me. If this is "sample size" then it cannot be a parameter of the underlying system. My current thinking is that negative ρ should actually indicate that there is another within-subject factor that is ignored/unknown to us. E.g. it could be pre & post of some intervention and the difference between them is so large that the measurements are negatively correlated. But this would mean that m is not really a sample size, but the number of levels of this unknown factor, and that can certainly be hard coded...
amoeba says Reinstate Monica
Licensed under cc by-sa 3.0 with attribution required.