The short answer is that your conjecture is true if and only if the intra-class correlation in the data is positive. Empirically, most clustered datasets show positive intra-class correlation most of the time, which means that in practice your conjecture is usually correct. But if the intra-class correlation is 0, then the two cases you mention are equally informative. And if the intra-class correlation is negative, then it is actually less informative to take fewer measurements on more subjects; we would actually prefer (as far as reducing the variance of the parameter estimate goes) to take all our measurements on a single subject.
Statistically, we can think about this from two perspectives: the random-effects (or mixed) model that you mention in your question, or a marginal model, which ends up being a bit more informative here.
Random-effects (mixed) model
Suppose we have a set of $n$ subjects, each of whom we measure $m$ times. Then a simple random-effects model for the $j$th measurement from the $i$th subject might be
$$y_{ij} = \beta + u_i + e_{ij},$$
where $\beta$ is the fixed intercept, $u_i$ is the random subject effect (with variance $\sigma^2_u$), $e_{ij}$ is the observation-level error term (with variance $\sigma^2_e$), and the last two random terms are independent.
In this model, $\beta$ represents the population mean, and with a balanced dataset (i.e., an equal number of measurements per subject), our best estimate of it is simply the sample mean. So if we take "more information" to mean a smaller variance of this estimate, then basically we want to know how the variance of the sample mean depends on $n$ and $m$. With a bit of algebra we can work out that
$$\begin{aligned}
\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) &= \operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j \beta + u_i + e_{ij}\right) \\
&= \frac{1}{n^2m^2}\operatorname{var}\left(\sum_i\sum_j u_i + \sum_i\sum_j e_{ij}\right) \\
&= \frac{1}{n^2m^2}\left(m^2\sum_i \operatorname{var}(u_i) + \sum_i\sum_j \operatorname{var}(e_{ij})\right) \\
&= \frac{1}{n^2m^2}\left(nm^2\sigma^2_u + nm\sigma^2_e\right) \\
&= \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm}.
\end{aligned}$$
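We can sanity-check this formula with a quick simulation. The sketch below draws repeated datasets from the random-effects model (the parameter values are arbitrary illustrative choices) and compares the empirical variance of the sample mean to $\sigma^2_u/n + \sigma^2_e/(nm)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 4                  # subjects, measurements per subject
sigma2_u, sigma2_e = 2.0, 1.0  # illustrative variance components
beta = 10.0

means = []
for _ in range(20000):
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n)       # subject effects
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=(n, m))  # observation-level errors
    y = beta + u[:, None] + e                            # y_ij = beta + u_i + e_ij
    means.append(y.mean())

empirical = np.var(means)
theoretical = sigma2_u / n + sigma2_e / (n * m)
print(empirical, theoretical)  # the two should agree closely
```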
Examining this expression, we can see that whenever there is any subject variance (i.e., $\sigma^2_u > 0$), increasing the number of subjects ($n$) will make both of these terms smaller, while increasing the number of measurements per subject ($m$) will only make the second term smaller. (For a practical implication of this for designing multi-site replication projects, see this blog post I wrote a while ago.)
Now you wanted to know what happens when we increase or decrease $m$ or $n$ while holding constant the total number of observations. So for that we consider $nm$ to be a constant, so that the whole variance expression just looks like
$$\frac{\sigma^2_u}{n} + \text{constant},$$
which is as small as possible when $n$ is as large as possible (up to a maximum of $n = nm$, in which case $m = 1$, meaning we take a single measurement from each subject).
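To make this concrete, here is a small numeric sketch that evaluates $\sigma^2_u/n + \sigma^2_e/(nm)$ for every way of allocating a fixed total of $nm = 100$ observations (the variance-component values are illustrative, not from any real dataset):

```python
# Plugging a fixed total of nm = 100 observations into the variance formula
# sigma2_u/n + sigma2_e/(n*m); the variance components here are illustrative.
sigma2_u, sigma2_e = 2.0, 1.0
total = 100
variances = {}
for n in (1, 10, 20, 50, 100):
    m = total // n
    variances[(n, m)] = sigma2_u / n + sigma2_e / (n * m)
    print(f"n={n:3d}, m={m:3d}: var(sample mean) = {variances[(n, m)]:.3f}")
```

As the formula predicts, the variance shrinks monotonically as we move observations toward more subjects, bottoming out at $n = 100$, $m = 1$.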
My short answer referred to the intra-class correlation, so where does that fit in? In this simple random-effects model the intra-class correlation is
$$\rho = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}$$
(sketch of a derivation here). So we can write the variance equation above as
$$\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) = \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm} = \left(\frac{\rho}{n} + \frac{1-\rho}{nm}\right)\left(\sigma^2_u + \sigma^2_e\right)$$
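If you want to convince yourself that the two forms really are equal, a one-screen numeric check works (the variance components and design sizes below are arbitrary illustrative values):

```python
# Numeric sanity check that the two forms of the variance agree:
# sigma2_u/n + sigma2_e/(n*m) == (rho/n + (1-rho)/(n*m)) * (sigma2_u + sigma2_e)
sigma2_u, sigma2_e, n, m = 2.0, 1.0, 7, 3
rho = sigma2_u / (sigma2_u + sigma2_e)  # intra-class correlation
lhs = sigma2_u / n + sigma2_e / (n * m)
rhs = (rho / n + (1 - rho) / (n * m)) * (sigma2_u + sigma2_e)
print(lhs, rhs)  # identical up to floating-point error
```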
This doesn't really add any insight to what we already saw above, but it does make us wonder: since the intra-class correlation is a bona fide correlation coefficient, and correlation coefficients can be negative, what would happen (and what would it mean) if the intra-class correlation were negative?
In the context of the random-effects model, a negative intra-class correlation doesn't really make sense, because it implies that the subject variance $\sigma^2_u$ is somehow negative (as we can see from the $\rho$ equation above, and as explained here and here)... but variances can't be negative! But this doesn't mean that the concept of a negative intra-class correlation doesn't make sense; it just means that the random-effects model doesn't have any way to express this concept, which is a failure of the model, not of the concept. To express this concept adequately we need to consider the marginal model.
Marginal model
For this same dataset we could consider a so-called marginal model of $y_{ij}$,
$$y_{ij} = \beta + e^*_{ij},$$
where basically we've pushed the random subject effect $u_i$ from before into the error term $e_{ij}$ so that we have $e^*_{ij} = u_i + e_{ij}$. In the random-effects model we considered the two random terms $u_i$ and $e_{ij}$ to be i.i.d., but in the marginal model we instead consider $e^*_{ij}$ to follow a block-diagonal covariance matrix $C$ like
$$C = \sigma^2\begin{bmatrix}R & 0 & \cdots & 0\\ 0 & R & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & R\end{bmatrix},\quad R = \begin{bmatrix}1 & \rho & \cdots & \rho\\ \rho & 1 & \cdots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \cdots & 1\end{bmatrix}$$
In words, this means that under the marginal model we simply consider $\rho$ to be the expected correlation between two $e^*$s from the same subject (we assume the correlation across subjects is 0). When $\rho$ is positive, two observations drawn from the same subject tend to be more similar (closer together), on average, than two observations drawn randomly from the dataset while ignoring the clustering due to subjects. When $\rho$ is *negative*, two observations drawn from the same subject tend to be *less* similar (further apart), on average, than two observations drawn completely at random. (More information about this interpretation in the question/answers here.)
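As a sketch of how this covariance structure looks in code, $C$ can be built as a Kronecker product $\sigma^2 (I_n \otimes R)$ with a compound-symmetric $R$; the values of $n$, $m$, $\sigma^2$, and $\rho$ below are arbitrary illustrative choices:

```python
import numpy as np

# Construct the block-diagonal marginal covariance matrix C = sigma^2 * (I_n ⊗ R),
# where R is the m x m compound-symmetry (exchangeable) correlation matrix.
n, m = 3, 4
sigma2, rho = 1.5, 0.3
R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))  # 1 on the diagonal, rho elsewhere
C = sigma2 * np.kron(np.eye(n), R)                 # n diagonal blocks of sigma^2 * R
print(C.shape)                     # (12, 12)
print(C[0, 0], C[0, 1], C[0, m])   # sigma^2, sigma^2 * rho, 0 (different subjects)
```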
So now when we look at the equation for the variance of the sample mean under the marginal model, we have
$$\begin{aligned}
\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) &= \operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j \beta + e^*_{ij}\right) \\
&= \frac{1}{n^2m^2}\operatorname{var}\left(\sum_i\sum_j e^*_{ij}\right) \\
&= \frac{1}{n^2m^2}\left(n\left(m\sigma^2 + (m^2 - m)\rho\sigma^2\right)\right) \\
&= \frac{\sigma^2\left(1 + (m-1)\rho\right)}{nm} \\
&= \left(\frac{\rho}{n} + \frac{1-\rho}{nm}\right)\sigma^2,
\end{aligned}$$
which is the same variance expression we derived above for the random-effects model, just with $\sigma^2_e + \sigma^2_u = \sigma^2$, which is consistent with our note above that $e^*_{ij} = u_i + e_{ij}$. The advantage of this (statistically equivalent) perspective is that here we can think about a negative intra-class correlation without needing to invoke any weird concepts like a negative subject variance. Negative intra-class correlations just fit naturally in this framework.
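To see the marginal model accommodate a negative $\rho$ directly, the sketch below simulates from a multivariate normal with the block-diagonal covariance $C$ and checks the variance of the sample mean against $\sigma^2(1 + (m-1)\rho)/(nm)$; all parameter values are illustrative:

```python
import numpy as np

# Simulate from the marginal model with a *negative* intra-class correlation
# and compare the empirical variance of the sample mean to the formula
# sigma^2 * (1 + (m-1)*rho) / (n*m).
rng = np.random.default_rng(1)
n, m = 40, 2
sigma2, rho = 1.0, -0.8                  # rho = -0.8 is legal because m = 2
R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
C = sigma2 * np.kron(np.eye(n), R)       # block-diagonal marginal covariance
mean_vec = np.zeros(n * m)               # take beta = 0 without loss of generality

samples = rng.multivariate_normal(mean_vec, C, size=20000)
empirical = samples.mean(axis=1).var()
theoretical = sigma2 * (1 + (m - 1) * rho) / (n * m)
print(empirical, theoretical)
```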
(BTW, just a quick aside to point out that the second-to-last line of the derivation above implies that we must have $\rho \ge -1/(m-1)$, or else the whole equation is negative, but variances can't be negative! So there is a lower bound on the intra-class correlation that depends on how many measurements we have per cluster. For $m = 2$ (i.e., we measure each subject twice), the intra-class correlation can go all the way down to $\rho = -1$; for $m = 3$ it can only go down to $\rho = -1/2$; and so on. Fun fact!)
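The same lower bound shows up in the eigenvalues of the compound-symmetry block $R$, which are $1 + (m-1)\rho$ (once) and $1 - \rho$ (with multiplicity $m-1$): $R$ is a valid correlation matrix only when these are all nonnegative. A quick numeric check:

```python
import numpy as np

# R is a valid (positive semidefinite) correlation matrix exactly when
# rho >= -1/(m-1); check via its smallest eigenvalue.
def min_eigenvalue(m, rho):
    R = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
    return np.linalg.eigvalsh(R).min()

print(min_eigenvalue(2, -1.0))   # ~0: rho = -1 is attainable when m = 2
print(min_eigenvalue(3, -0.5))   # ~0: rho = -1/2 is the floor when m = 3
print(min_eigenvalue(3, -0.6))   # < 0: below the floor, not a valid R
```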
So finally, once again considering the total number of observations $nm$ to be a constant, we see that the second-to-last line of the derivation above just looks like
$$\left(1 + (m-1)\rho\right) \times \text{positive constant}.$$
So when $\rho > 0$, having $m$ as small as possible (so that we take fewer measurements of more subjects--in the limit, 1 measurement of each subject) makes the variance of the estimate as small as possible. But when $\rho < 0$, we actually want $m$ to be as *large* as possible (so that, in the limit, we take all $nm$ measurements from a single subject) in order to make the variance as small as possible. And when $\rho = 0$, the variance of the estimate is just a constant, so our allocation of $m$ and $n$ doesn't matter.
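The three cases can be verified numerically by minimizing $\left(\rho/n + (1-\rho)/(nm)\right)\sigma^2$ over every allocation of a fixed total; $\sigma^2$ drops out of the comparison, and the $\rho$ values below are illustrative:

```python
# Which allocation (n, m) of nm = 12 total observations minimizes
# (rho/n + (1-rho)/(n*m)) * sigma^2?  (sigma^2 cancels in the comparison.)
TOTAL = 12
ALLOCATIONS = [(n, TOTAL // n) for n in (1, 2, 3, 4, 6, 12)]

def best_allocation(rho):
    return min(ALLOCATIONS, key=lambda nm: rho / nm[0] + (1 - rho) / (nm[0] * nm[1]))

print(best_allocation(0.5))        # (12, 1): one measurement per subject
print(best_allocation(-1.0 / 11))  # (1, 12): all measurements on one subject
# At rho = 0 every allocation gives the same variance, 1/12.
```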