How will random effects with only one observation affect a generalized linear mixed model?


14

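I have a dataset in which the variable I would like to use as a random effect has only a single observation for some levels. Based on the answers to previous questions, I gather that this can be fine in principle.

Can I fit a mixed model with subjects that only have 1 observation?

Random intercept model - one measurement per subject

However, in the second link, the first answer states:

"...assuming you are not using a generalized linear mixed model (GLMM), where in that case issues of over-dispersion come into play"

I am considering using a GLMM, but I don't really understand how random-effect levels with a single observation will affect the model.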


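Here is an example of one of the models I am trying to fit. I am studying birds, and I would like to model the effects of population and season on the number of stops made during migration. I would like to use individual as a random effect, because for some individuals I have up to 5 years of data.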

library(dplyr)
library(lme4)
pop <- as.character(c("BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA"))
id <- "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932"
id <- strsplit(id, " ")
id <- as.numeric(unlist(id))
year <- "2014 2015 2014 2015 2014 2015 2014 2015 2014 2015 2009 2010 2010 2009 2010 2010 2011 2011 2012 2009 2010 2009 2009 2010 2010 2011 2011 2012 2012 2013 2008 2008 2009 2008 2008 2009 2008 2008 2013 2013 2013 2013 2014 2015 2014 2012 2013 2013 2014 2014 2015 2015 2016 2012 2013 2013 2014 2014 2015 2013 2012 2012 2013 2013 2012 2013 2013 2014 2013 2014 2014 2013 2014 2014"
year <- strsplit(year, " ")
year <- as.numeric(unlist(year))
season <- as.character(c("fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "fall", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "spring", "spring","fall", "fall", "spring", "fall", "fall", "spring"))
stops <- "0 0 0 0 0 0 1 0 2 1 1 0 0 3 2 0 1 1 0 1 1 2 0 1 0 2 0 4 0 0 2 1 1 2 5 2 1 0 9 6 2 3 4 7 2 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0"
stops <- strsplit(stops, " ")
stops <- as.numeric(unlist(stops))

stopdata <- data.frame(pop = pop, id = id, year = year, season = season, stops = stops, stringsAsFactors = FALSE)


stopdata <- group_by(stopdata, pop, id)
summary1 <- summarise(stopdata, n.years = length(year))
table(summary1$n.years)

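There are 27 individuals. 9 individuals have a single observation. 18 individuals have 2-9 observations.

What should I do if 1/3 of the random-effect levels have only one observation?


I have been considering:

Option 1: GLMM as described above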

stops.glmm <- glmer(stops ~ pop + season + (1|id), data=stopdata, family = poisson)
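
Since the quoted answer raises over-dispersion, one rough diagnostic for a Poisson GLMM like Option 1 is the ratio of the summed squared Pearson residuals to the residual degrees of freedom. The sketch below is self-contained and runs on simulated data, not on `stopdata`: the group sizes (9 singletons among 27 individuals), the effect sizes, and the random-intercept SD of 0.5 are all assumptions chosen only to resemble the layout described above.

```r
library(lme4)

set.seed(1)

# Hypothetical group sizes: 9 singletons plus 18 repeatedly observed individuals
n_obs <- c(rep(1, 9), rep(2:7, each = 3))
sim <- data.frame(
  id     = factor(rep(seq_along(n_obs), times = n_obs)),
  season = rep_len(c("fall", "spring"), sum(n_obs))
)

# Random intercept per individual (SD of 0.5 on the log scale is an assumption)
b   <- rnorm(length(n_obs), mean = 0, sd = 0.5)
eta <- 0.2 - 0.4 * (sim$season == "spring") + b[as.integer(sim$id)]
sim$stops <- rpois(nrow(sim), lambda = exp(eta))

fit <- glmer(stops ~ season + (1 | id), data = sim, family = poisson)

# Pearson dispersion statistic: values well above 1 suggest over-dispersion
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
dispersion    <- pearson_chisq / df.residual(fit)
dispersion
```

The same two lines at the end could be applied to `stops.glmm`. The statistic is only approximate for mixed models, so it is better read as a warning sign than as a formal test.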

Option 2: Weighted generalized linear model (GLM), using means for individuals with multiple observations

aggfun <- function(data, idvars = c("pop", "season", "id"), response) {
  # select id variables, response variable, and year; drop incomplete rows
  sub1 <- na.omit(data[, c(idvars, "year", response)])
  # mean response within each combination of idvars (averaging over years)
  agg1 <- aggregate(sub1[names(sub1) == response], by = sub1[idvars], FUN = mean)
  # sample size for each aggregated group
  aggn <- aggregate(sub1[response], by = sub1[idvars], FUN = length)
  # rename sample size column (4th column when there are three idvars)
  names(aggn)[4] <- "n"
  agg2 <- merge(agg1, aggn)
  agg2
}


#Create weighted dataset
stops.weight <- aggfun(data = stopdata, response = "stops")
stops.weight$stops <- round(stops.weight$stops)

#Weighted GLM
stops.glm <- glm(stops~pop + season, data=stops.weight, family = poisson, weights = n)
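
For comparison, here is a different way to aggregate Poisson counts. It is not the rounded-means-with-weights approach used in Option 2, but a swapped-in alternative: sum the counts within each group and add `offset(log(n))`. For a fixed-effects Poisson GLM with a log link, this gives the same coefficient estimates as fitting the unaggregated rows, because the covariates are constant within each group. The sketch uses simulated data in base R; the group sizes and effect sizes are assumptions, and like Option 2 it ignores between-individual variation.

```r
set.seed(42)

# Hypothetical data: 12 individuals with 1 to 4 observations each
n_obs <- rep(1:4, times = 3)
raw <- data.frame(
  id  = rep(seq_along(n_obs), times = n_obs),
  pop = rep(rep(c("A", "B", "C"), each = 4), times = n_obs)
)
raw$season <- rep_len(c("fall", "spring"), nrow(raw))
raw$stops  <- rpois(
  nrow(raw),
  lambda = exp(0.3 + 0.5 * (raw$pop == "B") - 0.4 * (raw$season == "spring"))
)

# Sum (not average) the counts per group, and record the group size
agg   <- aggregate(stops ~ pop + season + id, data = raw, FUN = sum)
agg$n <- aggregate(stops ~ pop + season + id, data = raw, FUN = length)$stops

# Unaggregated fit versus aggregated fit with a log(n) offset
fit_raw <- glm(stops ~ pop + season, data = raw, family = poisson)
fit_agg <- glm(stops ~ pop + season + offset(log(n)), data = agg, family = poisson)

# Compare the two sets of coefficient estimates
all.equal(coef(fit_raw), coef(fit_agg), tolerance = 1e-6)
```

Summing keeps the response an integer count, so no rounding of means is needed.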

Where is the quote from? I can't find the corresponding answer.
amoeba says Reinstate Monica

Second link, first answer, in parentheses.
canderson156 '16

3
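Short not-quite-an-answer: I don't think there will be any problem. I don't know exactly what the first answerer of the second question linked above means: have you considered commenting there (if you have enough rep)? In the limit where there is only 1 observation per group, the among-group and residual variation will be completely confounded. If you had only a few groups with >1 observation (and few observations in those groups), then I probably wouldn't bother with a mixed model, but your case sounds fine ...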
Ben Bolker

I'm not sure that your second option (weighted Poisson) will really work properly, but I would have to think about it more carefully.
Ben Bolker

@BenBolker In the situation you describe, where only a few groups have >1 observation, what would you choose to do instead?
mkt - Reinstate Monica

Answers:


3

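In general, you have an identifiability problem. A linear model with a random effect assigned to a parameter that has only one measurement cannot distinguish between the random effect and the residual error.

A typical linear mixed-effects equation looks like this: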

$$E = \beta + \eta_i + \epsilon_j$$

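Here $\beta$ is the fixed effect, $\eta_i$ is the random effect for level $i$, and $\epsilon_j$ is the residual for measurement $j$. When a level has only one observation, $\eta$ and $\epsilon$ cannot be told apart for that level: such observations do not help to estimate $SD(\eta)$ and $SD(\epsilon)$ separately, only the total variance $var(\eta) + var(\epsilon)$.

The levels with repeated observations are what allow $SD(\eta)$ and $SD(\epsilon)$ to be estimated separately.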
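
The confounding can be seen in a small base-R simulation of the linear model above, with exactly one observation per level. This is an illustrative sketch only: the function name `simulate_singletons` and all parameter values are hypothetical. Two scenarios that split a total variance of 5 in opposite ways produce data with essentially the same spread, so the split cannot be recovered from singleton levels alone.

```r
set.seed(123)

# Simulate y = beta + eta_i + eps with exactly one observation per level
simulate_singletons <- function(n_levels, beta, sd_eta, sd_eps) {
  eta <- rnorm(n_levels, mean = 0, sd = sd_eta)  # one random effect per level
  eps <- rnorm(n_levels, mean = 0, sd = sd_eps)  # one residual per observation
  beta + eta + eps
}

# Two hypothetical scenarios with the same total variance (1 + 4 = 4 + 1 = 5)
y_a <- simulate_singletons(1e5, beta = 2, sd_eta = 1, sd_eps = 2)
y_b <- simulate_singletons(1e5, beta = 2, sd_eta = 2, sd_eps = 1)

# Both sample variances estimate var(eta) + var(eps), not the two parts
c(var_a = var(y_a), var_b = var(y_b))
```

With repeated observations per level, the covariance between measurements on the same level carries information about $var(\eta)$ alone, which is what breaks the tie.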

Licensed under cc by-sa 3.0 with attribution required.