我对美国各县进行了回归分析,并正在检查“独立”变量中的共线性。Belsley,Kuh和Welsch的回归诊断建议考虑条件指数和方差分解比例:
library(perturb)
## colldiag(, scale=TRUE) for model with interaction
Condition
Index Variance Decomposition Proportions
(Intercept) inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct elderly09_pct inc09_10k:unins09
1 1.000 0.000 0.000 0.000 0.000 0.001 0.002 0.003 0.002 0.002 0.001 0.000
2 3.130 0.000 0.000 0.000 0.000 0.002 0.053 0.011 0.148 0.231 0.000 0.000
3 3.305 0.000 0.000 0.000 0.000 0.000 0.095 0.072 0.351 0.003 0.000 0.000
4 3.839 0.000 0.000 0.000 0.001 0.000 0.143 0.002 0.105 0.280 0.009 0.000
5 5.547 0.000 0.002 0.000 0.000 0.050 0.093 0.592 0.084 0.005 0.002 0.000
6 7.981 0.000 0.005 0.006 0.001 0.150 0.560 0.256 0.002 0.040 0.026 0.001
7 11.170 0.000 0.009 0.003 0.000 0.046 0.000 0.018 0.003 0.250 0.272 0.035
8 12.766 0.000 0.050 0.029 0.015 0.309 0.023 0.043 0.220 0.094 0.005 0.002
9 18.800 0.009 0.017 0.003 0.209 0.001 0.002 0.001 0.047 0.006 0.430 0.041
10 40.827 0.134 0.159 0.163 0.555 0.283 0.015 0.001 0.035 0.008 0.186 0.238
11 76.709 0.855 0.759 0.796 0.219 0.157 0.013 0.002 0.004 0.080 0.069 0.683
## colldiag(, scale=TRUE) for model without interaction
Condition
Index Variance Decomposition Proportions
(Intercept) inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct elderly09_pct
1 1.000 0.000 0.001 0.001 0.000 0.001 0.003 0.004 0.003 0.003 0.001
2 2.988 0.000 0.000 0.001 0.000 0.002 0.030 0.003 0.216 0.253 0.000
3 3.128 0.000 0.000 0.002 0.000 0.000 0.112 0.076 0.294 0.027 0.000
4 3.630 0.000 0.002 0.001 0.001 0.000 0.160 0.003 0.105 0.248 0.009
5 5.234 0.000 0.008 0.002 0.000 0.053 0.087 0.594 0.086 0.004 0.001
6 7.556 0.000 0.024 0.039 0.001 0.143 0.557 0.275 0.002 0.025 0.035
7 11.898 0.000 0.278 0.080 0.017 0.371 0.026 0.023 0.147 0.005 0.038
8 13.242 0.000 0.001 0.343 0.006 0.000 0.000 0.017 0.129 0.328 0.553
9 21.558 0.010 0.540 0.332 0.355 0.037 0.000 0.003 0.003 0.020 0.083
10 50.506 0.989 0.148 0.199 0.620 0.393 0.026 0.004 0.016 0.087 0.279
?HH::vif
表明VIF> 5是有问题的:
library(HH)
## vif() for model with interaction
inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct
8.378646 16.329881 1.653584 2.744314 1.885095 1.471123 1.436229 1.789454
elderly09_pct inc09_10k:unins09
1.547234 11.590162
## vif() for model without interaction
inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct
1.859426 2.378138 1.628817 2.716702 1.882828 1.471102 1.404482 1.772352
elderly09_pct
1.545867
而John Fox的回归诊断建议查看VIF的平方根:
library(car)
## sqrt(vif) for model with interaction
inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct
2.894589 4.041025 1.285917 1.656597 1.372987 1.212898 1.198428 1.337705
elderly09_pct inc09_10k:unins09
1.243879 3.404433
## sqrt(vif) for model without interaction
inc09_10k unins09 sqmi_log pop10_perSqmi_log phys_per100k nppa_per100k black10_pct hisp10_pct
1.363608 1.542121 1.276251 1.648242 1.372162 1.212890 1.185108 1.331297
elderly09_pct
1.243329
在前两种情况下(建议有明确的界限),仅当包含交互项时,该模型才有问题。
到目前为止,带有交互项的模型一直是我的首选规范。
鉴于这些数据怪异,我有两个问题:
- 交互项是否总是会使数据的共线性恶化?
- 由于没有交互项的两个变量均不超过阈值,因此可以将模型与交互项一起使用。具体来说,我认为这样做没问题的原因是,我使用的是King,Tomz和Wittenberg(2000)方法来解释系数(负二项式模型),我通常将其他系数取平均值,然后解释当我移动会发生什么我因变量的预测
inc09_10k
和unins09
周围的独立和联合。