Which variance inflation factor should I be using: GVIF or GVIF^(1/(2*df))?


30

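I am trying to interpret variance inflation factors using the vif function in the R package car. The function prints both a generalized VIF (GVIF) and GVIF^(1/(2*df)). According to the help file, this latter value is explained as follows:

To adjust for the dimension of the confidence ellipsoid, the function also prints GVIF^[1/(2*df)], where df is the degrees of freedom associated with the term.

I don't understand what this explanation in the help file means, so I'm not sure whether I should be using GVIF or GVIF^(1/(2*df)). For my model these two values are very different (maximum GVIF is ~60; maximum GVIF^(1/(2*df)) is ~3).

Could someone explain to me which one I should be using, and what adjusting for the dimension of the confidence ellipsoid means?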

Answers:


25

Georges Monette and I introduced the GVIF in the paper "Generalized collinearity diagnostics," JASA 87:178-183, 1992. As we explained, the GVIF represents the squared ratio of hypervolumes of the joint-confidence ellipsoid for a subset of coefficients to the "utopian" ellipsoid that would be obtained if the regressors in this subset were uncorrelated with regressors in the complementary subset. In the case of a single coefficient, this specializes to the usual VIF. To make GVIFs comparable across dimensions, we suggested using GVIF^(1/(2*Df)), where Df is the number of coefficients in the subset. In effect, this reduces the GVIF to a linear measure, and for the VIF, where Df = 1, it is proportional to the inflation due to collinearity in the confidence interval for the coefficient.
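As a rough illustration of the definition above, the GVIF for a subset of regressors can be computed from the correlation matrix R of all regressors as det(R11) * det(R22) / det(R), where R11 is the block for the subset and R22 the block for the remaining regressors. The sketch below is plain Python and is not the code used by the car package; the function names and the 3x3 correlation matrix are made up for illustration.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0  # the determinant of an empty matrix is taken to be 1
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def submatrix(m, idx):
    """Rows and columns of m selected by the index list idx."""
    return [[m[i][j] for j in idx] for i in idx]


def gvif(corr, subset):
    """GVIF = det(R11) * det(R22) / det(R) for the regressors in subset."""
    rest = [i for i in range(len(corr)) if i not in subset]
    return det(submatrix(corr, subset)) * det(submatrix(corr, rest)) / det(corr)


def linear_gvif(corr, subset):
    """GVIF^(1/(2*Df)), with Df = number of coefficients in the subset."""
    return gvif(corr, subset) ** (1.0 / (2 * len(subset)))


# Hypothetical correlation matrix for three regressors (illustration only).
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]

print(gvif(corr, [0]))            # single coefficient: the ordinary VIF
print(gvif(corr, [1, 2]))         # two-coefficient term
print(linear_gvif(corr, [1, 2]))  # comparable across dimensions
```

For a single regressor the subset block is just 1, so the expression reduces to the usual VIF, 1 / (1 - R^2), which matches the statement that the GVIF specializes to the VIF when Df = 1.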


3
Welcome to our site! We would be honored if you would register your account and come visit once in a while. One small housekeeping note: you don't have to sign your posts; your identicon, with a link to your userpage, is automatically added to every answer you give.
gung - Reinstate Monica

24

I ran into exactly the same question and tried to work my way through. See my detailed answer below.

First of all, I found 4 options producing similar VIF values in R:

corvif command from the AED package,

vif command from the car package,

vif command from the rms package,

vif command from the DAAG package.

Using these commands on a set of predictors that does not include any factors / categorical variables or polynomial terms is straightforward. All four commands produce the same numerical output, even though the corvif command from the AED package labels the results as GVIF.

However, typically, GVIF only comes into play for factors and polynomial variables. Variables which require more than 1 coefficient and thus more than 1 degree of freedom are typically evaluated using the GVIF. For one-coefficient terms VIF equals GVIF.

Thus, you may apply standard rules of thumb on whether collinearity may be a problem, such as a 3, 5 or 10 threshold. However, some caution could (should) be applied (see: http://www.nkd-group.com/ghdash/mba555/PDF/VIF%20article.pdf).

In the case of multi-coefficient terms, e.g. categorical predictors, the 4 packages produce different outputs. The vif commands from the rms and DAAG packages produce VIF values, whereas the other two produce GVIF values.

Let us have a look at VIF values from the rms and DAAG packages first:

TNAP     ICE      RegB     RegC     RegD     RegE
1.994    2.195    3.074    3.435    2.907    2.680

TNAP and ICE are continuous predictors and Reg is a categorical variable represented by the dummies RegB to RegE. In this case RegA is the baseline. All VIF values are rather moderate and usually nothing to worry about. The problem with this result is that it is affected by the choice of baseline for the categorical variable. To be sure that no VIF value exceeds an acceptable level, it would be necessary to redo this analysis with every level of the categorical variable as the baseline. In this case, five times.

Applying the corvif command from the AED package or vif command from the car package, GVIF values are produced:

     | GVIF      | Df | GVIF^(1/(2*Df)) |
TNAP | 1.993964  | 1  | 1.412078        |
ICE  | 2.195035  | 1  | 1.481565        |
Reg  | 55.511089 | 5  | 1.494301        |
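The last column of this table can be reproduced from the first two by raising each GVIF to the power 1/(2*Df). A quick check in Python, using only the values printed above (the helper name is just for illustration):

```python
def linearised_gvif(gvif, df):
    """Return GVIF^(1/(2*Df)), the dimension-adjusted value."""
    return gvif ** (1.0 / (2 * df))


# (term, GVIF, Df) copied from the table above.
rows = [
    ("TNAP", 1.993964, 1),
    ("ICE", 2.195035, 1),
    ("Reg", 55.511089, 5),
]

for term, gvif, df in rows:
    print(f"{term:5s} {gvif:10.6f} {df:2d} {linearised_gvif(gvif, df):.6f}")
```

For Df = 1 this is simply the square root of the VIF; for Reg, the tenth root pulls the GVIF of about 55.5 down to roughly the same scale as the other two terms.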

The GVIF is calculated for sets of related regressors, such as a set of dummy regressors. For the two continuous variables TNAP and ICE this is the same as the VIF values before. For the categorical variable Reg, we now get one very high GVIF value, even though the VIF values for the single levels of the categorical variable were all moderate (as shown above).

However, the interpretation is different. For the two continuous variables, GVIF^(1/(2×Df)) (which is simply the square root of the VIF/GVIF value, since Df = 1) is the proportional change in the standard error and confidence interval of their coefficients due to the level of collinearity. The GVIF^(1/(2×Df)) value of the categorical variable is a similar measure of the reduction in precision of the coefficients' estimation due to collinearity (see also http://socserv2.socsci.mcmaster.ca/jfox/papers/linear-models-problems.pdf, although that text is not ready for quoting).

If we want to apply the same standard rules of thumb to GVIF^(1/(2×Df)) values as are recommended in the literature for the VIF, we simply need to square GVIF^(1/(2×Df)).

Reading through the forum posts, short notes on the web and scientific papers, it seems that there is considerable confusion. In peer-reviewed papers, I found the values for GVIF^(1/(2×Df)) ignored and the standard rules suggested for the VIF applied to the GVIF values instead. In another paper, GVIF values of close to 100 are accepted because of a reasonably small GVIF^(1/(2×Df)) (due to a high Df). The rule GVIF^(1/(2×Df)) < 2 is applied in some publications, which would equal a VIF of 4 for one-coefficient variables.
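A minimal Python sketch of that squaring step, assuming the GVIF and Df values reported for TNAP, ICE and Reg and the rule-of-thumb thresholds of 3, 5 and 10 mentioned earlier. Squaring GVIF^(1/(2×Df)) is the same as computing GVIF^(1/Df); the function name is hypothetical.

```python
def vif_scale(gvif, df):
    """Square of GVIF^(1/(2*Df)), i.e. GVIF^(1/Df), on the usual VIF scale."""
    return (gvif ** (1.0 / (2 * df))) ** 2


# (term, GVIF, Df) as reported for the example model.
terms = [("TNAP", 1.993964, 1), ("ICE", 2.195035, 1), ("Reg", 55.511089, 5)]
thresholds = (3, 5, 10)  # common VIF rules of thumb

for term, gvif, df in terms:
    value = vif_scale(gvif, df)
    flags = [t for t in thresholds if value > t]
    print(f"{term:5s} VIF-scale value = {value:.3f}, thresholds exceeded: {flags}")
```

On this scale Reg comes out at roughly 2.23, below even the strictest of the three thresholds, although its raw GVIF is above 55.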


Welcome to the site, @JanPhilippS. This seems like as much a new question as an answer to the OP's question. Please only use the "Your Answer" field to provide answers. If you have your own question, click the [ASK QUESTION] at the top & ask it there, then we can help you properly. Since you are new here, you may want to take our tour, which contains information for new users.
gung - Reinstate Monica

2
Well, it's not really a new question. Rather a detailed answer.
Jan Philipp S

1
@JanPhilippS, thanks for the links to sources for further reading. I think your post seemed like a quality answer that allowed for some reflection on the state of affairs.
timothy.s.lau

6

Fox & Monette (the original citation for GVIF and GVIF^(1/(2*df))) suggest that raising the GVIF to the power 1/(2*df) makes its value comparable across different numbers of parameters. "It is analogous to taking the square root of the usual variance-inflation factor" (from An R and S-Plus Companion to Applied Regression by John Fox). So yes, squaring it and applying the usual VIF "rule of thumb" seems reasonable.

Licensed under cc by-sa 3.0 with attribution required.