Answers:
Georges Monette and I introduced the GVIF in the paper "Generalized collinearity diagnostics," JASA 87:178-183, 1992. As we explained, the GVIF represents the squared ratio of hypervolumes of the joint-confidence ellipsoid for a subset of coefficients to the "utopian" ellipsoid that would be obtained if the regressors in this subset were uncorrelated with regressors in the complementary subset. In the case of a single coefficient, this specializes to the usual VIF. To make GVIFs comparable across dimensions, we suggested using GVIF^(1/(2*Df)), where Df is the number of coefficients in the subset. In effect, this reduces the GVIF to a linear measure, and for the VIF, where Df = 1, it is proportional to the inflation due to collinearity in the confidence interval for the coefficient.
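To make this concrete, here is a minimal sketch of the determinant form of that definition, GVIF = det(R11) * det(R22) / det(R), where R11 is the correlation matrix of the regressors in the subset, R22 that of the remaining regressors, and R the full correlation matrix. The helper gvif() and the variables x1-x3 are invented purely for illustration.

gvif <- function(X, subset_cols) {
  R   <- cor(X)                                  # correlation matrix of all regressors
  R11 <- R[subset_cols,  subset_cols,  drop = FALSE]
  R22 <- R[-subset_cols, -subset_cols, drop = FALSE]
  det(R11) * det(R22) / det(R)
}

set.seed(1)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
X[, "x3"] <- X[, "x1"] + 0.5 * rnorm(100)        # make x3 collinear with x1

gvif(X, 1)        # single column: reduces to the ordinary VIF of x1
gvif(X, c(1, 3))  # two-column subset: a genuine GVIF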
I ran into exactly the same question and tried to work my way through. See my detailed answer below.
First of all, I found 4 options producing similar VIF values in R:
• corvif command from the AED package,
• vif command from the car package,
• vif command from the rms package,
• vif command from the DAAG package.
Using these commands on a set of predictors that includes no factors / categorical variables or polynomial terms is straightforward. All four commands produce the same numerical output, even though the corvif command from the AED package labels the results as GVIF.
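For example (a hedged sketch on invented data; the data frame dat and its columns are made up for illustration), with only continuous predictors two of the implementations should agree like this:

set.seed(42)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
dat$x3 <- dat$x1 + 0.5 * rnorm(100)   # induce some collinearity with x1

fit <- lm(y ~ x1 + x2 + x3, data = dat)

car::vif(fit)    # plain VIFs (car only switches to GVIF output when a term needs more than one coefficient)
DAAG::vif(fit)   # same numbers from the DAAG implementation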
Typically, however, the GVIF only comes into play for factors and polynomial terms: variables that require more than one coefficient, and thus more than one degree of freedom, are evaluated with the GVIF. For one-coefficient terms, VIF equals GVIF.
Thus, you may apply the standard rules of thumb on whether collinearity may be a problem, such as a threshold of 3, 5 or 10. However, some caution could (should) be applied (see: http://www.nkd-group.com/ghdash/mba555/PDF/VIF%20article.pdf).
In the case of multi-coefficient terms, e.g. categorical predictors, the four packages produce different outputs: the vif commands from the rms and DAAG packages produce VIF values, whereas the other two produce GVIF values.
Let us have a look at VIF values from the rms and DAAG packages first:
TNAP ICE RegB RegC RegD RegE
1.994 2.195 3.074 3.435 2.907 2.680
TNAP and ICE are continuous predictors and Reg is a categorical variable represented by the dummies RegB to RegE; in this case RegA is the baseline. All VIF values are rather moderate and usually nothing to worry about. The problem with this result is that it depends on which level of the categorical variable is chosen as the baseline. To be sure that no VIF value exceeds an acceptable level, it would be necessary to redo the analysis with every level of the categorical variable as the baseline, in this case five times.
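The sketch below illustrates this baseline dependence on an invented data set with the same structure (continuous TNAP and ICE, a five-level factor Reg). The dummy-level VIFs are computed directly from the model matrix as diag(solve(cor(X))), i.e. 1/(1 - R_j^2) for each column, so you can see the values change when the reference level changes.

set.seed(7)
d <- data.frame(y    = rnorm(200),
                TNAP = rnorm(200),
                ICE  = rnorm(200),
                Reg  = factor(sample(LETTERS[1:5], 200, replace = TRUE)))

dummy_vifs <- function(formula, data) {
  X <- model.matrix(formula, data)[, -1]   # drop the intercept column
  diag(solve(cor(X)))                      # VIF_j = 1 / (1 - R_j^2)
}

dummy_vifs(y ~ TNAP + ICE + Reg, d)        # baseline Reg = "A"
d$Reg <- relevel(d$Reg, ref = "C")
dummy_vifs(y ~ TNAP + ICE + Reg, d)        # baseline Reg = "C": the dummy VIFs change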
Applying the corvif command from the AED package or the vif command from the car package produces GVIF values:
            GVIF  Df  GVIF^(1/(2*Df))
TNAP    1.993964   1         1.412078
ICE     2.195035   1         1.481565
Reg    55.511089   5         1.494301
The GVIF is calculated for sets of related regressors, such as the set of dummy regressors belonging to one factor. For the two continuous variables TNAP and ICE it is the same as the VIF values before. For the categorical variable Reg, we now get one very high GVIF value, even though the VIF values for the single levels of the categorical variable were all moderate (as shown above).
However, the interpretation is different. For the two continuous variables, GVIF^(1/(2*Df)) (which is simply the square root of the VIF/GVIF value, since Df = 1) is the proportional change in the standard error and confidence interval of their coefficients due to the level of collinearity. The GVIF^(1/(2*Df)) value of the categorical variable is a similar measure of the reduction in precision of the coefficient estimates due to collinearity (even though not ready for quoting, also look at http://socserv2.socsci.mcmaster.ca/jfox/papers/linear-models-problems.pdf).
If we then simply want to apply the same standard rules of thumb as recommended in the literature for the VIF, we need to square the GVIF^(1/(2*Df)) value before comparing it to those thresholds.
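As a sketch (reusing the invented data frame d from above): when a model contains a factor, car::vif returns a matrix with GVIF, Df and GVIF^(1/(2*Df)) columns, and the squared third column is what you would compare against a conventional VIF threshold (the threshold itself, e.g. 5, is a judgement call, not part of the method).

fit2 <- lm(y ~ TNAP + ICE + Reg, data = d)
gv   <- car::vif(fit2)     # matrix with columns GVIF, Df, GVIF^(1/(2*Df))
gv

adj <- gv[, 3]^2           # square the GVIF^(1/(2*Df)) column
adj > 5                    # compare against a conventional VIF threshold of 5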
Reading through all the forum posts, short notes on the web and scientific papers, it seems that there is quite some confusion going on. In peer-reviewed papers, I found the GVIF^(1/(2*Df)) values ignored and the same standard rules of thumb suggested for the VIF applied directly to the GVIF values. In another paper, GVIF values close to 100 are accepted because of a reasonably small GVIF^(1/(2*Df)) (due to a high Df). The rule GVIF^(1/(2*Df)) < 2 is applied in some publications, which would equal a VIF of 4 for one-coefficient variables.
Fox & Monette (original citation for GVIF, GVIF^1/2df) suggest taking GVIF to the power of 1/2df makes the value of the GVIF comparable across different number of parameters. "It is analagous to taking the square root of the usual variance-inflation factor" ( from An R and S-Plus Companion to Applied Regression by John Fox). So yes, squaring it and applying the usual VIF "rule of thumb" seems reasonable.