When you're doing it yourself and paying attention to what you're doing, you develop a sense of when you are over-fitting the model. For one thing, you can track the trend or deterioration in the adjusted R-squared of the model. You can also track a similar deterioration in the p-values of the regression coefficients of the main variables.

But when you are just reading someone else's study and have no insight into their internal model-development process, how can you clearly tell whether a model is over-fitted or not?
Answers:
When I'm fitting a model myself, I generally use information criteria during the fitting process, such as AIC or BIC, or alternatively likelihood-ratio tests for models fitted by maximum likelihood, or F-tests for models fitted by least squares.

All of these are conceptually similar in that they penalize additional parameters. They set a threshold of "additional explanatory power" that each new parameter added to the model must clear. They are all a form of regularization.

For other people's models, I look at the methods section to see whether such techniques were used, and I also rely on rules of thumb, such as the number of observations per parameter: if there are roughly 5 (or fewer) observations per parameter, I start to get suspicious.
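To make this concrete, here is a minimal sketch (assuming Python with numpy and statsmodels; the data and variable names are invented for illustration) of tracking AIC/BIC and the observations-per-parameter rule of thumb as predictors are added:

```python
# Track AIC/BIC and observations-per-parameter while growing a model.
# Simulated data: only the first two predictors carry real signal.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k = 100, 10
X = rng.normal(size=(n, k))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

for p in range(1, k + 1):
    # Fit with the first p candidate predictors (a stand-in for any stepwise path).
    design = sm.add_constant(X[:, :p])
    fit = sm.OLS(y, design).fit()
    obs_per_param = fit.nobs / design.shape[1]
    print(f"p={p:2d}  AIC={fit.aic:8.2f}  BIC={fit.bic:8.2f}  "
          f"adj R2={fit.rsquared_adj:.3f}  obs/param={obs_per_param:.1f}")
    # Once AIC/BIC stop falling (or obs/param drops toward ~5), the extra
    # parameters look like over-fitting rather than added explanatory power.
```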
Always remember that a variable does not need to be "significant" to belong in a model. It may be a confounder, and if your goal is to estimate the effect of other variables, it should be included on that basis.
I would suggest that this is a question of how the results are reported. Not to "beat the Bayesian drum", but approaching model uncertainty as an inference problem from a Bayesian perspective would go a long way here. And it doesn't have to be a big change either. If the report simply contained the probability that the model is true, that would be very helpful. This is an easy quantity to approximate using BIC. Call the BIC of the m-th model $BIC_m$. Then, given that $M$ models were fitted (and that one of them is true), the probability that the m-th model is the "true" model is:

$$P(\text{model } m \text{ is true} \mid \text{data}) \approx \frac{w_m \exp\left(-\tfrac{1}{2}BIC_m\right)}{\sum_{j=1}^{M} w_j \exp\left(-\tfrac{1}{2}BIC_j\right)}$$
where $w_j$ is proportional to the prior probability for the j-th model. Note that this includes a "penalty" for trying too many models, and the penalty depends on how well the other models fit the data. Usually you will set $w_j = 1$; however, you may have some "theoretical" models within your class that you would expect to be better prior to seeing any data.
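As a small illustration of the formula above, here is a sketch in Python (the BIC values are invented purely for illustration) that turns a set of BICs and optional prior weights $w_j$ into approximate model probabilities:

```python
# Approximate posterior model probabilities from BICs (equal priors by default).
import numpy as np

def model_probabilities(bic, w=None):
    bic = np.asarray(bic, dtype=float)
    w = np.ones_like(bic) if w is None else np.asarray(w, dtype=float)
    # Subtract the smallest BIC before exponentiating for numerical stability;
    # the constant cancels in the ratio.
    rel = w * np.exp(-0.5 * (bic - bic.min()))
    return rel / rel.sum()

bics = [312.4, 315.1, 319.8, 320.2]   # hypothetical BICs for M = 4 fitted models
print(model_probabilities(bics))       # the lowest-BIC model gets most of the mass
```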
Now, if somebody else doesn't report all the BICs from all the models, then I would attempt to infer the above quantity from what you have been given. Suppose you are given the BIC of the reported model, $BIC_m$ - note that BIC is calculable from the mean squared error of a regression model, so you can always get BIC for the reported model. Now, if we take the basic premise that the final model was chosen as the one with the smallest BIC, then $BIC_m = \min_j BIC_j$. Next, suppose you were told that "forward" or "forward stepwise" model selection was used, starting from the intercept with $K$ potential variables. If the final model is of dimension $p$, then the procedure must have tried at least

$$1 + \sum_{j=0}^{p-1} (K - j) = 1 + pK - \frac{p(p-1)}{2}$$
different models (this is exact for forward selection). If backward selection was used, then we know at least

$$1 + \sum_{j=p+1}^{K} j = 1 + \frac{K(K+1) - p(p+1)}{2}$$
models were tried (the +1 comes from the null model or the full model, respectively). Now we could try to be more specific, but these are "minimal" counts that a standard model-selection procedure must satisfy. We could specify a probability model for the number of models tried and for the sizes of the BICs, but simply plugging in some values can be useful here anyway. For example, suppose that all the BICs were $\Delta$ bigger than that of the model chosen, so that $BIC_j = BIC_m + \Delta$ for $j \neq m$; then, with $w_j = 1$, the probability becomes:

$$P(\text{model } m \text{ is true} \mid \text{data}) = \frac{1}{1 + (M - 1)\exp(-\Delta/2)}$$
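These counting bounds and the resulting probability are easy to play with numerically. The sketch below (Python; the values of $K$, $p$ and $\Delta$ are assumptions chosen for illustration, not taken from any study) computes the minimal number of models tried under forward and backward selection and the corresponding probability when every discarded model's BIC exceeds the winner's by $\Delta$:

```python
# Minimal number of models a stepwise search must have tried, and the
# implied probability that the chosen model is the true one.
import numpy as np

def models_tried_forward(K, p):
    # intercept-only model, then K, K-1, ..., K-p+1 candidate additions
    return 1 + sum(K - j for j in range(p))

def models_tried_backward(K, p):
    # full model, then K, K-1, ..., p+1 candidate deletions
    return 1 + sum(range(p + 1, K + 1))

def prob_best_model(M, delta):
    # P = 1 / (1 + (M - 1) * exp(-Delta / 2)): all other BICs are Delta above the winner
    return 1.0 / (1.0 + (M - 1) * np.exp(-0.5 * delta))

K, p, delta = 50, 20, 10.0   # hypothetical search: 50 candidates, 20 kept, BIC gap of 10
for label, M in [("forward", models_tried_forward(K, p)),
                 ("backward", models_tried_backward(K, p))]:
    print(f"{label}: >= {M} models tried, P(chosen model true) ~ {prob_best_model(M, delta):.3f}")
```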
So what this means is that unless $\Delta$ is large or $M$ is small, the probability will be small as well. From an "over-fitting" perspective, this occurs when the BIC of the bigger model is not much bigger than the BIC of the smaller model, so that a non-negligible term appears in the denominator. Plugging the backward-selection formula for $M$ into this expression, we get:

$$P(\text{model } m \text{ is true} \mid \text{data}) = \frac{1}{1 + \frac{K(K+1) - p(p+1)}{2}\exp(-\Delta/2)}$$
Now suppose we invert the problem: given $K$ candidate variables of which backward selection kept $p$, how large would $\Delta$ have to be to make the probability of the chosen model greater than some value $P_0$? We have

$$\Delta \geq 2\log\left(\frac{P_0}{1 - P_0}(M - 1)\right)$$

Setting $P_0 = 0.9$, for instance, gives $\Delta \geq 2\log\left(9(M - 1)\right)$ - so the BIC of the winning model has to win by a lot for the model to be certain.
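As a numerical illustration of this inversion, the sketch below assumes a hypothetical backward-selection search with $K = 50$ candidate variables and $p = 20$ retained (assumed values, not from any particular study) and solves for the $\Delta$ needed to reach $P_0 = 0.9$:

```python
# BIC gap Delta required for the winning model to have probability at least p0.
import numpy as np

def delta_required(M, p0):
    # Solve 1 / (1 + (M - 1) * exp(-Delta / 2)) >= p0 for Delta
    return 2.0 * np.log(p0 * (M - 1) / (1.0 - p0))

K, p = 50, 20                         # hypothetical backward-selection search
M = 1 + sum(range(p + 1, K + 1))      # minimal number of models tried
print(delta_required(M, 0.9))          # roughly 18 BIC units: the winner must win by a lot
```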