Error metrics for cross-validating Poisson models



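I'm cross-validating models that try to predict a count. If this were a binary classification problem, I'd calculate out-of-fold AUC, and if this were a regression problem I'd calculate out-of-fold RMSE or MAE.

For a Poisson model, what error metrics can I use to evaluate the "accuracy" of the out-of-sample predictions? Is there a Poisson extension of AUC that looks at how well the predictions order the actual values?

It seems that a lot of Kaggle competitions for counts (e.g. the number of useful votes a Yelp review will get, or the number of days a patient will spend in the hospital) use root mean squared logarithmic error, or RMSLE.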


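/Edit: One thing I've been doing is calculating deciles of the predicted values and then looking at the actual counts, binned by decile. If decile 1 is low, decile 10 is high, and the deciles in between are strictly increasing, I've been calling the model "good", but I've had trouble quantifying this process, and I'm convinced there's a better approach.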
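The decile check described above could be sketched in R roughly as follows. The data are simulated purely for illustration (an assumption, not from the question), `predicted` stands in for out-of-fold predictions, and the Spearman correlation at the end is only one possible way to put a number on "how well the predictions order the actual values".

```r
set.seed(1)
n <- 2000
x <- runif(n)
predicted <- exp(0.5 + 1.5 * x)          # stand-in for out-of-fold predictions
actual <- rpois(n, lambda = predicted)   # simulated counts (illustrative only)

# Assign each prediction to a decile (1 = lowest predictions, 10 = highest)
breaks <- quantile(predicted, probs = seq(0, 1, by = 0.1))
decile <- cut(predicted, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Mean predicted and mean actual count within each decile
calib <- data.frame(
  decile = 1:10,
  mean_predicted = as.numeric(tapply(predicted, decile, mean)),
  mean_actual = as.numeric(tapply(actual, decile, mean))
)
print(calib)

# A single-number summary of the ordering: rank correlation
rank_cor <- cor(predicted, actual, method = "spearman")
rank_cor
```

A table like `calib` shows calibration by bin, while the rank correlation only measures ordering, so the two can disagree.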

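/Edit 2: I'm looking for a formula that takes predicted values and actual values and returns some "error" or "accuracy" metric. My plan is to calculate this function on the out-of-fold data during cross-validation, and then use it to compare a wide variety of models (e.g. a Poisson regression, a random forest and a GBM).

For example, one such function is RMSE = sqrt(mean((predicted-actual)^2)). Another such function would be AUC. Neither function seems to be right for Poisson data.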
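For reference, the metrics named in the question can be written as small R functions of `predicted` and `actual`. This is a minimal sketch; the `rmsle` definition assumes the usual Kaggle convention of applying log(1 + x) to both vectors, and the example vectors are made up.

```r
# Root mean squared error
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

# Mean absolute error
mae <- function(predicted, actual) mean(abs(predicted - actual))

# Root mean squared logarithmic error, using log(1 + x) so zero counts are allowed
rmsle <- function(predicted, actual) {
  sqrt(mean((log1p(predicted) - log1p(actual))^2))
}

# Made-up example values
actual <- c(0, 1, 3, 10)
predicted <- c(0.5, 1.0, 2.0, 12.0)

rmse(predicted, actual)
mae(predicted, actual)
rmsle(predicted, actual)
```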


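For Poisson models you could use the deviance, which is akin to an MSE but better suited to a Poisson. If your sample sizes aren't small, a weighted MSE would be quite similar.
Glen_b -Reinstate Monica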

@Glen_b what is the formula for deviance?
Zach

Deviance. How are you fitting your Poisson model?
Glen_b -Reinstate Monica

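A few different ways, ranging from penalized Poisson regression to gbm. I'm looking for a good error metric to compare different models. Thanks for the advice.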
Zach

Poisson regression, at least, should automatically give you a deviance.
Glen_b -Reinstate Monica

Answers:



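There are a couple of proper and strictly proper scoring rules for count data that you can use. Scoring rules are penalties $s(y,P)$, where $P$ is the predictive distribution and $y$ is the observed value. They have a number of desirable properties. First and foremost, a forecast that is closer to the true probability will always receive a smaller penalty, and there is a (unique) best forecast, namely the one where the predicted probability coincides with the true probability. Thus minimizing the expectation of $s(y,P)$ means reporting the true probabilities. See also Wikipedia.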

Often one takes the average of those scores over all predicted values:

$$S=\frac{1}{n}\sum_{i=1}^{n}s\left(y^{(i)},P^{(i)}\right)$$
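As an illustration of the averaged score $S$, here is a minimal R sketch using the logarithmic score of a Poisson predictive distribution as $s$. The function name `mean_log_score` and the example vectors are hypothetical; in a cross-validation setting `mu` would be the out-of-fold predicted means.

```r
# Mean logarithmic score: S = (1/n) * sum(-log f(y_i)), with f the Poisson pmf
mean_log_score <- function(y, mu) {
  mean(-dpois(y, lambda = mu, log = TRUE))
}

y <- c(0, 2, 5, 1)                   # observed counts (made up)
mu_a <- c(0.5, 2.0, 4.0, 1.5)        # predictions that track the observations
mu_b <- rep(mean(y), length(y))      # constant prediction at the overall mean

mean_log_score(y, mu_a)
mean_log_score(y, mu_b)
```

The score is a penalty, so the model with the smaller average is preferred; here that is `mu_a`.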

Which rule to choose depends on your objective, but I'll give a rough characterization of when each one is a good choice.

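In what follows I use $f(y)$ for the predictive probability mass function $\Pr(Y=y)$ and $F(y)$ for the predictive cumulative distribution function. A sum $\sum_k$ runs over the whole support of the count distribution (i.e., $0,1,\dots,\infty$). $I$ denotes an indicator function. $\mu$ and $\sigma$ are the mean and standard deviation of the predictive distribution.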

Strictly proper scoring rules

  • Brier score: $s(y,P)=-2f(y)+\sum_k f^2(k)$ (stable for size imbalance in categorical predictors)
  • Dawid-Sebastiani score: $s(y,P)=\left(\frac{y-\mu}{\sigma}\right)^2+2\log\sigma$ (good for general predictive model choice; stable for size imbalance in categorical predictors)
  • Deviance score: $s(y,P)=-2\log f(y)+g_y$ ($g_y$ is a normalization term that only depends on $y$; in Poisson models it is usually taken as the saturated deviance; good for use with estimates from an ML framework)
  • Logarithmic score: $s(y,P)=-\log f(y)$ (very easily calculated; stable for size imbalance in categorical predictors)
  • Ranked probability score: $s(y,P)=\sum_k\{F(k)-I(y\leq k)\}^2$ (good for contrasting different predictions of very high counts; susceptible to size imbalance in categorical predictors)
  • Spherical score: $s(y,P)=-\frac{f(y)}{\sqrt{\sum_k f^2(k)}}$ (stable for size imbalance in categorical predictors)
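For a Poisson predictive distribution with mean $\mu$, the deviance score can be written out explicitly. The following is a worked sketch, assuming $g_y$ is taken from the saturated model (the one that predicts $\mu=y$) and using the convention $y\log y=0$ for $y=0$:

```latex
\begin{aligned}
-2\log f(y) &= -2\left(y\log\mu-\mu-\log y!\right),\\
g_y &= 2\left(y\log y-y-\log y!\right),\\
s(y,P) &= -2\log f(y)+g_y
        = 2\left\{y\log\frac{y}{\mu}-(y-\mu)\right\}.
\end{aligned}
```

Summed over observations, this is the familiar residual deviance of a Poisson regression. It differs from twice the logarithmic score only by the term $g_y$, which does not depend on the model, so the two rank models identically.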

Other scoring rules (not so proper but often used)

  • Absolute error score: $s(y,P)=|y-\mu|$ (not proper)
  • Squared error score: $s(y,P)=(y-\mu)^2$ (not strictly proper; susceptible to outliers; susceptible to size imbalance in categorical predictors)
  • Pearson normalized squared error score: $s(y,P)=\left(\frac{y-\mu}{\sigma}\right)^2$ (not strictly proper; susceptible to outliers; can be used for model checking by seeing whether the averaged score is very different from 1; stable for size imbalance in categorical predictors)
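The three scores in this list are simple enough to sketch directly in R. The function names are hypothetical, the default `sigma = sqrt(mu)` assumes a Poisson predictive distribution, and the simulated data only illustrate the "averaged score close to 1" check for the Pearson score.

```r
# Point-prediction scores based on the predictive mean mu
abs_error <- function(y, mu) abs(y - mu)
sq_error <- function(y, mu) (y - mu)^2

# Pearson normalized squared error; for a Poisson forecast sigma = sqrt(mu)
pearson_score <- function(y, mu, sigma = sqrt(mu)) ((y - mu) / sigma)^2

# Simulated check: when the counts really are Poisson with mean mu,
# the averaged Pearson score should be close to 1
set.seed(42)
mu <- runif(5000, min = 0.5, max = 8)
y <- rpois(5000, lambda = mu)
mean(pearson_score(y, mu))
```

An average well above 1 would suggest overdispersion relative to the assumed predictive distribution.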

Example R code for the strictly proper rules:

library(vcdExtra)
m1 <- glm(Freq ~ mental, family=poisson, data=Mental) 

# scores for the first observation
mu <- predict(m1, type="response")[1]
x  <- Mental$Freq[1]

# logarithmic (equivalent to deviance score up to a constant) 
-log(dpois(x, lambda=mu))

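# quadratic (Brier); the sum runs over the support 0, 1, 2, ... (truncated at 1000)
-2*dpois(x,lambda=mu) + sapply(mu, function(m){ sum(dpois(0:1000,lambda=m)^2) })

# spherical; same truncated sum over the support, starting at 0
- dpois(x,mu) / sqrt(sapply(mu, function(m){ sum(dpois(0:1000,lambda=m)^2) }))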

# ranked probability score
sum(ppois((-1):(x-1), mu)^2) + sum((ppois(x:10000,mu)-1)^2)

# Dawid Sebastiani
(x-mu)^2/mu + log(mu)
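To connect these scores to the cross-validation plan in the question, here is a hedged sketch of k-fold cross-validation that compares two Poisson GLMs by their mean out-of-fold deviance score. The data are simulated, and `poisson_deviance` and `cv_score` are hypothetical helper names; any model that returns a predicted mean (a random forest or GBM, for example) could be scored the same way.

```r
# Unit Poisson deviance: 2 * { y * log(y / mu) - (y - mu) }, with 0 * log(0) := 0
poisson_deviance <- function(y, mu) {
  term <- ifelse(y == 0, 0, y * log(y / mu))
  2 * (term - (y - mu))
}

# Simulated data (illustrative only): y depends on x1 but not on x2
set.seed(123)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rpois(n, lambda = exp(0.3 + 0.8 * dat$x1))

# Random assignment of rows to k folds
k <- 5
fold <- sample(rep(1:k, length.out = n))

# Mean out-of-fold deviance score for a given model formula
cv_score <- function(formula) {
  scores <- numeric(n)
  for (j in 1:k) {
    train <- dat[fold != j, ]
    test <- dat[fold == j, ]
    fit <- glm(formula, family = poisson, data = train)
    mu <- predict(fit, newdata = test, type = "response")
    scores[fold == j] <- poisson_deviance(test$y, mu)
  }
  mean(scores)
}

cv_null <- cv_score(y ~ 1)    # intercept-only baseline
cv_x1 <- cv_score(y ~ x1)     # model using the informative predictor
c(null = cv_null, x1 = cv_x1)
```

The model with the smaller mean score is preferred. Scoring held-out folds rather than the training data avoids rewarding models that merely overfit.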

@Momo, it's an old thread but very good and useful. A question, however, about the logarithmic score. You used the function -log(f(y)). Should the - sign really be there? In your scoring rule Wikipedia link (en.wikipedia.org/wiki/Scoring_rule#Logarithmic_scoring_rule), the logarithmic score has no negative sign: $L(r,i)=\ln(r_i)$. Is that normal? Finally, in that case is a higher score better or worse?
Bastien

Is it better (or at least more conservative and more realistic) to calculate these measures on a validation data-set that wasn't a part of the data used for estimating the models?
Fred

Given that GLMs are fit using iteratively reweighted least squares, as in bwlewis.github.io/GLM, what would actually be the objection to calculating a weighted $R^2$ on the GLM link scale, using 1/variance weights (which glm gives back in the weights slot of a glm fit)? This would also work for a Poisson glm, right?
Tom Wenseleers

See stats.stackexchange.com/questions/412580/… for a reproducible example...
Tom Wenseleers
Licensed under cc by-sa 3.0 with attribution required.