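Also consider which scale is most appropriate for your use case. Say you are inspecting variables before fitting a logistic regression and want to visualize a continuous predictor to decide whether the model needs a spline or polynomial term. In that case you probably want a log-odds scale rather than a probability/proportion scale.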
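The function in the gist below uses some limited heuristics to split the continuous predictor into bins, compute the mean proportion in each bin, convert it to log-odds, and then plot geom_smooth over these aggregate points.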
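The steps just described (bin, average, convert to log-odds, smooth) could be sketched roughly as follows. This is not the gist's implementation: `binned_log_odds`, its `n_bins` argument, the equal-count binning rule, and the clamping of proportions away from 0 and 1 are all assumptions made for illustration.

```r
# Hypothetical helper (not the gist's code): equal-count bins, then log-odds.
binned_log_odds <- function(x, y, n_bins = 20) {
  # Assign each observation to one of n_bins bins of (nearly) equal size
  bin <- ceiling(rank(x, ties.method = "first") * n_bins / length(x))

  out <- data.frame(
    x_mean = as.numeric(tapply(x, bin, mean)),   # bin centre on the x axis
    prop   = as.numeric(tapply(y, bin, mean)),   # share of 1s in the bin
    n      = as.numeric(tapply(y, bin, length))  # observations in the bin
  )

  # Assumption: pull proportions of exactly 0 or 1 inward so the logit is finite
  eps <- 1 / (2 * out$n)
  p <- pmin(pmax(out$prop, eps), 1 - eps)
  out$log_odds <- qlogis(p)  # qlogis(p) is log(p / (1 - p))
  out
}

# Simulated data with a quadratic relationship on the log-odds scale
set.seed(12)
sim_x <- rlogis(1000)
sim_y <- rbinom(1000, 1, plogis(0.2 * sim_x^2))
binned <- binned_log_odds(sim_x, sim_y)

# Plot the aggregate points with a smoother, if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  binned_plot <- ggplot(binned, aes(x_mean, log_odds)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x)
  print(binned_plot)
}
```

Equal-count bins keep the per-bin proportions comparably noisy; the gist's own heuristics for choosing the bin size may differ.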
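Here is an example of what this chart looks like when a covariate has a quadratic relationship (plus noise) with the log-odds of a binary target: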
library(tidyverse)  # tibble(), %>%, mutate(), ggplot()

devtools::source_gist("https://gist.github.com/brshallo/3ccb8e12a3519b05ec41ca93500aa4b3")

# simulated dataset with quadratic relationship between x and y
# (y_odds is on the log-odds scale; y_probs is its inverse logit)
set.seed(12)
samp_size <- 1000
simulated_df <- tibble(x = rlogis(samp_size),
                       y_odds = 0.2*x^2,
                       y_probs = exp(y_odds)/(1 + exp(y_odds))) %>%
  mutate(y = rbinom(samp_size, 1, prob = y_probs))

# balanced version of the dataset (equal numbers of 0s and 1s);
# note that the plot below uses the full simulated_df
simulated_df_balanced <- simulated_df %>%
  group_by(y) %>%
  sample_n(table(simulated_df$y) %>% min())

ggplot_continuous_binary(df = simulated_df,
                         covariate = x,
                         response = y,
                         snip_scales = TRUE)
#> [1] "bin size: 18"
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-02-06 by the reprex package (v0.2.1)
For comparison, here is what that quadratic relationship looks like if you simply plot the 1s/0s and add a geom_smooth:
simulated_df %>%
  ggplot(aes(x, y))+
  geom_smooth()+
  geom_jitter(height = 0.01, width = 0)+
  # set xlim to be generally consistent with prior chart
  coord_cartesian(ylim = c(0, 1), xlim = c(-3.76, 3.59))
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
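Created on 2019-02-25 by the reprex package (v0.2.1)
On this scale the relationship to the logit is less clear, and using geom_smooth directly on the 0/1 values has some problems (the default smoother treats y as a continuous response, so the curve is not a logistic fit).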
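Once a plot suggests curvature on the log-odds scale, the polynomial term mentioned at the start can be checked directly in the model. The following is a minimal sketch on data simulated the same way as above, using only base R; the object names (`fit_linear`, `fit_quadratic`) are illustrative, and a second-degree polynomial is just one possible choice of term.

```r
# Simulate a binary response whose log-odds are quadratic in x
set.seed(12)
n <- 1000
x <- rlogis(n)
y <- rbinom(n, 1, plogis(0.2 * x^2))

# Logistic regression with and without a quadratic term
fit_linear    <- glm(y ~ x, family = binomial)
fit_quadratic <- glm(y ~ poly(x, 2), family = binomial)

# Lower AIC and a small p-value in the likelihood-ratio test
# both point toward keeping the quadratic term
AIC(fit_linear, fit_quadratic)
anova(fit_linear, fit_quadratic, test = "Chisq")
```

A spline term (for example via `splines::ns()`) could be compared in the same way when the shape is not clearly polynomial.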