Statistics and Big Data regression-coefficients

17

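Is it valid to include a two-way interaction in a model without including the main effects? If your hypothesis concerns only the interaction, do you still need to include the main effects?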

85 regression modeling interaction regression-coefficients

1

How do I interpret the main effects (coefficients of dummy-coded factors) in a Poisson regression? Assume the following example: treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2), labels = c("placebo", "treated")) improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)), levels = c(1, 2, 3), labels = c("none", "some", "marked")) numberofdrugs <- rpois(84, 10) + 1 healthvalue <- rpois(84, 5) …

64 r generalized-linear-model interpretation poisson-distribution regression-coefficients

3

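Interpretation of log-transformed predictors and/or response

I would like to know whether the interpretation differs depending on whether only the dependent variable, both the dependent and independent variables, or only the independent variable is log-transformed. Consider the case log(DV) = Intercept + B1*IV + Error. I can interpret the IV as a percentage increase, but how does this change when I have log(DV) = Intercept + B1*log(IV) + Error, or when I have DV = Intercept + B1*log(IV) + Error?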
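As a hedged sketch of how these three specifications are commonly read: the block below assumes natural logarithms, ignores the error term, and writes $y$, $x$, $b_0$, $b_1$ for DV, IV, Intercept and B1 (shorthand introduced here, not part of the question). The percentage readings are approximations that hold for small changes.

```latex
\begin{align*}
\log y = b_0 + b_1 x
  &:\quad \text{a one-unit increase in } x \text{ multiplies } y \text{ by } e^{b_1}
   \;(\approx 100\,b_1\% \text{ change when } b_1 \text{ is small}),\\
\log y = b_0 + b_1 \log x
  &:\quad \text{a } 1\% \text{ increase in } x \text{ multiplies } y \text{ by } 1.01^{b_1}
   \;(\approx b_1\% \text{ change}),\\
y = b_0 + b_1 \log x
  &:\quad \text{a } 1\% \text{ increase in } x \text{ changes } y \text{ by } b_1 \log(1.01) \approx b_1/100 \text{ units}.
\end{align*}
```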

46 regression data-transformation interpretation regression-coefficients logarithm

3

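Deriving the variance of the regression coefficient in simple linear regression

In simple linear regression, we have $y = \beta_0 + \beta_1 x + u$, where $u \sim iid\;\mathcal N(0,\sigma^2)$. I derived the estimator:
$$\hat{\beta_1} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\ ,$$
where $\bar{x}$ and $\bar{y}$ are the sample means of $x$ and $y$. Now I want to find the variance of $\hat\beta_1$. I derived something like the following:
$$\text{Var}(\hat{\beta_1}) = \frac{\sigma^2(1 - \frac{1}{n})}{\sum_i (x_i - \bar{x})^2}\ .$$
The derivation is as follows:
\begin{align}
\text{Var}(\hat{\beta_1})
&= \text{Var}\left(\frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\right) \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} \text{Var}\left(\sum_i (x_i - \bar{x})\left(\beta_0 + \beta_1 x_i + u_i - \frac{1}{n}\sum_j (\beta_0 + \beta_1 x_j + u_j)\right)\right) \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} \text{Var}\left(\beta_1 \sum_i (x_i - \bar{x})^2 + \sum_i (x_i - \bar{x})\left(u_i - \frac{\sum_j u_j}{n}\right)\right) \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} \text{Var}\left(\sum_i (x_i - \bar{x})\left(u_i - \frac{\sum_j u_j}{n}\right)\right) \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} \times E\left[\left(\sum_i (x_i - \bar{x})\left(u_i - \frac{\sum_j u_j}{n}\right) - \underbrace{E\left[\sum_i (x_i - \bar{x})\left(u_i - \frac{\sum_j u_j}{n}\right)\right]}_{=0}\right)^2\right] \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} E\left[\left(\sum_i (x_i - \bar{x})\left(u_i - \frac{\sum_j u_j}{n}\right)\right)^2\right] \\
&= \frac{1}{\left(\sum_i (x_i - \bar{x})^2\right)^2} E\left[\sum_i (x_i - \bar{x})^2\left(u_i - \frac{\sum_j u_j}{n}\right)^2\right]\ , \; \dots
\end{align}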
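For comparison, here is a hedged sketch of the textbook result. It treats the $x_i$ as fixed and writes $S_{xx} = \sum_i (x_i - \bar x)^2$ (a shorthand introduced here). Because $\sum_i (x_i - \bar x) = 0$, the $\bar u$ term drops out, and independence of the $u_i$ removes every cross term:

```latex
\begin{align*}
\hat\beta_1 &= \beta_1 + \frac{\sum_i (x_i - \bar x)\,u_i}{S_{xx}}, \\
\operatorname{Var}(\hat\beta_1)
  &= \frac{\sum_i (x_i - \bar x)^2 \operatorname{Var}(u_i)}{S_{xx}^2}
   = \frac{\sigma^2}{S_{xx}}.
\end{align*}
```

Under these assumptions there is no $(1 - \frac{1}{n})$ factor in the numerator.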

37 regression mathematical-statistics variance linear-model regression-coefficients

4

How do I interpret the coefficients from a polynomial model fit?

I am trying to create a second-order polynomial fit to some data I have. Suppose I plot this fit with ggplot(): ggplot(data, aes(foo, bar)) + geom_point() + geom_smooth(method="lm", formula=y~poly(x, 2)) I get a plot in which the second-order fit works quite well. I calculate it with R: summary(lm(data$bar ~ poly(data$foo, 2))) and I get: lm(formula = data$bar ~ poly(data$foo, 2)) # ... # Coefficients: # Estimate Std. Error t value Pr(>|t|) # (Intercept) 3.268162 0.008282 394.623 <2e-16 *** # poly(data$foo, 2)1 -0.122391 0.096225 -1.272 0.206 # poly(data$foo, …
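One point that may help when reading such output: by default `poly()` builds an orthogonal polynomial basis, so the coefficients are not the coefficients of `x` and `x^2`. The sketch below illustrates this with the built-in `mtcars` data as a stand-in (the question's `data`, `foo` and `bar` are not available, so `wt` and `mpg` are assumptions made only for illustration).

```r
# Stand-in data: mtcars ships with base R; wt and mpg replace foo and bar.
fit_orth <- lm(mpg ~ poly(wt, 2), data = mtcars)              # orthogonal basis (default)
fit_raw  <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)  # raw powers wt and wt^2

# Both models describe the same fitted curve ...
same_fit <- isTRUE(all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw))))

# ... but parameterise it differently, so the coefficients differ.
coef(fit_orth)  # coefficients on the orthogonal basis columns
coef(fit_raw)   # coefficients on wt and wt^2 directly
```

If coefficients on the original scale are wanted, `raw = TRUE` is one option; the orthogonal basis has the advantage that its columns are uncorrelated with each other.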

36 r regression interpretation regression-coefficients

2

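Multiple regression or partial correlation coefficient? And the relationship between the two

I don't even know whether this question makes sense, but what is the difference between multiple regression and partial correlation (apart from the obvious differences between correlation and regression, which is not what I am aiming at)? I want to figure out the following: I have two independent variables ($x_1$, $x_2$) and one dependent variable ($y$). Individually, the independent variables are not correlated with the dependent variable. But for a given $x_1$, $y$ decreases when $x_2$ decreases. So, do I analyze that by means of multiple regression or partial correlation? Edit to hopefully improve my question: I am trying to understand the difference between multiple regression and partial correlation. So, when $y$ decreases for a given $x_1$ when $x_2$ decreases, is that due to the combined effect of $x_1$ and $x_2$ on $y$ (multiple regression), or is it due to removing the effect of $x_1$ (partial correlation)?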

35 multiple-regression regression-coefficients partial-correlation

3

R: Random Forest throwing NaN/Inf in "foreign function call" error despite no NaNs in dataset [closed]

I am using caret to run a cross-validated random forest over a dataset. The Y variable is a factor. There are no NaN, Inf, or NA values in my dataset. However, when running the random forest, I get Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 1) In addition: There were 28 warnings (use warnings() to see them) Warning messages: 1: In data.matrix(x) : NAs introduced by coercion 2: In data.matrix(x) : NAs introduced by coercion 3: In data.matrix(x) : NAs introduced by …

29 r random-forest caret regression prediction fitting social-science poisson-distribution distributions characteristic-function bayesian prior regression normal-distribution interaction nonparametric skewness svm standard-deviation standard-error regression-coefficients igraph natural-language word2vec word-embeddings regression machine-learning sampling r regression machine-learning random-forest ensemble sampling unbiased-estimator proof estimators mse probability conditional-probability bayes anova missing-data neural-networks recommender-system r confidence-interval sample multiple-imputation r time-series forecasting mase

1

Calculating the repeatability of effects from an lmer model

I just came across this paper, which describes how to compute the repeatability (a.k.a. reliability, a.k.a. intraclass correlation) of a measurement via mixed-effects modelling. The R code would be: #fit the model fit = lmer(dv~(1|unit),data=my_data) #obtain the variance estimates vc = VarCorr(fit) residual_var = attr(vc,'sc')^2 intercept_var = attr(vc$unit,'stddev')[1]^2 #compute the unadjusted repeatability R = intercept_var/(intercept_var+residual_var) #compute n0, the repeatability adjustment n = as.data.frame(table(my_data$unit)) k = nrow(n) N = sum(n$Freq) n0 = (N-(sum(n$Freq^2)/N))/(k-1) #compute the adjusted repeatability Rn = …

28 mixed-model reliability intraclass-correlation repeatability spss factor-analysis survey modeling cross-validation error curve-fitting mediation correlation clustering sampling machine-learning probability classification metric r project-management optimization svm python dataset quality-control checking clustering distributions anova factor-analysis exponential poisson-distribution generalized-linear-model deviance machine-learning k-nearest-neighbour r hypothesis-testing t-test r variance levenes-test bayesian software bayesian-network regression repeated-measures least-squares change-scores variance chi-squared variance nonlinear-regression regression-coefficients multiple-comparisons p-value r statistical-significance excel sampling sample r distributions interpretation goodness-of-fit normality-assumption probability self-study distributions references theory time-series clustering econometrics binomial hypothesis-testing variance t-test paired-comparisons statistical-significance ab-test r references hypothesis-testing t-test normality-assumption wilcoxon-mann-whitney central-limit-theorem t-test data-visualization interactive-visualization goodness-of-fit

1

Can degrees of freedom be a non-integer number?

When I use GAM, it gives me a residual DF of $26.6$ (last line in the code). What does that mean? Going beyond the GAM example, in general, can the number of degrees of freedom be a non-integer number? > library(gam) > summary(gam(mpg~lo(wt),data=mtcars)) Call: gam(formula = mpg ~ lo(wt), data = mtcars) Deviance Residuals: Min 1Q Median 3Q Max -4.1470 -1.6217 -0.8971 1.2445 6.0516 (Dispersion Parameter for gaussian family taken to be 6.6717) Null Deviance: 1126.047 on 31 degrees of freedom Residual Deviance: 177.4662 on 26.6 degrees of …

27 r degrees-of-freedom gam machine-learning pca lasso probability self-study bootstrap expected-value regression machine-learning linear-model probability simulation random-generation machine-learning distributions svm libsvm classification pca multivariate-analysis feature-selection archaeology r regression dataset simulation r regression time-series forecasting predictive-models r mean sem lavaan machine-learning regularization regression conv-neural-network convolution classification deep-learning conv-neural-network regression categorical-data econometrics r confirmatory-factor scale-invariance self-study unbiased-estimator mse regression residuals sampling random-variable sample probability random-variable convergence r survival weibull references autocorrelation hypothesis-testing distributions correlation regression statistical-significance regression-coefficients univariate categorical-data chi-squared regression machine-learning multiple-regression categorical-data linear-model pca factor-analysis factor-rotation classification scikit-learn logistic p-value regression panel-data multilevel-analysis variance bootstrap bias probability r distributions interquartile time-series hypothesis-testing normal-distribution normality-assumption kurtosis arima panel-data stata clustered-standard-errors machine-learning optimization lasso multivariate-analysis ancova machine-learning cross-validation

3

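Does the order of explanatory variables matter when calculating their regression coefficients?

At first I thought the order didn't matter, but then I read about the Gram-Schmidt orthogonalization process for calculating multiple regression coefficients, and now I'm having second thoughts. According to the Gram-Schmidt process, the later an explanatory variable is indexed among the other variables, the smaller its residual vector is, because the residual vectors of the preceding variables are subtracted from it. As a result, the regression coefficient of that explanatory variable is smaller as well. If that is true, then the residual vector of the variable in question would be larger if it were indexed earlier, since fewer residual vectors would be subtracted from it. This means that the regression coefficient would be larger too.

OK, so I have been asked to clarify my question. I have posted screenshots from the text that got me confused in the first place. OK, here goes. My understanding is that there are at least two options for calculating the regression coefficients. The first option is denoted (3.6) in the screenshot below. Here is the second option (I had to use multiple screenshots). Unless I am misreading something (which is definitely possible), it seems that order matters in the second option. Does it matter in the first option? Why or why not? Or is my frame of reference so messed up that this isn't even a valid question? Also, is this all somehow related to Type I sum of squares vs Type II sum of squares? Thanks so much in advance, I am so confused!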

24 regression multiple-regression regression-coefficients

1

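Is there a way to use the covariance matrix to find coefficients for multiple regression?

For simple linear regression, the regression coefficient is calculable directly from the variance-covariance matrix $C$, by $$\frac{C_{d,e}}{C_{e,e}}$$ where $d$ is the dependent variable's index and $e$ is the explanatory variable's index. If one only has the covariance matrix, is it possible to calculate the coefficients for a model with multiple explanatory variables? ETA: For two explanatory variables, it appears that $$\beta_1 = \frac{Cov(y,x_1)var(x_2) - Cov(y,x_2)Cov(x_1,x_2)}{var(x_1)var(x_2) - Cov(x_1,x_2)^2}$$ and analogously for $\beta_2$. I'm not immediately seeing how to extend this to three or more variables.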
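A hedged sketch of the usual matrix generalisation: the slope vector solves $C_{e,e}\,\beta = C_{e,d}$, where $e$ now indexes all explanatory variables. The example uses the built-in `mtcars` data as a stand-in; the choice of `mpg`, `wt`, `hp` and `qsec` is an assumption made only for illustration.

```r
# Stand-in data: mtcars ships with base R.
d <- "mpg"                      # dependent variable
e <- c("wt", "hp", "qsec")      # explanatory variables

C <- cov(mtcars[, c(d, e)])     # variance-covariance matrix of all four variables

# Slopes: solve C[e, e] %*% beta = C[e, d]
beta <- solve(C[e, e], C[e, d])

# Check against lm(). The intercept also needs the variable means,
# which the covariance matrix alone does not contain.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
slopes_match <- isTRUE(all.equal(unname(beta), unname(coef(fit)[e])))
```

With a single explanatory variable, `solve(C[e, e], C[e, d])` reduces to `C[d, e] / C[e, e]`, the simple-regression formula quoted in the question.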

23 regression regression-coefficients covariance-matrix

3

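What does "all else equal" mean in multiple regression?

When we do multiple regressions and say we are looking at the average change in the $y$ variable for a change in an $x$ variable, holding all other variables constant, at what values are we holding the other variables constant? Their mean? Zero? Any value? I am inclined to think it is at any value; I am just looking for clarification. If anyone has a proof, that would be great too.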
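A short sketch of why the value does not matter, assuming a purely additive linear model with two predictors and no interaction terms (the symbols $x_1$, $x_2$ and $c$ are introduced here for illustration):

```latex
\begin{align*}
E[y \mid x_1, x_2] &= \beta_0 + \beta_1 x_1 + \beta_2 x_2, \\
E[y \mid x_1 + 1,\, x_2 = c] - E[y \mid x_1,\, x_2 = c]
  &= \bigl(\beta_0 + \beta_1 (x_1 + 1) + \beta_2 c\bigr) - \bigl(\beta_0 + \beta_1 x_1 + \beta_2 c\bigr)
   = \beta_1 .
\end{align*}
```

The term $\beta_2 c$ cancels for every $c$, so the other variable can be held at any value. This no longer holds once the model contains an interaction such as $x_1 x_2$, because the difference then depends on $c$.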

22 multiple-regression interpretation least-squares regression-coefficients controlling-for-a-variable

2

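How does bootstrapping in R actually work?

I've been looking into the boot package in R, and while I have found a number of good primers on how to use it, I have yet to find anything that describes exactly what is happening "behind the scenes". For instance, in this example, the guide shows how to use standard regression coefficients as a starting point for a bootstrap regression, but it doesn't explain what the bootstrap procedure is actually doing to derive the bootstrap regression coefficients. It appears that some type of iterative process is happening, but I can't seem to figure out exactly what is going on.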
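To make the idea concrete, here is a minimal hand-rolled sketch of a case-resampling bootstrap for regression coefficients in base R. It is not the internals of the boot package, only an illustration of the general technique; the `mtcars` data, the model `mpg ~ wt`, the seed and the replicate count are all assumptions chosen for the example.

```r
set.seed(42)      # arbitrary seed, for reproducibility only
n_boot <- 1000    # number of bootstrap replicates (arbitrary choice)

# Coefficients estimated from the original sample
orig_coef <- coef(lm(mpg ~ wt, data = mtcars))

# Each replicate: resample rows with replacement, refit, keep the coefficients.
# The replicates are independent of each other; nothing is updated iteratively.
boot_coef <- replicate(n_boot, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))
})

boot_se <- apply(boot_coef, 1, sd)                                 # bootstrap standard errors
boot_ci <- apply(boot_coef, 1, quantile, probs = c(0.025, 0.975))  # percentile intervals
```

Here `boot_coef` is a matrix with one row per coefficient and one column per replicate, so the spread across columns approximates the sampling variability of each estimate.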

22 r regression bootstrap regression-coefficients

1

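Interpreting the estimates of cloglog logistic regression

Could someone advise me on how to interpret the estimates from a logistic regression using a cloglog link? I have fitted the following model in lme4: glm(cbind(dead, live) ~ time + factor(temp) * biomass, data=mussel, family=binomial(link=cloglog)) For example, the estimate for time is 0.015. Is it correct to say that the odds of mortality per unit time are multiplied by exp(0.015) = 1.015113 (~1.5% increase per unit time)? In other words, are the estimates obtained with a cloglog link expressed in log odds, as is the case for a logit logistic regression?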
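A brief sketch of what the complementary log-log link implies, writing $p$ for the mortality probability and $\eta$ for the linear predictor (notation introduced here, with the other covariates held fixed):

```latex
\begin{align*}
\log\bigl(-\log(1 - p)\bigr) &= \eta = \beta_0 + \beta_{\text{time}}\, t + \dots \\
-\log(1 - p) &= e^{\eta} \\
\frac{-\log\bigl(1 - p(t + 1)\bigr)}{-\log\bigl(1 - p(t)\bigr)} &= e^{\beta_{\text{time}}}
\end{align*}
```

So under this link $\exp(0.015)$ multiplies $-\log(1 - p)$, a quantity often read as a cumulative hazard, rather than the odds $p / (1 - p)$. The two readings are close only when $p$ is small.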

21 logistic regression-coefficients

4

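Importance of predictors in multiple regression: partial $R^2$ vs. standardized coefficients

I am wondering what the exact relationship between partial $R^2$ and the coefficients in a linear model is, and whether I should use only one or both to illustrate the importance and influence of factors. As far as I know, with summary I get estimates of the coefficients, and with anova I get the sum of squares for each factor; the sum of squares of one factor divided by the sum of that sum of squares plus the residual sum of squares is the partial $R^2$ (the following code is in R). library(car) mod<-lm(education~income+young+urban,data=Anscombe) summary(mod) Call: lm(formula = education ~ income + young + urban, data = Anscombe) Residuals: Min 1Q Median 3Q Max -60.240 -15.738 -1.156 15.883 51.380 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -2.868e+02 6.492e+01 -4.418 5.82e-05 *** income 8.065e-02 9.299e-03 8.674 2.56e-11 *** young 8.173e-01 …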

21 r regression multiple-regression regression-coefficients r-squared

Questions tagged «regression-coefficients»