Statistics and Big Data: categorical-encoding

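What is the name of the operator that takes a categorical vector and converts it into a binary representation using one-hot encoding? I'm asking because I'm writing a scientific paper and need the proper name.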

terminology categorical-encoding
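
In the statistics literature this mapping is usually described as indicator (or dummy) coding, and in machine learning as one-hot encoding: each category c_k contributes the indicator 1[x = c_k]. A minimal pure-Python sketch of that mapping follows. The function name `one_hot` and the alphabetical category order are assumptions for illustration, not any particular library's API.

```python
def one_hot(values, categories=None):
    """Map each categorical value to a 0/1 indicator vector, one column per category.

    Assumption: if no category order is given, categories are sorted alphabetically.
    """
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # indicator function 1[v == c_k]
        rows.append(row)
    return categories, rows


cats, X = one_hot(["red", "green", "red", "blue"])
print(cats)  # ['blue', 'green', 'red']
print(X)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Every row contains exactly one 1, which is what distinguishes one-hot (full indicator) coding from the n-1 dummy coding used in regression with an intercept.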

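I am interested in treatment-covariate interactions in an experimental setting / randomized controlled trial with a binary treatment-assignment indicator $T$. Depending on the method or source, I have seen $T = \{1, 0\}$ and $T = \{1, -1\}$ used for treated and untreated subjects, respectively. Is there a rule of thumb for using $\{1, 0\}$ or $\{1, -1\}$? How does the interpretation differ?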

binary-data categorical-encoding
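
For the simplest case, an outcome regressed on $T$ alone, the two codings are reparametrizations of the same model: with $\{1, 0\}$ the intercept is the control-group mean and the slope is the treated-minus-control difference, while with $\{1, -1\}$ the intercept is the midpoint of the two group means and the slope is half that difference. The sketch below checks this with closed-form simple OLS in plain Python on made-up numbers; `ols_simple` is a hypothetical helper, and the data are purely illustrative.

```python
from statistics import mean


def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical outcomes: 3 treated (mean 6.0), 2 control (mean 3.0); unbalanced on purpose
treated = [5.0, 6.0, 7.0]
control = [2.0, 4.0]
y = treated + control

t01 = [1] * 3 + [0] * 2   # {1, 0} coding
tpm = [1] * 3 + [-1] * 2  # {1, -1} coding

a01, b01 = ols_simple(t01, y)  # intercept = control mean, slope = difference in means
apm, bpm = ols_simple(tpm, y)  # intercept = midpoint of group means, slope = half the difference
print(a01, b01)
print(apm, bpm)
```

With an interaction term, y = a + bT + cX + dTX, the same logic carries over: under $\{1, 0\}$ coding, c is the covariate slope in the control arm, whereas under $\{1, -1\}$ coding c is the average of the two arms' slopes (c + d and c - d) and d is half the difference between them. Note that with unbalanced arms the $\{1, -1\}$ intercept (4.5 here) is the midpoint of the group means, not the overall sample mean (4.8).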

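How to handle non-binary categorical variables in logistic regression (SPSS)

I have to run a binary logistic regression with many independent variables. Most of them are binary, but some categorical variables have more than two levels. What is the best way to handle such variables? For example, for a variable with three possible values, I assume I have to create two dummy variables. Then, in a stepwise regression, is it better to test both dummy variables together or separately? I will be using SPSS, but I don't know it very well, so: how does SPSS handle this situation? Also, for ordinal categorical variables, is it a good idea to recreate the ordinal scale with dummy variables? (For example, for a 4-level ordinal variable, using three dummy variables with 0-0-0 for level 1, 1-0-0 for level 2, 1-1-0 for level 3 and 1-1-1 for level 4, instead of 0-0-0, 1-0-0, 0-1-0 and 0-0-1 for the 4 levels.)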

logistic categorical-data spss ordinal-data categorical-encoding
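
The two codings described in the question can be written out explicitly. The plain-Python sketch below builds the usual indicator (reference) coding and the cumulative coding for a 4-level ordinal variable, with level 1 as the all-zeros reference in both; the function names are hypothetical, and this illustrates the codings only, not how SPSS constructs its contrasts internally.

```python
def dummy_code(level, n_levels):
    """Indicator (reference) coding: dummy k is 1 only when level == k; level 1 is all zeros."""
    return tuple(1 if level == k else 0 for k in range(2, n_levels + 1))


def cumulative_code(level, n_levels):
    """Cumulative coding: dummy k is 1 whenever level >= k, so each step adds one more 1."""
    return tuple(1 if level >= k else 0 for k in range(2, n_levels + 1))


for lvl in range(1, 5):
    print(lvl, dummy_code(lvl, 4), cumulative_code(lvl, 4))
```

Because each cumulative column is a sum of indicator columns (and each indicator column is a difference of adjacent cumulative columns), the two codings give the same fitted values and overall model fit; only the meaning of the individual coefficients changes. Under cumulative coding each coefficient compares adjacent levels (k - 1 versus k); under indicator coding each compares level k with level 1.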

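How to statistically prove whether a column contains categorical data using Python

I have a DataFrame in Python in which I need to find all the categorical variables. Checking the column's type is not always workable, because int columns can also be categorical. So I'm looking for help with the right hypothesis test to identify whether a column is categorical. I'm trying the chi-square test below, but I'm not sure whether it's good enough:

    import numpy as np
    import scipy.stats as ss

    data = np.random.randint(0, 5, 100)
    ss.chisquare(data)  # chisquare() interprets its argument as observed category frequencies

Please advise.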

hypothesis-testing categorical-data python chi-squared categorical-encoding
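
Whether a variable is categorical is arguably a property of how it was measured rather than something a test on its values can establish, so an automated check is at best a heuristic. The sketch below is one such heuristic in plain Python: text columns are flagged outright, and numeric columns are flagged when all values are whole numbers and there are few distinct values. The function name `looks_categorical` and the `max_levels=10` threshold are hypothetical choices, not a standard rule; with pandas you could pass `df[col].tolist()` for each column.

```python
def looks_categorical(values, max_levels=10):
    """Heuristic flag, NOT a statistical proof that a column is categorical.

    Assumption: text labels are categorical; numeric columns count as categorical
    only if every value is a whole number and there are at most max_levels distinct values.
    """
    distinct = set(values)
    if all(isinstance(v, str) for v in distinct):
        return True
    all_integer = all(float(v).is_integer() for v in distinct)
    return all_integer and len(distinct) <= max_levels


# Hypothetical columns for illustration
columns = {
    "grade": [1, 2, 3, 1, 2, 3, 4, 1],   # few whole-number levels
    "city": ["a", "b", "a"],             # text labels
    "height": [1.62, 1.75, 1.80, 1.68],  # non-integer measurements
    "id": list(range(100)),              # whole numbers, but too many distinct values
}
flags = {name: looks_categorical(col) for name, col in columns.items()}
print(flags)
```

The obvious failure mode is a genuine count variable with a small range (for example, number of children), which this rule would also flag; the threshold only trades one kind of misclassification for another.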

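R linear regression: the "hidden" value of a categorical variable

This is just an example I have run into many times, so I don't have any sample data. Running a linear regression model in R:

    a.lm = lm(Y ~ x1 + x2)

x1 is a continuous variable. x2 is categorical with three values, e.g. "Low", "Medium" and "High". However, the output R gives is something like:

    summary(a.lm)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)    0.521       0.20   1.446     0.19
    x1             -0.61       0.11   1.451     0.17
    x2Low          -0.78       0.22   -2.34    0.005
    x2Medium       -0.56       0.45   -2.34    0.005

I know R applies some kind of dummy coding to such factors (x2 is a factor). I'm just wondering how to interpret the x2 "High" value. For example, what effect does x2 = "High" have on the response variable in the example given here? I have seen examples like this elsewhere (e.g. here), but haven't found an explanation I can understand.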

r regression categorical-data regression-coefficients categorical-encoding
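
R's default treatment coding explains the missing level: factor levels are ordered alphabetically by default, so "High" comes first, becomes the reference level, and is absorbed into the intercept. The plain-Python sketch below reproduces that coding using the illustrative coefficients from the output above; the helper names `treatment_dummies` and `predict` are hypothetical.

```python
# Illustrative coefficients copied from the summary(a.lm) output in the question
coef = {"(Intercept)": 0.521, "x1": -0.61, "x2Low": -0.78, "x2Medium": -0.56}


def treatment_dummies(level, levels=("High", "Low", "Medium")):
    """Treatment coding: the first level ("High", alphabetically first) is the
    reference and gets no column of its own."""
    return {f"x2{lv}": 1 if level == lv else 0 for lv in levels[1:]}


def predict(x1, level):
    """Linear predictor: intercept + x1 slope + the dummy for the given level."""
    row = {"(Intercept)": 1, "x1": x1, **treatment_dummies(level)}
    return sum(coef[name] * value for name, value in row.items())


for level in ("High", "Low", "Medium"):
    print(level, round(predict(0.0, level), 3))
```

So, holding x1 fixed, "High" sits on the intercept line itself, "Low" is 0.78 below "High" and "Medium" is 0.56 below "High"; the x2Low and x2Medium estimates are differences from "High", not absolute effects. In R, `relevel(x2, ref = "Low")` would make a different level the reference.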

Why does the intercept column in model.matrix replace the first factor level?

I'm trying to convert my factor column into dummy variables:

    str(cards$pointsBin)
    # Factor w/ 5 levels ".lte100",".lte150",..: 3 2 3 1 4 4 2 2 4 4 ...
    labels <- model.matrix(~ pointsBin, data=cards)
    head(labels)
    #     (Intercept) pointsBin.lte150 pointsBin.lte200 pointsBin.lte250 pointsBin.lte300
    # 741           1                0                0                0                0
    # 407           1                1                0                0                0
    # 676           1                0                0 …

r categorical-data categorical-encoding
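
The behaviour follows from the intercept: with a column of ones present, keeping all five indicators would make the columns linearly dependent (the indicators sum to the intercept column), so model.matrix drops the first level, which becomes the reference. The plain-Python sketch below mimics that treatment coding; `design_matrix` is a hypothetical helper and the rows are made up. In R itself, `model.matrix(~ pointsBin - 1, data=cards)` (or `~ 0 + pointsBin`) removes the intercept and returns one column per level.

```python
def design_matrix(values, levels, intercept=True):
    """Treatment coding in the style of model.matrix for a single factor.

    With an intercept the first level is dropped (it becomes the reference);
    without one, every level gets its own indicator column.
    """
    kept = levels[1:] if intercept else levels
    rows = []
    for v in values:
        row = [1] if intercept else []
        row += [1 if v == lv else 0 for lv in kept]
        rows.append(row)
    return rows


levels = [".lte100", ".lte150", ".lte200", ".lte250", ".lte300"]
values = [".lte200", ".lte150", ".lte200", ".lte100"]  # hypothetical rows

with_int = design_matrix(values, levels)
no_int = design_matrix(values, levels, intercept=False)

# In the no-intercept matrix each row sums to 1: the full set of indicators already
# reproduces an intercept column, so adding a real intercept would be redundant.
print(all(sum(r) == 1 for r in no_int))
```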

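How do I implement dummy variables with n-1 variables?

If I have a 4-level variable, in theory I need 3 dummy variables. How is this done in practice? Do I use 0-3, or do I use 1-3 and leave 4 blank? Any suggestions? Note: I will be working in R. Update: What happens if I just use a single column coded 1-4 corresponding to A-D? Will this work, or will it cause problems?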

r regression categorical-data categorical-encoding
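
In R the n-1 columns are normally created for you: storing the variable as a factor with factor() and passing it to lm() generates three indicator columns, with the first level as the all-zeros reference, so no level is left "blank". The plain-Python sketch below, on made-up data, builds those indicator rows explicitly and contrasts them with the single 1-4 column from the update. A numeric code forces the fitted group means onto a straight line in the codes, which the hypothetical means here (10, 30, 15, 20) clearly violate. The names `dummy_row` and `rss` are hypothetical helpers.

```python
from statistics import mean

LEVELS = ("A", "B", "C", "D")


def dummy_row(level):
    """n - 1 = 3 indicator columns; "A" is the reference level (all zeros)."""
    return [1 if level == lv else 0 for lv in LEVELS[1:]]


# Hypothetical data: two observations per level, group means 10, 30, 15, 20
data = {"A": [9, 11], "B": [29, 31], "C": [14, 16], "D": [19, 21]}

# Intercept + 3 dummies is a saturated model: its OLS fitted values are the group means
dummy_fit = {lv: mean(ys) for lv, ys in data.items()}

# Single numeric column 1..4: simple OLS fits one straight line across the codes
xs = [LEVELS.index(lv) + 1 for lv in LEVELS for _ in data[lv]]
ys = [y for lv in LEVELS for y in data[lv]]
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
numeric_fit = {lv: intercept + slope * (LEVELS.index(lv) + 1) for lv in LEVELS}


def rss(fit):
    """Residual sum of squares given one fitted value per level."""
    return sum((y - fit[lv]) ** 2 for lv in LEVELS for y in data[lv])


print([dummy_row(lv) for lv in LEVELS])
print(rss(dummy_fit), rss(numeric_fit))
```

The single numeric column also implicitly assumes that the codes are equally spaced and ordered, which is rarely meaningful for nominal levels such as A to D.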

Questions tagged «categorical-encoding»