Statistics and Big Data: excel

8

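Take a look at this Excel chart: the "common sense" line of best fit would be an almost vertical line running straight through the center of the points (drawn in by hand in red). However, the linear trend line determined by Excel is the diagonal black line shown. Why has Excel produced something that (to the human eye) appears to be wrong? How can I produce a best-fit line that looks more intuitive (i.e. something like the red line)?

Update 1. An Excel spreadsheet with the data and chart is available here: example data, CSV in Pastebin. Are the type1 and type2 regression techniques available as Excel functions?

Update 2. The data represent a paraglider climbing in a thermal while drifting with the wind. The final objective is to investigate how wind strength and direction vary with height. I am an engineer, not a mathematician or statistician, so the information in these responses has given me many more areas to research.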
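To illustrate why the two lines can disagree so sharply, here is a minimal sketch. Excel's linear trend line is an ordinary least squares fit of y on x, which minimises vertical distances only; a line that "looks right" to the eye is usually closer to an orthogonal (major axis) fit, which minimises perpendicular distances. The data below are made up (a steep, noisy cloud), not the paraglider track from the question, and the function names are only illustrative.

```python
import math
import random

def centered_sums(xs, ys):
    """Return the centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def ols_slope(xs, ys):
    """Slope of ordinary least squares of y on x (vertical residuals)."""
    sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx

def orthogonal_slope(xs, ys):
    """Slope of the line minimising perpendicular distances.

    Assumes sxy != 0; a perfectly uncorrelated cloud has no unique answer here.
    """
    sxx, syy, sxy = centered_sums(xs, ys)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical data: height rises steadily while x drifts only slightly.
random.seed(0)
ys = [float(i) for i in range(200)]
xs = [0.02 * y + random.gauss(0.0, 1.0) for y in ys]

print("OLS slope (y on x):", ols_slope(xs, ys))
print("Orthogonal slope:  ", orthogonal_slope(xs, ys))
```

On a steep cloud like this the orthogonal slope is much larger than the OLS slope, which matches the near-vertical red line versus the diagonal black line described in the question.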

82 regression excel intuition

8

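Excel as a statistics workbench

It seems that lots of people (including me) like to do exploratory data analysis in Excel. Some limitations, such as the number of rows allowed in a spreadsheet, are a pain, but in most cases they do not make it impossible to use Excel to play around with data. A paper by McCullough and Heiser, however, practically screams that you will get all your results wrong (and probably burn in hell as well) if you try to use Excel. Is this paper correct, or is it biased? The authors do sound like they hate Microsoft.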

52 software computational-statistics excel

3

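Interpretation of log transformed predictor and/or response

I'm wondering whether it makes a difference in interpretation if only the dependent variable, both the dependent and independent variables, or only the independent variable is log transformed. Consider the case of log(DV) = Intercept + B1*IV + Error. I can interpret the IV as a percent increase, but how does this change when I have log(DV) = Intercept + B1*log(IV) + Error, or when I have DV = Intercept + B1*log(IV) + Error?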
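As a rough numeric illustration of the three cases, the sketch below turns a coefficient B1 into the change in DV that each model implies, assuming natural logarithms. The function names and the example coefficients are hypothetical, not taken from the question.

```python
import math

def log_level_effect(b1, delta_iv=1.0):
    """log(DV) = a + b1*IV: factor by which DV is multiplied when IV rises by delta_iv."""
    return math.exp(b1 * delta_iv)

def log_log_effect(b1, pct=1.0):
    """log(DV) = a + b1*log(IV): factor by which DV is multiplied when IV rises by pct percent."""
    return (1.0 + pct / 100.0) ** b1

def level_log_effect(b1, pct=1.0):
    """DV = a + b1*log(IV): amount added to DV when IV rises by pct percent."""
    return b1 * math.log(1.0 + pct / 100.0)

# Hypothetical coefficients, for illustration only.
print(log_level_effect(0.05))   # about 1.051: one more unit of IV, roughly 5.1% more DV
print(log_log_effect(2.0))      # about 1.020: 1% more IV, roughly 2% more DV (an elasticity)
print(level_log_effect(100.0))  # about 0.995: 1% more IV, roughly b1/100 more units of DV
```

For small coefficients the familiar shortcuts (100*B1 percent, B1 percent, B1/100 units) are approximations to these exact expressions.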

46 regression data-transformation interpretation regression-coefficients logarithm

5

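How do I calculate a weighted standard deviation? In Excel?

So, I have a data set of percentages like so: 100 / 10000 = 1% (0.01); 2 / 5 = 40% (0.4); 4 / 3 = 133% (1.3); 1000 / 2000 = 50% (0.5). I want to find the standard deviation of the percentages, but weighted by their data volume, i.e. the first and last data points should dominate the calculation. How do I do that? And is there a simple way to do it in Excel?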
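One possible reading of the request, sketched below, is to use each denominator as the weight and compute the weighted mean and the "population" form of the weighted variance. This is an assumption: other conventions exist (for example with a bias correction), and the function name is made up for the example.

```python
import math

def weighted_mean_std(values, weights):
    """Weighted mean and weighted standard deviation (population form, no bias correction)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)

# The four ratios from the question, weighted by their denominators.
numerators = [100, 2, 4, 1000]
denominators = [10000, 5, 3, 2000]
ratios = [n / d for n, d in zip(numerators, denominators)]

mean, sd = weighted_mean_std(ratios, denominators)
print("weighted mean:", mean)
print("weighted sd:  ", sd)
```

With these weights the weighted mean is simply the pooled ratio (sum of numerators over sum of denominators), so the two large rows dominate as intended. The same sums could be built in a spreadsheet with SUMPRODUCT and SUM.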

29 standard-deviation excel weighted-mean

1

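Calculating the repeatability of effects from an lmer model

I just came across this paper, which describes how to compute the repeatability (a.k.a. reliability, a.k.a. intraclass correlation) of a measurement via mixed effects modelling. The R code would be:

    library(lme4)

    # fit the model
    fit = lmer(dv ~ (1|unit), data = my_data)

    # obtain the variance estimates
    vc = VarCorr(fit)
    residual_var = attr(vc, 'sc')^2
    # as posted; the grouping factor in the model above is `unit`,
    # so this presumably needs vc$unit rather than vc$id
    intercept_var = attr(vc$id, 'stddev')[1]^2

    # compute the unadjusted repeatability
    R = intercept_var / (intercept_var + residual_var)

    # compute n0, the repeatability adjustment
    n = as.data.frame(table(my_data$unit))
    k = nrow(n)
    N = sum(n$Freq)
    n0 = (N - (sum(n$Freq^2) / N)) / (k - 1)

    # compute the adjusted repeatability
    Rn = …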
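The two quantities that are fully spelled out in the excerpt, the unadjusted repeatability R and the adjustment n0, can be checked by hand with a small sketch like the one below. It takes the variance components as given numbers rather than fitting a model, and it stops before the adjusted repeatability because that formula is cut off in the excerpt. The function names are illustrative.

```python
from collections import Counter

def unadjusted_repeatability(intercept_var, residual_var):
    """Share of the total variance attributable to differences between units."""
    return intercept_var / (intercept_var + residual_var)

def n0(unit_labels):
    """Adjustment n0 = (N - sum(n_i^2) / N) / (k - 1), with n_i observations per unit."""
    counts = list(Counter(unit_labels).values())
    k = len(counts)
    total = sum(counts)
    return (total - sum(c * c for c in counts) / total) / (k - 1)

# Hypothetical balanced design: three units with four observations each.
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
print(n0(labels))
print(unadjusted_repeatability(1.0, 3.0))
```

In a balanced design n0 reduces to the common number of observations per unit, which is a quick sanity check on the formula.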

28 mixed-model reliability intraclass-correlation repeatability

3

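How do I check for a normal distribution in Excel before performing a t-test?

I want to know how to check a data set for normality in Excel, just to verify that the requirements for using a t-test are met. For the right tail, is it appropriate to calculate the mean and standard deviation, add 1, 2 and 3 standard deviations to the mean to create ranges, and then compare the results with the 68/95/99.7 rule for the standard normal distribution, using the norm.dist function in Excel to test each standard deviation value? Or is there a better way to test for normality?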
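The check described in the question can be sketched as follows: count what fraction of the observations lies within 1, 2 and 3 sample standard deviations of the mean and compare with roughly 68%, 95% and 99.7%. This is only an informal screen, not a formal normality test, and the simulated samples are made up for the example.

```python
import random
import statistics

def coverage_within_sd(data, k_values=(1, 2, 3)):
    """Fraction of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    n = len(data)
    return {k: sum(abs(x - mean) <= k * sd for x in data) / n for k in k_values}

# Hypothetical samples: one normal, one clearly non-normal (uniform).
random.seed(1)
normal_sample = [random.gauss(10.0, 2.0) for _ in range(20000)]
uniform_sample = [random.uniform(0.0, 1.0) for _ in range(20000)]

print("normal: ", coverage_within_sd(normal_sample))
print("uniform:", coverage_within_sd(uniform_sample))
```

The uniform sample has noticeably less than 68% of its values within one standard deviation and all of them within two, so the rule does pick up gross departures, but small samples will give noisy fractions.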

21 normal-distribution excel

3

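What is the name of this plot that has rows with two connected points?

I've been reading an EIA report and this plot caught my attention. I now want to be able to create the same type of plot. It shows how energy productivity evolved between two years (1990 and 2015) and adds the value of the change between these two periods. What is the name of this type of plot? How can I create the same plot (with different countries) in Excel?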

19 data-visualization terminology excel

4

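How does one visualize the results of subjective rankings?

I am looking for a way to visualize subjective rankings, separate from my non-parametric tests. I have asked 12 participants to rank 8 different items on different subjective criteria (each criterion ranked separately). For any individual ranking, I am looking for a good way to visualize the high-level trends. I have tried bar and radar charts of the mean rankings, and I have seen another person use a scatter/balloon plot of the number of responses at each rank, but I am not sure which conveys the best overview. I can use either the 8 mean rankings or, for each item, the 8 counts at each rank.

Edit: for example, each column is an item and each row is one person's ranking of each of the eight items. There is not particularly strong agreement in this example, but in general we would like to know the best way to convey the overall trend.

    Item:   A B C D E F G H
    Rater:
    1       6 8 1 7 3 4 2 5
    2       1 3 8 7 6 5 2 4
    3       5 8 7 6 1 4 2 3
    4       5 8 7 6 4 2 1 …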
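The two summaries mentioned in the question (mean rank per item, and a count of how often each item received each rank) can be computed with a short sketch like this. It uses only the three complete rater rows visible in the excerpt; the fourth row is truncated there and is left out.

```python
items = list("ABCDEFGH")

# Rows are raters, columns are items, values are ranks (1 = first).
rankings = [
    [6, 8, 1, 7, 3, 4, 2, 5],  # rater 1
    [1, 3, 8, 7, 6, 5, 2, 4],  # rater 2
    [5, 8, 7, 6, 1, 4, 2, 3],  # rater 3
]

# Mean rank per item.
mean_rank = {
    item: sum(row[j] for row in rankings) / len(rankings)
    for j, item in enumerate(items)
}

# counts[item][r - 1] = number of raters who gave `item` the rank r.
counts = {item: [0] * len(items) for item in items}
for row in rankings:
    for j, rank in enumerate(row):
        counts[items[j]][rank - 1] += 1

for item in items:
    print(item, round(mean_rank[item], 2), counts[item])
```

The count matrix is the input a balloon plot or heat map would need, while the mean ranks feed a bar chart; printing both side by side makes it easier to see where a mean hides disagreement.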

17 data-visualization nonparametric excel ranking

1

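What is the intuition behind exchangeable samples under the null hypothesis?

Permutation tests (also called randomization tests, re-randomization tests, or exact tests) are very useful. They come in handy when an assumption such as the normality required by the t-test is not met, and when transforming the values to ranks for a non-parametric test such as the Mann-Whitney U test would lose more information. However, one and only one assumption should not be overlooked when using this kind of test: the assumption of exchangeability of the samples under the null hypothesis. It is also worth noting that this approach can be applied when there are more than two samples, as implemented in the coin R package.

Can you use some figurative language or conceptual intuition, in plain English, to illustrate this assumption? This would be very useful for clarifying this overlooked issue among non-statisticians like me.

Note: it would be very helpful to mention a case where applying a permutation test does not hold or is invalid under this assumption.

Update: suppose I have randomly collected 50 subjects from the local clinic in my district. They were randomly assigned to receive a drug or a placebo at a 1:1 ratio. Parameter 1 (Par1) was measured for all of them at V1 (baseline), V2 (3 months later), and V3 (1 year later). All 50 subjects can be divided into 2 groups based on feature A: A positive = 20 and A negative = 30. They can also be divided into another 2 groups based on feature B: B positive = 15 and B negative = 35. I now have the Par1 values of all subjects at all visits. Under the assumption of exchangeability, can I use a permutation test to compare levels of Par1 if I:

- compare subjects who received the drug with those who received the placebo at V2?
- compare subjects with feature A with those with feature B at V2?
- compare subjects with feature A at V2 with subjects with feature A at V3?
- In which of these situations would the comparison be invalid and violate the assumption of exchangeability?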
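For readers who have not seen one written out, here is a minimal sketch of a two-sample permutation test on the difference in means. It shows where exchangeability enters: the group labels are shuffled as if, under the null hypothesis, any observation could equally well have carried either label. The data are made up, and the Monte Carlo version with a "+1" correction is one common choice, not the only one.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is only meaningful if, under the null,
        # the labels are exchangeable across all observations.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical measurements for two independent groups.
drug = [11, 12, 13, 14, 15]
placebo = [1, 2, 3, 4, 5]
print(permutation_test(drug, placebo))
```

Free shuffling like this treats every observation as independent of the others, so it would not be appropriate as written for repeated measurements on the same subjects, where only within-subject rearrangements are plausible under the null.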

15 hypothesis-testing permutation-test exchangeability

1

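I have a line of best fit. I need data points that will not change my line of best fit

I'm giving a presentation about fitting lines. I have a simple linear function, y = 1x + b. I'm trying to get scattered data points to put in a scatter plot that will keep my line of best fit the same. I'd love to learn this technique in either R or Excel, whichever is easier.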
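One way to do this, sketched below, is to generate random noise, regress that noise on x, and keep only the residuals. Those residuals sum to zero and are uncorrelated with x, so adding them to the exact line leaves the least squares fit unchanged. The intercept b = 2 and the x values are placeholders, since the question does not give them.

```python
import random

def ols_fit(xs, ys):
    """Return (slope, intercept) of the least squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def points_preserving_line(xs, slope, intercept, noise_sd=1.0, seed=0):
    """Scatter points around a line without changing its least squares fit."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, noise_sd) for _ in xs]
    # Remove the part of the noise that a line in x could explain.
    s, i = ols_fit(xs, noise)
    residuals = [e - (i + s * x) for x, e in zip(xs, noise)]
    return [intercept + slope * x + r for x, r in zip(xs, residuals)]

xs = [float(i) for i in range(20)]        # hypothetical x values
ys = points_preserving_line(xs, 1.0, 2.0) # y = 1x + b with an assumed b = 2
print(ols_fit(xs, ys))
```

Refitting the generated points returns the original slope and intercept up to floating point error, while the points themselves are visibly scattered.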

15 r regression least-squares excel

3

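Best open-source data visualization software to use with PowerPoint

What is the best open-source data visualization software? I have the following criteria: it can import data from Microsoft Excel (importing from an Oracle database would be nice, but is not mandatory); the charts it produces can be exported to Microsoft PowerPoint (copy and paste is fine); and it is open source and easy to use.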

14 data-visualization excel software open-source

1

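Why do Excel and WolframAlpha give different values for skewness?

For the following 3 values, 222, 1122, 45444, WolframAlpha gives 0.706, while Excel, using =SKEW(222,1122,45444), gives 1.729. What explains the difference?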
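The two numbers are consistent with two different definitions of sample skewness, which the sketch below computes side by side: the plain moment ratio m3 / m2^1.5 (moments divided by n), and the bias-adjusted version that Excel documents for SKEW, which multiplies the first by sqrt(n(n-1)) / (n-2). That WolframAlpha uses the first form is inferred here from the matching value, not taken from its documentation.

```python
import math

def population_skewness(values):
    """Moment-ratio skewness g1 = m3 / m2**1.5, with moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def adjusted_skewness(values):
    """Bias-adjusted skewness G1 = g1 * sqrt(n * (n - 1)) / (n - 2); needs n >= 3."""
    n = len(values)
    return population_skewness(values) * math.sqrt(n * (n - 1)) / (n - 2)

data = [222, 1122, 45444]
print(population_skewness(data))
print(adjusted_skewness(data))
```

With only three values the adjustment factor is sqrt(6), about 2.45, which accounts for the whole gap between the two reported results; for large n the two definitions converge.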

14 excel software descriptive-statistics mathematica

2

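How do I convert a frequency table into a vector of values?

Using either R or Excel, what is the easiest way to convert a frequency table into a vector of values? For example, how would you convert the following frequency table

    Value  Frequency
    1.     2
    2.     1
    3.     4
    4.     2
    5.     1

into the following vector?

    1, 1, 2, 3, 3, 3, 3, 4, 4, 5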
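The operation itself is just "repeat each value as many times as its frequency". A minimal sketch in Python, using the table from the question (the function name is made up):

```python
def expand_frequency_table(table):
    """Repeat each value `frequency` times, preserving the table order."""
    return [value for value, frequency in table for _ in range(frequency)]

# (value, frequency) pairs from the question.
table = [(1, 2), (2, 1), (3, 4), (4, 2), (5, 1)]
vector = expand_frequency_table(table)
print(vector)
```

In R the same idea is usually expressed with `rep(values, times = frequencies)`.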

13 r dataset excel

1

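GBM package vs. caret using GBM

I have been tuning models with caret, but then re-running the model with the gbm package. It is my understanding that the caret package uses gbm, so the output should be the same. However, a quick test run using data(iris) shows a discrepancy of about 5% between the models, using RMSE and R^2 as the evaluation metrics. I want to find the optimal model performance using caret, but re-run it in gbm to make use of the partial dependence plots. The code below is for reproducibility. My questions are:

1) Why am I seeing a difference between these two packages even though they should be the same? (I understand that they are stochastic, but 5% is a fairly large difference, especially when I am not using such a nice dataset as iris for my modelling.)
2) Are there any advantages or disadvantages to using both packages?
3) Unrelated: with the iris dataset the optimal interaction.depth is 5, which is higher than what I have read should be the maximum, floor(sqrt(ncol(iris))), which would be 2. Is this a strict rule of thumb, or is it quite flexible?

    library(caret)
    library(gbm)
    library(hydroGOF)
    library(Metrics)
    data(iris)

    # Using caret
    caretGrid <- expand.grid(interaction.depth=c(1, 3, 5), n.trees = (0:50)*50,
                             shrinkage=c(0.01, 0.001),
                             n.minobsinnode=10)
    metric <- "RMSE"
    trainControl <- trainControl(method="cv", number=10)

    set.seed(99)
    gbm.caret <- train(Sepal.Length ~ ., data=iris, distribution="gaussian", method="gbm",
                       trControl=trainControl, verbose=FALSE,
                       tuneGrid=caretGrid, metric=metric, bag.fraction=0.75)

    print(gbm.caret)
    # …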

13 r caret gbm

2

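Formula for autocorrelation in R vs. Excel

I'm trying to figure out how R computes lag-k autocorrelation (apparently, it uses the same formula as Minitab and SAS), so that I can compare it with Excel's CORREL function applied to the series and its k-lagged version. R and Excel (using CORREL) give slightly different autocorrelation values. I'd also like to know whether one computation is more correct than the other.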
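The difference can be reproduced with a short sketch. As far as I know, the usual sample autocorrelation (the one reported by R's acf) uses the mean and sum of squares of the whole series, while a Pearson correlation between the series and its lagged copy, which is what CORREL computes, uses the separate means and standard deviations of the two shortened sub-series. The function names are illustrative.

```python
import math

def acf_lag(x, k):
    """Sample autocorrelation at lag k using the overall mean and overall sum of squares."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numer / denom

def correl_lag(x, k):
    """Pearson correlation between x[0..n-k-1] and x[k..n-1], for k >= 1."""
    a, b = x[:len(x) - k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

series = [float(i) for i in range(1, 11)]  # a plain trend, 1 to 10
print(acf_lag(series, 1))     # 0.7
print(correl_lag(series, 1))  # 1.0 up to floating point error
```

On a short trending series the gap is large (0.7 versus 1.0); on long, roughly stationary series the two numbers are typically close, which fits the "slightly different" values described in the question.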

13 r sas autocorrelation excel

Questions tagged «excel»