Statistics and Big Data: percentage

5

What is the difference between the following three terms? Percentile, quantile, quartile

83 descriptive-statistics quantiles median percentage
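
As a rough illustration of how the three terms relate, the Python sketch below treats quartiles and percentiles as special cases of quantiles (cut points that split sorted data into n groups). The sample values are hypothetical, and the "inclusive" interpolation method is just one of several conventions, so other software may give slightly different cut points.

```python
import statistics

# Hypothetical sample; any numeric data would do.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 18, 21]

# Quartiles: the 3 cut points that split the data into 4 groups.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# Percentiles: the 99 cut points that split the data into 100 groups.
percentiles = statistics.quantiles(data, n=100, method="inclusive")

# The 25th, 50th and 75th percentiles are the same cut points as the quartiles.
selected = [percentiles[24], percentiles[49], percentiles[74]]

print("quartiles:            ", quartiles)
print("25th/50th/75th pctile:", selected)
print("median:               ", statistics.median(data))
```

Here the second quartile, the 50th percentile and the median all name the same value.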

10

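See this excerpt from page 155 of "The Study Skills Handbook" by Stella Cottrell (Palgrave, 2012): Percentages. Notice when percentages are given. Suppose, instead, the statement above read: 60% of people preferred oranges; 40% said they preferred apples. This looks convincing: numerical quantities are given. But is the difference between 60% and 40% significant? Here we would need to know how many people were asked. If 1000 people were asked, of whom 600 preferred oranges, the number would be persuasive. However, if only 10 people were asked, 60% simply means that 6 people preferred oranges. "60%" sounds convincing in a way that "6 out of 10" does not. As a critical reader, you need to be on the lookout for percentages being used to make insufficient data look impressive. What is this characteristic called in statistics? I would like to read more about it.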

41 statistical-significance sample-size percentage
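
One way to make the excerpt's point concrete is to attach a confidence interval to the 60% figure at each sample size. The Python sketch below uses the Wilson score interval with z = 1.96 (roughly 95%); that choice of interval and the helper name `wilson_interval` are assumptions made for illustration, not something taken from the book or the question.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96 assumed)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Same observed percentage (60%), very different sample sizes.
small = wilson_interval(6, 10)
large = wilson_interval(600, 1000)

print("6 of 10:     %.3f to %.3f" % small)
print("600 of 1000: %.3f to %.3f" % large)
```

With 10 respondents the interval straddles 50%, so the data cannot distinguish a preference for oranges from an even split; with 1000 respondents the whole interval lies above 50%.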

4

Is 50% 100% higher than 25%, or is it 25% higher than 25%?

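If I have two values A and B, both expressed as percentages of C, and I want to express the difference in size between A and B as a percentage D, is it more correct to express D as a percentage of C or as a percentage of B (or indeed A)? 50 unemployed people is clearly 100% larger than 25 unemployed people, because it is obvious that "%" here means "% of 25 unemployed people". But how much larger is 50% unemployment than 25% unemployment? It is a 100% increase on 25% unemployment, but only 25% of total potential unemployment.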

21 terminology percentage
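
The two readings in the question correspond to a difference in percentage points (relative to C) and a relative change (relative to B). A minimal Python sketch of the arithmetic, with hypothetical function names:

```python
def percentage_point_difference(a_pct, b_pct):
    """Difference between two rates, in percentage points of the common base C."""
    return a_pct - b_pct

def relative_change_pct(a_pct, b_pct):
    """Change from b to a, expressed as a percentage of b."""
    return (a_pct - b_pct) / b_pct * 100

a, b = 50.0, 25.0  # unemployment rates from the question, in percent

points = percentage_point_difference(a, b)
relative = relative_change_pct(a, b)

print(f"{a}% vs {b}%: {points} percentage points, {relative}% relative increase")
```

Both numbers are correct; they answer different questions, which is why stating the base ("percentage points" versus "percent of the earlier rate") avoids the ambiguity.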

7

What kind of curve (or model) should I fit to my percentage data?

I am trying to create a figure that shows the relationship between viral copies and genome coverage (GCC). This is what my data looks like: At first I just plotted a linear regression, but my supervisor told me that was incorrect and to try a sigmoidal curve. So I did this using geom_smooth:

    library(scales)
    ggplot(scatter_plot_new, aes(x = Copies_per_uL, y = Genome_cov, colour = Virus)) +
      geom_point() +
      scale_x_continuous(trans = log10_trans(), breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
      geom_smooth(method = "gam", formula = y ~ s(x), se = FALSE, size = 1) +
      theme_bw() +
      theme(legend.position = 'top', legend.text …

15 regression modeling curve-fitting percentage
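
For readers unfamiliar with the S-shaped form mentioned in the question, the Python sketch below evaluates a logistic curve on the log10 copy-number scale. It only illustrates the shape: the parameter values (`midpoint_log10`, `slope`, `upper`) are hypothetical and not fitted to the asker's data, and a real analysis would estimate them from the measurements.

```python
import math

def logistic_coverage(copies_per_ul, midpoint_log10=3.0, slope=2.0, upper=100.0):
    """Logistic curve in log10(copies); all parameter defaults are hypothetical."""
    x = math.log10(copies_per_ul)
    return upper / (1.0 + math.exp(-slope * (x - midpoint_log10)))

copy_numbers = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
coverage = [logistic_coverage(c) for c in copy_numbers]

for c, cov in zip(copy_numbers, coverage):
    print(f"{c:>9} copies/uL -> {cov:6.2f}% coverage")
```

Unlike a straight line, the curve flattens toward 0% and 100% at the extremes, which matches the bounded nature of a coverage percentage.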

5

How do I impute values in a very large number of data points?

I have a very large dataset in which about 5% of the values are missing at random. The variables are correlated with each other. The following R dataset is just a toy example with dummy correlated data.

    set.seed(123)
    # matrix of X variable
    xmat <- matrix(sample(-1:1, 2000000, replace = TRUE), ncol = 10000)
    colnames(xmat) <- paste ("M", 1:10000, sep ="")
    rownames(xmat) <- paste("sample", 1:200, sep = "")
    #M variables are correlated
    N <- 2000000*0.05 # 5% random missing values
    inds <- round ( runif(N, 1, length(xmat)) …

12 r random-forest missing-data data-imputation multiple-imputation large-data

3

What are the issues with using a percentage outcome in linear regression?

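I have a study in which many outcomes are expressed as percentages, and I am using multiple linear regression to assess the effect of some categorical variables on these outcomes. Since linear regression assumes the outcome follows a continuous distribution, are there methodological problems with applying such a model to percentages, which are bounded between 0 and 100?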

11 regression ratio percentage
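
One commonly cited concern is that a straight line fitted to a bounded outcome can predict values outside 0 to 100. The Python sketch below shows this on made-up data and contrasts it with fitting the line on the logit scale. For brevity it uses a single numeric predictor rather than the categorical predictors in the question, and the data, the helper names and the choice of a logit transform are all assumptions for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def logit(pct):
    """Map a percentage strictly between 0 and 100 to the real line."""
    p = pct / 100.0
    return math.log(p / (1.0 - p))

def inverse_logit_pct(z):
    """Map a real number back to a percentage between 0 and 100."""
    return 100.0 / (1.0 + math.exp(-z))

# Made-up outcome that approaches the 100% ceiling.
xs = [0, 1, 2, 3, 4]
ys = [50.0, 70.0, 85.0, 93.0, 97.0]
new_x = 8

slope, intercept = fit_line(xs, ys)
raw_prediction = slope * new_x + intercept

l_slope, l_intercept = fit_line(xs, [logit(y) for y in ys])
logit_prediction = inverse_logit_pct(l_slope * new_x + l_intercept)

print(f"linear on raw percentages: {raw_prediction:.1f}%")
print(f"linear on logit scale:     {logit_prediction:.1f}%")
```

The raw-scale line extrapolates past 100%, while the back-transformed logit prediction stays inside the valid range. The logit transform is undefined at exactly 0% or 100%, so data containing those values would need a different treatment.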

1

How can these percentages add up to more than 100%?

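I am reading a paper on aquaponics, and some of the statistics make no sense to me in terms of the percentages listed. What methodology could allow these percentages to exist? By percentage, the most commonly raised aquatic animals were tilapia (69%), ornamental fish (43%), catfish (25%), other aquatic animals (18%), perch (16%), bluegill (15%), trout (10%), and bass (7%). ~ http://www.sciencedirect.com/science/article/pii/S0044848614004724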

9 percentage
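
The figures quoted in the question sum to 203%. One possible explanation, assuming respondents could report more than one species (the excerpt itself does not say so), is that each percentage is a share of respondents rather than a share of animals. The Python sketch below uses invented responses to show how such shares can total more than 100%.

```python
from collections import Counter

# Invented survey responses: each respondent may raise several species.
respondents = [
    {"tilapia", "ornamental fish"},
    {"tilapia"},
    {"tilapia", "catfish"},
    {"ornamental fish"},
]

# Count how many respondents raise each species.
counts = Counter(species for farm in respondents for species in farm)

# Each share is taken over respondents, so the shares are not mutually exclusive.
shares = {species: 100 * n / len(respondents) for species, n in counts.items()}
total = sum(shares.values())

for species, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{species}: {share:.0f}%")
print(f"sum of shares: {total:.0f}%")
```

Because a respondent who raises two species is counted in both categories, the sum exceeds 100% even though no single share does.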

Questions tagged «percentage»