Statistics and Big Data time-series

3

R: Random forest throws NaN/Inf in "foreign function call" error despite no NaNs in the dataset [closed]

I am using caret to run a cross-validated random forest on a dataset. The Y variable is a factor. There are no NaNs, Infs, or NAs in my dataset. However, when I run the random forest, I get Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 1) In addition: There were 28 warnings (use warnings() to see them) Warning messages: 1: In data.matrix(x) : NAs introduced by coercion 2: In data.matrix(x) : NAs introduced by coercion 3: In data.matrix(x) : NAs introduced by …

29 r random-forest caret regression prediction fitting social-science poisson-distribution distributions characteristic-function bayesian prior regression normal-distribution interaction nonparametric skewness svm standard-deviation standard-error regression-coefficients igraph natural-language word2vec word-embeddings regression machine-learning sampling r regression machine-learning random-forest ensemble sampling unbiased-estimator proof estimators mse probability conditional-probability bayes anova missing-data neural-networks recommender-system r confidence-interval sample multiple-imputation r time-series forecasting mase

1

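"Frequency" value for second/minute interval data in R

I am using R (3.1.1) and ARIMA models for forecasting. I would like to know what the "frequency" parameter, which is assigned in the ts() function, should be if I am using time series data that is: separated by minutes and spread over 180 days (1440 minutes/day); separated by seconds and spread over 180 days (86,400 seconds/day). If I recall the definition correctly, the "frequency" of a ts in R is the number of observations per "season". Question part 1: What is the "season" in my case? If the season is a "day", is the "frequency" 1440 for minutes and 86,400 for seconds? Question part 2: Could the "frequency" also depend on what I am trying to achieve/forecast? For example, in my case I want a very short-term forecast, one step ahead of 10 minutes each time. Would it then be possible to treat the season as an hour instead of a day? In that case, frequency = 60 for minutes and frequency = 3600 for seconds? For example, I tried frequency = 60 for the minute data and got better results than with frequency = 1440 (using fourier; see the following link by Hyndman) http://robjhyndman.com/hyndsight/forecasting-weekly-data/ (the comparison used MAPE as the measure of forecast accuracy). If the results are completely arbitrary and the frequency cannot be changed, what would the actual interpretation of using freq = 60 on my data be? I also think it is worth mentioning that my data contains seasonality every two hours (judging from the raw data and the autocorrelation function).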

28 r time-series arima multiple-seasonalities mape
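The arithmetic in the question above can be checked with a small sketch. It is written in Python rather than R so that it runs standalone, and the helper name `observations_per_season` is hypothetical, introduced only for illustration. It assumes, as the question does, that "frequency" means the number of observations in one seasonal cycle; it does not say which season is the right one for a given forecast.

```python
def observations_per_season(season_seconds, sampling_interval_seconds):
    """Number of observations in one seasonal cycle.

    Hypothetical helper: this is the value the question proposes to pass
    as the frequency of the series.
    """
    if season_seconds % sampling_interval_seconds != 0:
        raise ValueError("season is not a whole number of sampling intervals")
    return season_seconds // sampling_interval_seconds


HOUR = 60 * 60
DAY = 24 * HOUR

# Daily season
daily_minute_data = observations_per_season(DAY, 60)
daily_second_data = observations_per_season(DAY, 1)

# Hourly season
hourly_minute_data = observations_per_season(HOUR, 60)
hourly_second_data = observations_per_season(HOUR, 1)

# The two-hour seasonality mentioned at the end of the question
two_hour_minute_data = observations_per_season(2 * HOUR, 60)

print(daily_minute_data, daily_second_data)
print(hourly_minute_data, hourly_second_data)
print(two_hour_minute_data)
```

The daily and hourly values reproduce the numbers quoted in the question (1440 and 86,400; 60 and 3600). A two-hour cycle in minute data would correspond to 120 observations per season.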

3

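Getting serious about time series in R

Think back to when you first did time series analysis. Which tools, R packages, and Internet resources do you wish you had known about? What I am asking is: where should one start? Specifically, are there any resources for R that really work for someone who is a "newbie" to time series analysis in R?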

28 r time-series

1

Calculating the repeatability of effects from an lmer model

I just came across this paper, which describes how to compute the repeatability (a.k.a. reliability, a.k.a. intraclass correlation) of a measurement via mixed-effects modelling. The R code is: #fit the model fit = lmer(dv~(1|unit),data=my_data) #obtain the variance estimates vc = VarCorr(fit) residual_var = attr(vc,'sc')^2 intercept_var = attr(vc$id,'stddev')[1]^2 #compute the unadjusted repeatability R = intercept_var/(intercept_var+residual_var) #compute n0, the repeatability adjustment n = as.data.frame(table(my_data$unit)) k = nrow(n) N = sum(n$Freq) n0 = (N-(sum(n$Freq^2)/N))/(k-1) #compute the adjusted repeatability Rn = …

28 mixed-model reliability intraclass-correlation repeatability spss factor-analysis survey modeling cross-validation error curve-fitting mediation correlation clustering sampling machine-learning probability classification metric r project-management optimization svm python dataset quality-control checking clustering distributions anova factor-analysis exponential poisson-distribution generalized-linear-model deviance machine-learning k-nearest-neighbour r hypothesis-testing t-test r variance levenes-test bayesian software bayesian-network regression repeated-measures least-squares change-scores variance chi-squared variance nonlinear-regression regression-coefficients multiple-comparisons p-value r statistical-significance excel sampling sample r distributions interpretation goodness-of-fit normality-assumption probability self-study distributions references theory time-series clustering econometrics binomial hypothesis-testing variance t-test paired-comparisons statistical-significance ab-test r references hypothesis-testing t-test normality-assumption wilcoxon-mann-whitney central-limit-theorem t-test data-visualization interactive-visualization goodness-of-fit
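As a rough illustration of the two formulas that are fully visible in the excerpt above, here is a minimal Python sketch of the unadjusted repeatability and the adjustment term n0. The function names and the example numbers are hypothetical. The variance components would normally come from the fitted mixed model, which is not reproduced here, and the adjusted repeatability Rn is left out because its formula is cut off in the excerpt.

```python
def unadjusted_repeatability(intercept_var, residual_var):
    """R = intercept_var / (intercept_var + residual_var), as in the excerpt."""
    return intercept_var / (intercept_var + residual_var)


def adjustment_n0(group_sizes):
    """n0 = (N - sum(n_i^2) / N) / (k - 1), as in the excerpt.

    group_sizes holds the number of observations per unit (n$Freq in the R code).
    """
    k = len(group_sizes)
    if k < 2:
        raise ValueError("need at least two units")
    total = sum(group_sizes)
    return (total - sum(n * n for n in group_sizes) / total) / (k - 1)


# Made-up variance components, purely for illustration.
r_example = unadjusted_repeatability(intercept_var=3.0, residual_var=1.0)

# With a balanced design, n0 equals the common group size.
n0_balanced = adjustment_n0([5, 5, 5])

# With an unbalanced design, n0 is smaller than the mean group size.
n0_unbalanced = adjustment_n0([2, 5, 8])

print(r_example, n0_balanced, n0_unbalanced)
```

The balanced case is a useful sanity check: when every unit has the same number of observations, n0 reduces to that number.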

5

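Why does the variance of a random walk increase?

A random walk is defined as $Y_t = Y_{t-1} + e_t$, where $e_t$ is white noise. This means the current position is the previous position plus an unpredictable term. It can be shown that the mean function is $\mu_t = 0$, since $E(Y_t) = E(e_1 + e_2 + \dots + e_t) = E(e_1) + E(e_2) + \dots + E(e_t) = 0 + 0 + \dots + 0$. But why does the variance increase linearly with time? Does this have something to do with the walk not being "purely" random, since the new position is highly correlated with the previous one? EDIT: I now have a much better understanding after visualizing a large number of random walks, where it is easy to see that the overall variance does increase over time while the mean stays around zero. Maybe this was trivial after all, since early in the time series (compare time = 10 with time = 100) the random walkers have not yet had time to explore.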

28 time-series self-study mathematical-statistics stochastic-processes random-walk
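The observation in the question's edit can be reproduced with a short simulation. If the steps $e_i$ are independent with variance $\sigma^2$, then $\mathrm{Var}(Y_t) = \mathrm{Var}(e_1) + \dots + \mathrm{Var}(e_t) = t\sigma^2$, so the variance across many walks should be roughly 10 at time 10 and roughly 100 at time 100 when $\sigma = 1$. The sketch below is a minimal check under those assumptions (Gaussian steps, walks starting at zero); the function names, the seed, and the number of walks are arbitrary choices.

```python
import random
import statistics


def simulate_walks(n_walks, n_steps, seed=0):
    """Simulate independent Gaussian random walks starting at zero."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        position = 0.0
        path = []
        for _ in range(n_steps):
            position += rng.gauss(0.0, 1.0)  # Y_t = Y_{t-1} + e_t
            path.append(position)
        walks.append(path)
    return walks


def variance_at(walks, t):
    """Variance of Y_t across walks, with t counted from 1."""
    return statistics.pvariance([path[t - 1] for path in walks])


def mean_at(walks, t):
    """Mean of Y_t across walks, with t counted from 1."""
    return statistics.fmean(path[t - 1] for path in walks)


walks = simulate_walks(n_walks=2000, n_steps=100)
var_10 = variance_at(walks, 10)
var_100 = variance_at(walks, 100)
mean_100 = mean_at(walks, 100)

print(f"variance at t=10:  {var_10:.2f}")
print(f"variance at t=100: {var_100:.2f}")
print(f"mean at t=100:     {mean_100:.2f}")
```

Each individual walk is unpredictable, but the spread of the whole ensemble grows because every step adds its own independent variance to the total.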

1

Can degrees of freedom be a non-integer number?

When I use GAM, it gives me a residual DF of 26.6 (last line of the code). What does that mean? Going beyond the GAM example: in general, can the number of degrees of freedom be a non-integer? > library(gam) > summary(gam(mpg~lo(wt),data=mtcars)) Call: gam(formula = mpg ~ lo(wt), data = mtcars) Deviance Residuals: Min 1Q Median 3Q Max -4.1470 -1.6217 -0.8971 1.2445 6.0516 (Dispersion Parameter for gaussian family taken to be 6.6717) Null Deviance: 1126.047 on 31 degrees of freedom Residual Deviance: 177.4662 on 26.6 degrees of …

27 r degrees-of-freedom gam machine-learning pca lasso probability self-study bootstrap expected-value regression machine-learning linear-model probability simulation random-generation machine-learning distributions svm libsvm classification pca multivariate-analysis feature-selection archaeology r regression dataset simulation r regression time-series forecasting predictive-models r mean sem lavaan machine-learning regularization regression conv-neural-network convolution classification deep-learning conv-neural-network regression categorical-data econometrics r confirmatory-factor scale-invariance self-study unbiased-estimator mse regression residuals sampling random-variable sample probability random-variable convergence r survival weibull references autocorrelation hypothesis-testing distributions correlation regression statistical-significance regression-coefficients univariate categorical-data chi-squared regression machine-learning multiple-regression categorical-data linear-model pca factor-analysis factor-rotation classification scikit-learn logistic p-value regression panel-data multilevel-analysis variance bootstrap bias probability r distributions interquartile time-series hypothesis-testing normal-distribution normality-assumption kurtosis arima panel-data stata clustered-standard-errors machine-learning optimization lasso multivariate-analysis ancova machine-learning cross-validation

2

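Why are random walks intercorrelated?

I have observed that, on average, the absolute value of the Pearson correlation coefficient is a constant close to 0.42 (I first reported 0.56; see the update) for any pair of independent random walks, regardless of the walk length. Can someone explain this phenomenon? I expected the correlations to get smaller as the walk length increases, as with any random sequence. In my experiments I used random Gaussian walks with step mean 0 and step standard deviation 1. UPDATE: I forgot to center the data, which is why it was 0.56 instead of 0.42. Here is the Python script that computes the correlations: import numpy as np from itertools import combinations, accumulate import random def compute(length, count, seed, center=True): random.seed(seed) basis = [] for _i in range(count): walk = np.array(list(accumulate( random.gauss(0, 1) for _j in range(length) ))) if center: walk -= np.mean(walk) basis.append(walk / np.sqrt(np.dot(walk, walk))) …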

27 time-series correlation stationarity random-walk
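The script in the question is cut off, so the following is not a completion of it. It is an independent, standard-library-only sketch of the same experiment under the stated assumptions (Gaussian steps with mean 0 and standard deviation 1, centered Pearson correlation). It draws fresh independent pairs of walks instead of using all combinations of a fixed set. The function names, seed, and sample sizes are arbitrary.

```python
import math
import random


def gaussian_walk(rng, length):
    """Cumulative sum of independent N(0, 1) steps."""
    position = 0.0
    path = []
    for _ in range(length):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def pearson(x, y):
    """Pearson correlation; both series are centered on their own means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_correlation(length, pairs, seed=0):
    """Average |r| over independent pairs of random walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        walk_a = gaussian_walk(rng, length)
        walk_b = gaussian_walk(rng, length)
        total += abs(pearson(walk_a, walk_b))
    return total / pairs


short_walks = mean_abs_correlation(length=100, pairs=300)
long_walks = mean_abs_correlation(length=1000, pairs=300)

print(f"mean |r|, length 100:  {short_walks:.3f}")
print(f"mean |r|, length 1000: {long_walks:.3f}")
```

If the question's observation holds, both averages should land in the same neighbourhood rather than shrinking as the walks get longer.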

2

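What are the values p, d, q in ARIMA?

In the arima function in R, what does order(1, 0, 12) mean? What values can be assigned to p, d, q, and what is the process for finding those values?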

27 r time-series arima
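For orientation: in the `order` argument of R's `arima`, the three components (p, d, q) are the AR order, the degree of differencing, and the MA order. Of the three, d is the easiest to illustrate in isolation, because it is simply the number of times the series is differenced before the ARMA part is fitted. The sketch below shows that operation in plain Python; the helper name `difference` and the toy series are hypothetical, and it says nothing about how to choose p or q.

```python
def difference(series, d=1):
    """Apply first differencing d times: y_t - y_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# A linear trend becomes constant after one difference (d = 1).
linear = [2, 4, 6, 8, 10]
linear_d1 = difference(linear, d=1)

# A quadratic trend needs two differences (d = 2) to become constant.
quadratic = [1, 4, 9, 16, 25]
quadratic_d1 = difference(quadratic, d=1)
quadratic_d2 = difference(quadratic, d=2)

print(linear_d1)
print(quadratic_d1)
print(quadratic_d2)
```

Each round of differencing shortens the series by one observation, which is why d is normally kept small.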

5

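Is a time series the same as a stochastic process?

A stochastic process is a process that evolves over time, so is it really just a fancier way of saying "time series"?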

27 time-series stochastic-processes definition

2

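STL trend of a time series using R

I am new to R and to time series analysis. I am trying to find the trend of a long (40 years) daily temperature time series and have tried different approximations. The first is a simple linear regression and the second is Seasonal Decomposition of Time Series by Loess. In the latter, the seasonal component appears to be larger than the trend. But how do I quantify the trend? I would just like a number that says how strong the trend is. Call: stl(x = tsdata, s.window = "periodic") Time.series components: seasonal trend remainder Min. :-8.482470191 Min. :20.76670 Min. :-11.863290365 1st Qu.:-5.799037090 1st Qu.:22.17939 1st Qu.: -1.661246674 Median :-0.756729578 Median :22.56694 Median : 0.026579468 Mean :-0.005442784 Mean :22.53063 Mean : -0.003716813 3rd Qu.:5.695720249 3rd Qu.:22.91756 3rd Qu.: 1.700826647 Max. :9.919315613 …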

27 r time-series trend

4

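What is the difference between a stationarity test and a unit root test?

What is the difference between the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test and the augmented Dickey-Fuller (ADF) test? Are they testing the same thing? Or do we need to use them in different situations?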

27 time-series stationarity unit-root augmented-dickey-fuller kpss-test

6

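Estimating the same model over multiple time series

I have a novice background in time series (some ARIMA estimation/forecasting) and have run into a problem I do not fully understand. Any help would be greatly appreciated. I am analyzing multiple time series, all over the same time interval and at the same frequency, all describing a similar type of data. Each series is a single variable; I have no other corresponding predictors. I have been asked to estimate a single model that describes all of the series: for example, suppose I could find one ARIMA(p,d,q) with the same orders, coefficients, etc. that fits every series. My supervisor does not want me to estimate each series separately, nor to build some kind of VAR model with dependencies between the series. My question is: what would I even call such a model, and how would I go about estimating it and forecasting with it? If it is easier for you to use code examples, I speak both SAS and R.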

26 time-series

1

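How to understand SARIMAX intuitively?

I am trying to read a paper about electricity load forecasting, but I am struggling with the concepts in it, especially the SARIMAX model. The model is used to predict the load and relies on many statistical concepts that I do not understand (I am an undergraduate computer science student; you can consider me a layman in statistics). I do not need to fully understand how it works, but I would at least like an intuitive picture of what is happening. I have been trying to split SARIMAX into smaller pieces, understand each piece separately, and then put them together. Can you help me? Here is what I have so far. I started with AR and MA. AR: Autoregressive. I have learned what regression is, and as far as I understand it simply answers the question: given a set of values/points, how can I find a model that explains those values? So we have, for example, linear regression, which tries to find a line that explains all of these points. An autoregression is a regression that tries to explain a value using previous values. MA: Moving Average. I am actually quite lost here. I know what a moving average is, but the moving average model does not seem to have anything to do with the "normal" moving average. The formula for the model looks awkwardly similar to the AR one, and I cannot make sense of any of the explanations I have found on the internet. What is the purpose of MA? What is the difference between MA and AR? So now we have ARMA. Then comes the I, from Integrated, which as far as I understand only serves to let the ARMA model have an increasing or decreasing trend. (Is this equivalent to saying that ARIMA allows the series to be non-stationary?) Next is the S, from Seasonal, which adds periodicity to ARIMA; in the load forecasting case, for example, it basically says that the load looks very similar every day at 6 PM. Finally, the X, from exogenous variables, basically allows external variables such as weather forecasts to be considered in the model. So we finally have SARIMAX! Are my explanations okay? Note that these explanations do not need to be rigorously correct. Can someone explain this to me intuitively?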

26 regression time-series arima autoregressive intuition
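One way to see the difference between AR and MA that the question asks about is to simulate both and compare how long a shock stays visible. The sketch below is only an illustration, not the paper's model. It assumes the textbook forms AR(1): $y_t = \phi y_{t-1} + e_t$ and MA(1): $y_t = e_t + \theta e_{t-1}$ with Gaussian noise, and the parameter values, seed, and function names are arbitrary.

```python
import random


def simulate_ar1(phi, n, rng):
    """AR(1): each value is a fraction of the previous VALUE plus a new shock."""
    value = 0.0
    series = []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        series.append(value)
    return series


def simulate_ma1(theta, n, rng):
    """MA(1): each value is a new shock plus a fraction of the previous SHOCK."""
    previous_shock = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        series.append(shock + theta * previous_shock)
        previous_shock = shock
    return series


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


rng = random.Random(0)
ar_series = simulate_ar1(phi=0.8, n=5000, rng=rng)
ma_series = simulate_ma1(theta=0.8, n=5000, rng=rng)

ar_acf = [autocorrelation(ar_series, lag) for lag in (1, 2, 3)]
ma_acf = [autocorrelation(ma_series, lag) for lag in (1, 2, 3)]

print("AR(1) autocorrelation at lags 1-3:", [round(v, 2) for v in ar_acf])
print("MA(1) autocorrelation at lags 1-3:", [round(v, 2) for v in ma_acf])
```

In the AR(1) series a shock is carried forward through every later value, so the autocorrelation decays gradually. In the MA(1) series a shock affects only its own period and the next one, so the autocorrelation is close to zero from lag 2 onward.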

4

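When to log transform a time series before fitting an ARIMA model

I have previously used Forecast Pro to forecast univariate time series, but I am switching my workflow over to R. The forecast package for R contains a lot of useful functions, but one thing it does not do is any kind of data transformation before running auto.arima(). In some cases Forecast Pro decides to log transform the data before forecasting, but I have not yet figured out why. So my question is: when should I log-transform my time series before trying ARIMA methods? /edit: after reading the answers, I am going to use something like this, where x is my time series: library(lmtest) if (gqtest(x~1)$p.value < 0.10) { x <- log(x) } Does this make sense?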

26 r time-series data-transformation forecasting arima

3

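How can I measure the smoothness of a time series in R?

Is there a good way to measure the smoothness of a time series in R? For example, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0 is much smoother than -1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0 although they have the same mean and standard deviation. It would be cool if there were a function that gives me a smoothness score for a time series.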

25 r time-series
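One simple candidate for such a score is the standard deviation of the first differences: it is zero when every step is the same size and grows as consecutive values jump around. The sketch below applies that idea to the two series from the question. It is written in Python rather than R so it runs standalone, the function name `roughness` is hypothetical, and this is only one of several possible definitions of smoothness.

```python
import statistics


def roughness(series):
    """Standard deviation of first differences; lower means smoother."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return statistics.pstdev(diffs)


# The two series from the question above.
smooth = [-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0]
rough = [-1, 0.8, -0.6, 0.4, -0.2, 0, 0.2, -0.4, 0.6, -0.8, 1.0]

smooth_score = roughness(smooth)
rough_score = roughness(rough)

# Both series share the same mean and standard deviation...
same_mean = abs(statistics.fmean(smooth) - statistics.fmean(rough)) < 1e-12
same_sd = abs(statistics.pstdev(smooth) - statistics.pstdev(rough)) < 1e-12

# ...but their roughness scores differ sharply.
print(f"smooth series: {smooth_score:.3f}")
print(f"rough series:  {rough_score:.3f}")
print(same_mean, same_sd)
```

The score is on the same scale as the data, so dividing it by the standard deviation of the series itself would give a scale-free variant if series with different units need to be compared.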

Questions tagged «time-series»