How many lags to use in the Ljung-Box test of a time series?

20

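After an ARMA model is fit to a time series, it is common to check the residuals via the Ljung-Box portmanteau test (among other tests). The Ljung-Box test returns a p-value. It has a parameter, h, which is the number of lags to be tested. Some texts recommend using h = 20; others recommend using h = ln(n); most do not say what h to use.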
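To make the role of h concrete, here is a minimal sketch of the statistic itself, using only the Python standard library. The function names and the `residuals` series are hypothetical, made up for illustration; the p-value would then come from a chi-squared reference distribution, whose degrees of freedom depend on whether the series is raw data or residuals from a fitted model.

```python
import math

def sample_autocorrelations(x, max_lag):
    """Return [rho_1, ..., rho_max_lag] of the series x (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - j] for t in range(j, n)) / c0
            for j in range(1, max_lag + 1)]

def ljung_box_q(x, h):
    """Ljung-Box statistic Q(h) = n(n+2) * sum_{j=1..h} rho_j^2 / (n-j)."""
    n = len(x)
    rho = sample_autocorrelations(x, h)
    return n * (n + 2) * sum(r * r / (n - j) for j, r in enumerate(rho, start=1))

# Hypothetical residual series, for illustration only.
residuals = [0.3, -0.1, 0.4, -0.6, 0.2, 0.1, -0.3, 0.5, -0.2, -0.4,
             0.6, -0.5, 0.1, 0.2, -0.1, 0.3, -0.7, 0.4, 0.0, -0.2]
n = len(residuals)
for h in (round(math.log(n)), 5, 10):
    print(h, round(ljung_box_q(residuals, h), 3))
```

Q(h) can only grow as h increases, because each extra lag adds a non-negative term; what changes the verdict is that the reference distribution shifts with h as well.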

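Rather than using a single value for h, suppose that I do the Ljung-Box test for all h < 50, and then pick the h which gives the minimum p-value. Is that approach reasonable? What are the advantages and disadvantages? (One obvious disadvantage is increased computation time, but that is not a problem here.) Is there literature on this?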
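One way to probe the min-p idea is a small simulation. The sketch below is only an illustration under stated assumptions: it uses simulated Gaussian white noise rather than ARMA residuals, takes df = h, and scans a smaller range of lags than h < 50 to keep the run short. All function names and settings are hypothetical. It compares how often a true white-noise series is rejected at one fixed h with how often it is rejected when the smallest p-value over all scanned h is used.

```python
import math
import random

def chi2_sf(x, k):
    """P(chi-square with integer k >= 1 degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting values, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def ljung_box_pvalues(x, max_h):
    """p-values of the Ljung-Box test for h = 1, ..., max_h (df = h)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    pvals, acc = [], 0.0
    for j in range(1, max_h + 1):
        rho = sum(d[t] * d[t - j] for t in range(j, n)) / c0
        acc += rho * rho / (n - j)
        pvals.append(chi2_sf(n * (n + 2) * acc, j))
    return pvals

def rejection_rates(n=100, max_h=20, fixed_h=5, reps=300, alpha=0.05, seed=0):
    """Share of white-noise series rejected at a fixed h vs. at the min-p h."""
    rng = random.Random(seed)
    fixed = scanned = 0
    for _ in range(reps):
        p = ljung_box_pvalues([rng.gauss(0.0, 1.0) for _ in range(n)], max_h)
        fixed += p[fixed_h - 1] < alpha
        scanned += min(p) < alpha
    return fixed / reps, scanned / reps

fixed_rate, scanned_rate = rejection_rates()
print("fixed h:", fixed_rate, "min-p over h:", scanned_rate)
```

Because the fixed h is one of the scanned lags, the min-p rule rejects at least as often as the fixed-h test by construction, so its p-value can no longer be read against the nominal level without some correction.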

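To elaborate slightly: if the test gives p > 0.05 for all h, then obviously the time series (residuals) passes the test. My question concerns how to interpret the test if p < 0.05 for some values of h and not for others.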

time-series

— user2875

1

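@user2875, I have deleted my answer. The fact is that for large $h$ the test is not reliable. So the answer really depends on for which $h$ we have $p<0.05$. Furthermore, what is the exact value of $p$? If we lower the threshold to $0.01$, does the result of the test change? Personally, in the case of conflicting hypotheses I look for other indicators of whether the model is good or not. How well does the model fit? How does the model compare to alternative models? Do the alternative models have the same problems? For what other violations does the test reject the null?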

— mpiktas, 2011

1

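@mpiktas, the Ljung-Box test is based on a statistic whose distribution is asymptotically (as h becomes large) chi-squared. As h gets large relative to n, though, the power of the test decreases to 0. Hence the desire to choose h large enough that the distribution is close to chi-squared but small enough to have useful power. (I do not know what the risk of a false negative is when h is small.)

— user2875, 2011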

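@user2875, this is the third time you have changed the question. First you ask about the strategy of picking the $h$ with the smallest p-value, then how to interpret the test if $p<0.05$ for some values of $h$, and now what the optimal $h$ to pick is. All three questions have different answers, and may even have different answers depending on the context of the particular problem.

— mpiktas, 2011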

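@mpiktas, the questions are all the same, just different ways of looking at it. (As pointed out, if p > 0.05 for all h, then we know how to interpret the smallest p; if we knew the optimal h, which we do not, then we would not be concerned with choosing the smallest p.)

— user2875, 2011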

9

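The answer definitely depends on: what are you actually trying to use the $Q$ test for?

The common reason is to be more or less confident about the joint statistical significance of the null hypothesis of no autocorrelation up to lag $h$ (alternatively, assuming that you have something close to weak white noise), and to build a parsimonious model with as few parameters as possible.

Usually time series data has a natural seasonal pattern, so the practical rule of thumb is to set $h$ to twice the seasonal period. Another choice is the forecasting horizon, if you use the model for forecasting needs. Finally, if you find significant departures at later lags, try to think about corrections (could this be due to some seasonal effects, or was the data not corrected for outliers?).

Rather than using a single value for h, suppose that I do the Ljung-Box test for all h < 50, and then pick the h which gives the minimum p-value.

It is a joint significance test, so if the choice of $h$ is data-driven, why should I care about some small (occasional) departures at any lag less than $h$, supposing of course that $h$ is much less than $n$ (the power of the test you mentioned)? Since the aim is a simple yet relevant model, I suggest the information criteria described below.

My question concerns how to interpret the test if $p<0.05$ for some values of $h$ and not for others.

So it will depend on how far from the present the departure happens. Disadvantages of far departures: more parameters to estimate, fewer degrees of freedom, worse predictive power of the model.

Try to estimate a model that includes MA and/or AR terms at the lag where the departure occurs, and additionally look at one of the information criteria (AIC or BIC, depending on the sample size); this will give you more insight into which model is more parsimonious. Any out-of-sample prediction exercises are also welcome here.

— Dmitrij Celov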

+1, this is what I was trying to express but failed to :)

— mpiktas, 2011

8

Assume that we specify a simple AR(1) model, with all the usual properties,

$y_t = \beta y_{t-1} + u_t$

Denote the theoretical covariance of the error term as

$\gamma_j \equiv E(u_tu_{t-j})$

If we could observe the error term, then the sample autocorrelation of the error term would be defined as

$\tilde \rho_j \equiv \frac {\tilde \gamma_j}{\tilde \gamma_0}$

where

$\tilde\gamma_j \equiv \frac 1n \sum_{t=j+1}^nu_tu_{t-j},\;\;\; j=0,1,2...$

But in practice, we do not observe the error term. So the sample autocorrelation related to the error term, $\hat\rho_j \equiv \hat\gamma_j/\hat\gamma_0$, will be estimated using the residuals from estimation, with

$\hat\gamma_j \equiv \frac 1n \sum_{t=j+1}^n\hat u_t\hat u_{t-j},\;\;\; j=0,1,2...$

The Box-Pierce Q-statistic (the Ljung-Box Q is just an asymptotically neutral scaled version of it) is

$Q_{BP} = n \sum_{j=1}^p\hat\rho^2_j = \sum_{j=1}^p[\sqrt n\hat\rho_j]^2\xrightarrow{d} \;???\;\chi^2(p)$
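For reference, the standard Ljung-Box statistic can be written in the same notation, which makes the "asymptotically neutral" scaling explicit:

```latex
Q_{LB} = n(n+2)\sum_{j=1}^{p}\frac{\hat\rho_j^{2}}{n-j}
       = \sum_{j=1}^{p}\frac{n+2}{n-j}\,\bigl[\sqrt{n}\,\hat\rho_j\bigr]^{2}
```

For fixed $j$ the weight $(n+2)/(n-j)$ tends to 1 as $n \to \infty$, so whatever limit distribution holds for $Q_{BP}$ holds for $Q_{LB}$ as well.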

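Our issue is exactly whether $Q_{BP}$ can be said to have asymptotically a chi-square distribution (under the null of no autocorrelation in the error term) in this model. For this to happen, each and every one of the $\sqrt n \hat\rho_j$ must be asymptotically standard Normal. A way to check this is to examine whether $\sqrt n \hat\rho$ has the same asymptotic distribution as $\sqrt n \tilde\rho$ (which is constructed using the true errors, and so has the desired asymptotic behavior under the null).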

We have that

$\hat u_t = y_t - \hat \beta y_{t-1} = u_t - (\hat \beta - \beta)y_{t-1}$

where $\hat \beta$ is a consistent estimator. So

$\hat\gamma_j \equiv \frac 1n \sum_{t=j+1}^n[u_t - (\hat \beta - \beta)y_{t-1}][u_{t-j} - (\hat \beta - \beta)y_{t-j-1}]$

$=\tilde \gamma _j -\frac 1n \sum_{t=j+1}^n (\hat \beta - \beta)\big[u_ty_{t-j-1} +u_{t-j}y_{t-1}\big] + \frac 1n \sum_{t=j+1}^n(\hat \beta - \beta)^2y_{t-1}y_{t-j-1}$

The sample is assumed to be stationary and ergodic, and moments are assumed to exist up until the desired order. Since the estimator $\hat \beta$ is consistent, this is enough for the two sums to go to zero. So we conclude

$\hat \gamma_j \xrightarrow{p} \tilde \gamma_j$

This implies that

$\hat \rho_j \xrightarrow{p} \tilde \rho_j \xrightarrow{p} \rho_j$

But this does not automatically guarantee that $\sqrt n \hat \rho_j$ converges to $\sqrt n\tilde \rho_j$ (in distribution) (think that the continuous mapping theorem does not apply here because the transformation applied to the random variables depends on $n$ ). In order for this to happen, we need

$\sqrt n \hat \gamma_j \xrightarrow{d} \sqrt n \tilde \gamma_j$

(the denominator $\gamma_0$ -tilde or hat- will converge to the variance of the error term in both cases, so it is neutral to our issue).

We have

$\sqrt n \hat \gamma_j =\sqrt n\tilde \gamma _j -\frac 1n \sum_{t=j+1}^n \sqrt n(\hat \beta - \beta)\big[u_ty_{t-j-1} +u_{t-j}y_{t-1}\big] + \frac 1n \sum_{t=j+1}^n\sqrt n(\hat \beta - \beta)^2y_{t-1}y_{t-j-1}$

So the question is: do these two sums, now multiplied by $\sqrt n$, go to zero in probability, so that we are left with $\sqrt n \hat \gamma_j =\sqrt n\tilde \gamma _j$ asymptotically?

For the second sum we have

$\frac 1n \sum_{t=j+1}^n\sqrt n(\hat \beta - \beta)^2y_{t-1}y_{t-j-1} = \frac 1n \sum_{t=j+1}^n\big[\sqrt n(\hat \beta - \beta)\big]\big[(\hat \beta - \beta)y_{t-1}y_{t-j-1}\big]$

Since $[\sqrt n(\hat \beta - \beta)]$ converges to a random variable, and $\hat \beta$ is consistent, this will go to zero.

For the first sum, here too we have that $[\sqrt n(\hat \beta - \beta)]$ converges to a random variable, and so we have that

$\frac 1n \sum_{t=j+1}^n \big[u_ty_{t-j-1} +u_{t-j}y_{t-1}\big] \xrightarrow{p} E[u_ty_{t-j-1}] + E[u_{t-j}y_{t-1}]$

The first expected value, $E[u_ty_{t-j-1}]$ is zero by the assumptions of the standard AR(1) model. But the second expected value is not, since the dependent variable depends on past errors.

So $\sqrt n\hat \rho_j$ won't have the same asymptotic distribution as $\sqrt n\tilde \rho_j$ . But the asymptotic distribution of the latter is standard Normal, which is the one leading to a chi-squared distribution when squaring the r.v.

Therefore we conclude, that in a pure time series model, the Box-Pierce Q and the Ljung-Box Q statistic cannot be said to have an asymptotic chi-square distribution, so the test loses its asymptotic justification.

This happens because the right-hand side variable (here the lag of the dependent variable) by design is not strictly exogenous to the error term, and we have found that such strict exogeneity is required for the BP/LB Q-statistic to have the postulated asymptotic distribution.

Here the right-hand-side variable is only "predetermined", and the Breusch-Godfrey test is then valid. (For the full set of conditions required for an asymptotically valid test, see Hayashi 2000, p. 146-149.)
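A Monte Carlo sketch can illustrate this conclusion. Assuming Gaussian errors, $\beta = 0.5$, OLS estimation without an intercept, and a start value $y_0 = u_0$ (all choices made here for illustration, not part of the argument above), it compares the variance of $\sqrt n\tilde\rho_1$ (true errors) with that of $\sqrt n\hat\rho_1$ (residuals). If both were asymptotically standard Normal, both variances would be near 1.

```python
import math
import random

def lag1_autocorr(e):
    """Lag-1 sample autocorrelation as defined above (no mean correction)."""
    c0 = sum(v * v for v in e)
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / c0

def simulate(beta=0.5, n=500, reps=300, seed=1):
    """Variance of sqrt(n)*rho_1 for true errors and for AR(1) residuals."""
    rng = random.Random(seed)
    true_stats, resid_stats = [], []
    for _ in range(reps):
        u = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
        y = [u[0]]
        for t in range(1, n + 1):
            y.append(beta * y[-1] + u[t])
        # OLS estimate of beta (no intercept), then residuals for t = 1..n.
        beta_hat = (sum(y[t] * y[t - 1] for t in range(1, n + 1))
                    / sum(y[t - 1] ** 2 for t in range(1, n + 1)))
        resid = [y[t] - beta_hat * y[t - 1] for t in range(1, n + 1)]
        true_stats.append(math.sqrt(n) * lag1_autocorr(u[1:]))
        resid_stats.append(math.sqrt(n) * lag1_autocorr(resid))

    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    return var(true_stats), var(resid_stats)

var_true, var_resid = simulate()
print("true errors:", round(var_true, 3), "residuals:", round(var_resid, 3))
```

In runs of this kind one would expect the residual-based variance to come out clearly below 1, which is the non-standard behaviour the derivation points to; the exact numbers depend on the seed and the assumed settings.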

— Alecos Papadopoulos

1

You wrote "But the second expected value is not, since the dependent variable depends on past errors." That's called strict exogeneity. I agree that it's a strong assumption, and you can build an AR(p) framework without it, just by using weak exogeneity. This is the reason why the Breusch-Godfrey test is better in some sense: if the null is not true, then L-B loses power. B-G is based on weak exogeneity. Neither test is good for some common econometric applications; see e.g. this Stata presentation, p. 4/44.

— Aksakal

3

@Aksakal Thanks for the reference. The point exactly is that without strict exogeneity, the Box-Pierce/Ljung-Box do not have an asymptotic chi-square distribution, this is what the mathematics above show. Weak exogeneity (which holds in the above model) is not enough for them. This is exactly what the presentation you link to says in p. 3/44.

— Alecos Papadopoulos

2

@AlecosPapadopoulos, an amazing post!!! Among the few best ones I have encountered here at Cross Validated. I just wish it would not disappear in this long thread and many users would find and benefit from it in the future.

— Richard Hardy

3

Before you zero-in on the "right" h (which appears to be more of an opinion than a hard rule), make sure the "lag" is correctly defined.

http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm

Quoting the section below Issue 4 in the above link:

"....The p-values shown for the Ljung-Box statistic plot are incorrect because the degrees of freedom used to calculate the p-values are lag instead of lag - (p+q). That is, the procedure being used does NOT take into account the fact that the residuals are from a fitted model. And YES, at least one R core developer knows this...."

Edit (01/23/2011): Here's an article by Burns that might help:

http://lib.stat.cmu.edu/S/Spoetry/Working/ljungbox.pdf
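The degrees-of-freedom point in the quoted passage can be sketched as follows. This is an illustration under stated assumptions, not the code of any package: the helper names are hypothetical, the parameter name `fitdf` is borrowed from the corresponding argument of R's `Box.test`, and the numbers are invented.

```python
import math

def chi2_sf(x, k):
    """P(chi-square with integer k >= 1 degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting values, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def ljung_box_pvalue(q_stat, lag, fitdf=0):
    """p-value of a Ljung-Box statistic with df = lag - fitdf."""
    df = lag - fitdf
    if df < 1:
        raise ValueError("lag must exceed the number of fitted ARMA terms")
    return chi2_sf(q_stat, df)

# Hypothetical numbers: Q = 18.3 at lag 10 on residuals of an ARMA(1, 1) fit.
q_stat, lag, p_order, q_order = 18.3, 10, 1, 1
naive = ljung_box_pvalue(q_stat, lag)
adjusted = ljung_box_pvalue(q_stat, lag, fitdf=p_order + q_order)
print(round(naive, 4), round(adjusted, 4))
```

With these illustrative numbers the unadjusted p-value sits just above 0.05 while the adjusted one falls below it, so the choice of degrees of freedom alone can flip the conclusion.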

— bill_080

@bill_080, the OP does not mention R, and the help page for Box.test in R mentions the correction and has an argument to allow for it, although you need to supply it manually.

— mpiktas

@mpiktas, Oops, you're right. I assumed this was an R question. As for the second part of your comment, there are several R packages that use Ljung-Box stats. So, it's a good idea to make sure the user understands what the package's "lag" means.

— bill_080

Thanks--I am using R, but the question is a general one. Just to be safe, I was doing the test with the LjungBox function in the portes package, as well as Box.test.

— user2875

2

The thread "Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey" shows that the Ljung-Box test is essentially inapplicable in the case of an autoregressive model. It also shows that Breusch-Godfrey test should be used instead. That limits the relevance of your question and the answers (although the answers may include some generally good points).

— Richard Hardy

The trouble with the LB test arises when autoregressive models have other regressors, i.e. ARMAX, not ARMA, models. OP explicitly states ARMA, not ARMAX, in the question. Hence, I think that your answer is incorrect.

— Aksakal

@Aksakal, I clearly see from Alecos Papadopoulos answer (and comments under it) in the above-mentioned thread that Ljung-Box test is inapplicable in both cases, i.e. pure AR/ARMA and ARX/ARMAX. Therefore, I cannot agree with you.

— Richard Hardy

Alecos Papadopoulos's answer is good, but incomplete. It points out to Ljung-Box test's assumption of strict exogeneity but it fails to mention that if you're fine with the assumption, then L-B test is Ok to use. B-G test, which he and I favor over L-B, relies on weak exogeneity. It's better to use tests with weaker assumptions in general, of course. However, even B-G test's assumptions are too strong in many cases.

— Aksakal

@Aksakal, The setting of this question is quite definite -- it considers residuals from an ARMA model. The important thing here is, L-B does not work (as shown explicitly in Alecos post in this as well as the above-cited thread) while B-G test does work. Of course, things can happen in other settings (even B-G test's assumptions are too strong in many cases) -- but that is not the concern in this thread. Also, I did not get what the assumption is in your statement if you're fine with the assumption, then L-B test is Ok to use. Is that supposed to invalidate Alecos point?

— Richard Hardy

1

Escanciano and Lobato constructed a portmanteau test with automatic, data-driven lag selection based on the Box-Pierce test and its refinements (which include the Ljung-Box test).

The gist of their approach is to combine the AIC and BIC criteria, common in the identification and estimation of ARMA models, to select the optimal number of lags to be used. In the introduction they suggest that, intuitively, "test conducted using the BIC criterion are able to properly control for type I error and are more powerful when serial correlation is present in the first order". Tests based on AIC, by contrast, are more powerful against high-order serial correlation. Their procedure thus chooses a BIC-type lag selection when autocorrelations seem to be small and present only at low order, and an AIC-type lag selection otherwise.

The test is implemented in the R package vrtest (see function Auto.Q).

— Ryogi

1

The two most common settings are $\min(20,T-1)$ and $\ln T$ where $T$ is the length of the series, as you correctly noted.

The first one is supposed to be from the authoritative book by Box, Jenkins, and Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. However, here's all they say about the lags on p.314:

It's not a strong argument or suggestion by any means, yet people keep repeating it from one place to another.

The second setting for a lag is from Tsay, R. S. Analysis of Financial Time Series. 2nd Ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005, here's what he wrote on p.33:

Several values of m are often used. Simulation studies suggest that the choice of m ≈ ln(T ) provides better power performance.

This is a somewhat stronger argument, but there's no description of what kind of study was done. So, I wouldn't take it at face value. He also warns about seasonality:

This general rule needs modification in analysis of seasonal time series for which autocorrelations with lags at multiples of the seasonality are more important.

Summarizing, if you just need to plug some lag into the test and move on, then you can use either of these settings, and that's fine, because that's what most practitioners do. We're either lazy or, more likely, don't have time for this stuff. Otherwise, you'd have to conduct your own research on the power and properties of the statistics for the series that you deal with.
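For the plug-in-and-move-on case, the two settings can be computed as in the sketch below. The function name is hypothetical, and rounding $\ln T$ to the nearest integer (with a floor of 1) is an assumption made here, since the quoted rule only says m ≈ ln(T).

```python
import math

def rule_of_thumb_lags(T):
    """Two commonly quoted lag choices for a series of length T."""
    return {
        "min(20, T-1)": min(20, T - 1),
        "ln(T)": max(1, round(math.log(T))),  # rounding is an assumption
    }

for T in (15, 100, 1000):
    print(T, rule_of_thumb_lags(T))
```

Neither value accounts for seasonality, so for seasonal data the caveat quoted above still applies.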

UPDATE.

Here's my answer to Richard Hardy's comment and his answer, which refers to another thread on CV started by him. You can see that the exposition in the accepted (by Richard Hardy himself) answer in that thread is clearly based on the ARMAX model, i.e. the model with exogenous regressors $x_t$:

$y_t = \mathbf x_t'\beta + \phi(L)y_t + u_t$

However, OP did not indicate that he's doing ARMAX; to the contrary, he explicitly mentions ARMA:

After an ARMA model is fit to a time series, it is common to check the residuals via the Ljung-Box portmanteau test

One of the first papers that pointed to a potential issue with the LB test was Dezhbakhsh, Hashem (1990). “The Inappropriate Use of Serial Correlation Tests in Dynamic Linear Models,” Review of Economics and Statistics, 72, 126–132. Here's the excerpt from the paper:

As you can see, he doesn't object to using LB test for pure time series models such as ARMA. See also the discussion in the manual to a standard econometrics tool EViews:

If the series represents the residuals from ARIMA estimation, the appropriate degrees of freedom should be adjusted to represent the number of autocorrelations less the number of AR and MA terms previously estimated. Note also that some care should be taken in interpreting the results of a Ljung-Box test applied to the residuals from an ARMAX specification (see Dezhbaksh, 1990, for simulation evidence on the finite sample performance of the test in this setting)

Yes, you have to be careful with ARMAX models and LB test, but you can't make a blanket statement that LB test is always wrong for all autoregressive series.

UPDATE 2

Alecos Papadopoulos's answer shows why the Ljung-Box test requires the strict exogeneity assumption. He doesn't show it in his post, but the Breusch-Godfrey test (an alternative test) requires only weak exogeneity, which is better, of course. This is what Greene, Econometrics, 7th ed. says on the differences between the tests, p.923:

The essential difference between the Godfrey–Breusch and the Box–Pierce tests is the use of partial correlations (controlling for X and the other variables) in the former and simple correlations in the latter. Under the null hypothesis, there is no autocorrelation in $\varepsilon_t$, and no correlation between $x_t$ and $\varepsilon_s$ in any event, so the two tests are asymptotically equivalent. On the other hand, because it does not condition on $x_t$, the Box–Pierce test is less powerful than the LM test when the null hypothesis is false, as intuition might suggest.

— Aksakal

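I suppose you decided to answer this question because my recent answer bumped it to the top of the active threads. Curiously, I argue that the test is inappropriate in the setting under consideration, which makes the whole thread problematic, and the answers in it especially so. Do you think it is good practice to post yet another answer that ignores this problem without even mentioning it (just as all the previous answers do)? Or do you think my answer does not make sense (which would justify posting an answer like yours)?

— Richard Hardy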

Thank you for an update! I am not an expert, but the argumentation by Alecos Papadopoulos in "Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey" and in the comments under his answer suggests that Ljung-Box is indeed inapplicable on residuals from pure ARMA (as well as ARMAX) models. If the wording is confusing, check the maths there, it seems fine. I think this is a very interesting and important question, so I would really like to find agreement between all of us here.

— Richard Hardy

0

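... h should be as small as possible to preserve whatever power the LB test may have under the circumstances. As h increases, the power drops. The LB test is a dreadfully weak test; you must have a lot of samples; n must be at least 100 to be meaningful. Unfortunately, I have never seen a better test. But perhaps one exists. Does anyone know of one?

— Paul3nt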

0

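There is no correct answer that works in all situations, for the reason others have given: it will depend on your data.

That said, after trying to figure out how to reproduce in R a result obtained in Stata, I can tell you that, by default, the Stata implementation uses $\mathrm{min}(\frac{n}{2}-2, 40)$: half the number of data points minus 2, or 40, whichever is smaller.

Of course, all defaults are wrong, and this one will definitely be wrong in some situations. In many situations, though, it might not be a bad place to start.

— Benjamin Mako Hill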

0

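Let me suggest our R package hwwntest. It implements wavelet-based white noise tests that do not require any tuning parameters and have good statistical size and power.

Additionally, I recently found "Thoughts on the Ljung-Box test", an excellent discussion of the topic by Rob Hyndman.

Update: Considering the side discussion in this thread regarding ARMAX, another incentive to look at hwwntest is the availability of a theoretical power function for one of the tests against an alternative hypothesis of an ARMA(p,q) model.

— Delyan Savchev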