Stopping criteria for iterative linear solvers applied to nearly singular systems


16

Consider $Ax = b$ with $A$ nearly singular, meaning there is an eigenvalue $\lambda_0$ of $A$ that is very small. The usual stopping criterion of an iterative method is based on the residual $r_n := b - Ax_n$ and considers that the iterations may stop when $\|r_n\|/\|r_0\| < \mathrm{tol}$, with $n$ the iteration number. But in the case we are considering, there may be a large error component $v$ living in the eigenspace associated with the small eigenvalue, which gives only a small residual since $Av = \lambda_0 v$. Suppose the initial residual $r_0$ is large; then it may happen that we stop at $\|r_n\|/\|r_0\| < \mathrm{tol}$ while the error $x_n - x$ is still large. What is a better error indicator in this case? Is $\|x_n - x_{n-1}\|$ a good candidate?
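To make the failure mode concrete, here is a small numpy sketch (not from the original question; the matrix, eigenvalue, and magnitudes are illustrative assumptions): a diagonal, nearly singular system for which the relative residual looks converged while the error in the $\lambda_0$-eigenspace is huge.

```python
import numpy as np

# Nearly singular 2x2 system: eigenvalues 1 and lambda0 = 1e-8
# (matrix and magnitudes chosen only for illustration).
lam0 = 1e-8
A = np.diag([1.0, lam0])
x_exact = np.array([1.0, 1.0])
b = A @ x_exact

# Perturb the solution by a large multiple of the eigenvector v of
# lambda0; since A v = lambda0 v, the residual barely notices it.
v = np.array([0.0, 1.0])
x_n = x_exact + 1e3 * v

r_0 = b                   # residual of the zero initial guess
r_n = b - A @ x_n
print(np.linalg.norm(r_n) / np.linalg.norm(r_0))  # ~1e-5: looks converged
print(np.linalg.norm(x_n - x_exact))              # 1e3: error is huge
```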


3
You may want to think about your definition of "nearly singular". The matrix $\epsilon I$ (with $\epsilon \ll 1$ and $I$ the identity matrix) has very small eigenvalues, but is as far from singular as any matrix can be.
David Ketcheson, 2012

1
Also, $\|r_n/r_0\|$ seems to be the wrong notation. $\|r_n\|/\|r_0\|$ is more typical, isn't it?
Bill Barth

Yes, you are right, Bill! I will correct that mistake.
Hui Zhang

1
What exactly is your algorithm? Do you mean $\|b - Ax\|/\|b\|$?
shuhalo, 2012

2
Addendum: I think the following paper addresses the ill-conditioned systems you are concerned about, at least when CG is used: Axelsson, Kaporin: Error norm estimation and stopping criteria in preconditioned conjugate gradient iterations. DOI: 10.1002/nla.244
shuhalo, 2012

Answers:


13

Do not use the difference between successive iterates to define a stopping criterion. That misdiagnoses stagnation as convergence. Most nonsymmetric matrix iterations are not monotone, and even GMRES in exact arithmetic without restarts can stagnate for an arbitrary number of iterations (up to the dimension of the matrix) before converging abruptly. See the examples in Nachtigal, Reddy, and Trefethen (1993).
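A classic instance of such complete stagnation is the cyclic shift matrix discussed by Nachtigal, Reddy, and Trefethen. The numpy sketch below (my construction, not code from the answer) computes the exact-arithmetic GMRES residual at each step by direct least-squares minimization over the Krylov subspace:

```python
import numpy as np

n = 20
# Cyclic (down-)shift permutation matrix: GMRES makes no progress at all
# on this example until the very last iteration.
A = np.roll(np.eye(n), 1, axis=0)
b = np.zeros(n)
b[0] = 1.0

# Exact-arithmetic GMRES residual at step k: minimize ||b - A V_k y||
# over the Krylov basis V_k = [b, A b, ..., A^{k-1} b].
v = b.copy()
V = np.empty((n, 0))
for k in range(1, n + 1):
    V = np.hstack([V, v[:, None]])
    y, *_ = np.linalg.lstsq(A @ V, b, rcond=None)
    print(k, np.linalg.norm(b - A @ (V @ y)))
    v = A @ v
```

The residual stays at exactly 1 for the first $n-1$ steps and drops to (numerically) zero only at step $n$. During the stagnation phase the minimizer does not move at all, so $\|x_k - x_{k-1}\| = 0$ even though the error is $O(1)$ — precisely the misdiagnosis warned about above.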

A better way to define convergence

In general, we are more interested in the accuracy of the solution than in the size of the residual. Specifically, we might want to guarantee that the difference between an approximate solution $x_n$ and the exact solution $x$ satisfies $\|x_n - x\| < c$ for some user-specified $c$. It turns out that one can achieve this by finding an $x_n$ such that $\|Ax_n - b\| < c\epsilon$, where $\epsilon$ is the smallest singular value of $A$, because

$$\|x_n - x\| = \|A^{-1}A(x_n - x)\| \le \frac{1}{\epsilon}\,\|Ax_n - Ax\| = \frac{1}{\epsilon}\,\|Ax_n - b\| < \frac{1}{\epsilon}\,c\epsilon = c,$$

where we used that $1/\epsilon$ is the largest singular value of $A^{-1}$ (second step) and that $x$ solves $Ax = b$ exactly (third step).
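As a sanity check, the bound is easy to verify numerically; here is a minimal sketch (the random test matrix and perturbation are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
x = rng.standard_normal(50)
b = A @ x

# Smallest singular value of A (numpy returns them in descending order).
eps = np.linalg.svd(A, compute_uv=False)[-1]

# Any approximate solution obeys ||x_n - x|| <= ||A x_n - b|| / eps.
x_n = x + 1e-3 * rng.standard_normal(50)
lhs = np.linalg.norm(x_n - x)
rhs = np.linalg.norm(A @ x_n - b) / eps
print(lhs, rhs, lhs <= rhs)   # the bound holds (True)
```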

Estimating the smallest singular value $\epsilon$

An accurate estimate of the smallest singular value is usually not available directly from the problem, but it can be estimated as a byproduct of a conjugate gradient or GMRES iteration. Note that while estimates of the largest eigenvalues and singular values are usually quite good after only a few iterations, accurate estimates of the smallest eigenvalues/singular values are typically obtained only once convergence is reached; before convergence, the estimate is generally much larger than the true value. This implies that you must actually solve the equations before the correct tolerance $c\epsilon$ can be defined. An automatic convergence tolerance combining a user-provided accuracy $c$ with the estimate of $\epsilon$ from the current state of the Krylov method may stop too early, because the estimate of $\epsilon$ was much larger than the true value.
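A sketch of how such estimates fall out of CG (a numpy/scipy stand-in for what PETSc reports via -ksp_monitor_singular_value; the test matrix, spectrum, and iteration counts are my assumptions): CG's $\alpha_k, \beta_k$ coefficients assemble the Lanczos tridiagonal matrix, whose extreme eigenvalues (Ritz values) estimate the extremes of $A$.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# SPD test matrix with a known spectrum from 1e-6 to 1 (a stand-in
# for a nearly singular operator).
rng = np.random.default_rng(1)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * np.linspace(1e-6, 1.0, n)) @ Q.T
b = rng.standard_normal(n)

# Plain CG, recording the alpha/beta coefficients.
x = np.zeros(n)
r = b.copy()
p = r.copy()
alphas, betas = [], []
for k in range(60):
    Ap = A @ p
    rr = r @ r
    alpha = rr / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    beta = (r @ r) / rr
    p = r + beta * p
    alphas.append(alpha)
    betas.append(beta)

    if k in (5, 20, 59):
        # Lanczos tridiagonal assembled from the CG coefficients
        # (Saad, "Iterative Methods for Sparse Linear Systems", sec. 6.7.3);
        # its extreme eigenvalues (Ritz values) estimate those of A.
        diag = [1.0 / alphas[0]] + [
            1.0 / alphas[j] + betas[j - 1] / alphas[j - 1]
            for j in range(1, len(alphas))
        ]
        off = [np.sqrt(betas[j]) / alphas[j] for j in range(len(alphas) - 1)]
        ritz = eigh_tridiagonal(np.array(diag), np.array(off),
                                eigvals_only=True)
        print(k + 1, ritz[0], ritz[-1])
```

The largest Ritz value settles almost immediately, while the smallest one still overestimates the true $10^{-6}$ by orders of magnitude at iteration 60 — exactly the caveat above.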

Notes

  1. The discussion above also holds with $A$ replaced by the left-preconditioned operator $P^{-1}A$ and the preconditioned residual $P^{-1}(Ax_n - b)$, or with the right-preconditioned operator $AP^{-1}$ and the error $P(x_n - x)$. If $P$ is a good preconditioner, the preconditioned operator is well-conditioned. For left preconditioning, this means the preconditioned residual can be made small, but the true residual may not be. For right preconditioning, $\|P(x_n - x)\|$ is easily made small, but the true error $\|x_n - x\|$ may not be. This explains why left preconditioning is better for reducing the error while right preconditioning is better for reducing the residual (and for debugging unstable preconditioners); see the sketch after these notes.
  2. For more discussion of the norms minimized by GMRES and CG, see this answer.
  3. Estimates of the extreme singular values can be obtained with -ksp_monitor_singular_value in any PETSc program. See KSPComputeExtremeSingularValues() to compute the singular values from code.
  4. When using GMRES to estimate singular values, it is essential not to use restarts (e.g., -ksp_gmres_restart 1000 in PETSc).
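To illustrate note 1, here is a tiny numpy sketch (my construction: a perfect right preconditioner for a diagonal, nearly singular matrix) in which the preconditioned quantities look converged but unwinding $P^{-1}$ amplifies the error:

```python
import numpy as np

A = np.diag([1.0, 1e-8])        # nearly singular
P = A.copy()                    # perfect right preconditioner: A P^{-1} = I
x = np.array([1.0, 1.0])
b = A @ x

# Right-preconditioned solve: A P^{-1} y = b with x = P^{-1} y.
# Pretend the Krylov method returned y_n with a tiny residual.
y_n = b + np.array([0.0, 1e-6])
x_n = np.linalg.solve(P, y_n)   # unwind the preconditioner

print(np.linalg.norm(A @ x_n - b))    # 1e-6: true residual is small
print(np.linalg.norm(P @ (x_n - x)))  # 1e-6: P*(error) is small too
print(np.linalg.norm(x_n - x))        # 1e2: the true error is not
```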

1
"also holds with $A$ replaced by the preconditioned operator" — but then it applies only to the preconditioned residual $P^{-1}r$ if $P^{-1}A$ is used, and to the preconditioned error $P^{-1}\delta x$ if $AP^{-1}$ is used.
Arnold Neumaier

1
Good point, I edited the answer. Note that right preconditioning as stated gives you control of $P\delta x$, and unwinding the preconditioner (applying $P^{-1}$) typically amplifies the low-energy modes in the error.
Jed Brown

6

Another way to approach this problem is through the tools of discrete inverse problems, i.e., problems that involve solving $\min \|Ax - b\|_2$ where $A$ is very ill-conditioned (i.e., the ratio between the first and last singular values, $\sigma_1/\sigma_n$, is large).

Here we have several methods for choosing a stopping criterion; for iterative methods I would suggest the L-curve criterion, since it involves only quantities that are already available (disclaimer: my advisor pioneered this method, so I am definitely biased toward it). I have used it successfully within iterative methods.

You keep track of the residual norm $\rho_k = \|Ax_k - b\|_2$ and the solution norm $\eta_k = \|x_k\|_2$, where $x_k$ is the $k$'th iterate. As you iterate, this begins to draw the shape of an L in a loglog(rho, eta) plot, and the point at the corner of that L is the optimal choice.

This allows you to implement a criterion where you keep an eye on when you have passed the corner (i.e., by looking at the gradient of $(\rho_k, \eta_k)$), and then choose the iterate that was located at the corner.

The way I did it involved storing the last 20 iterates, and if the gradient $\left|\frac{\log\eta_k - \log\eta_{k-1}}{\log\rho_k - \log\rho_{k-1}}\right|$ was larger than some threshold for 20 successive iterations, I knew that I was on the vertical part of the curve and that I had passed the corner. I then took the first iterate in my array (i.e., the one from 20 iterations ago) as my solution, as in the sketch below.
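A minimal Python sketch of that recipe (all names, the window of 20, and the slope threshold are illustrative assumptions, not code from the answer):

```python
import numpy as np
from collections import deque

def make_lcurve_monitor(window=20, slope_threshold=10.0):
    """Corner detector for the L-curve, following the recipe above:
    once |d log(eta) / d log(rho)| has exceeded `slope_threshold` for
    `window` consecutive iterations, we are on the vertical branch and
    the oldest stored iterate (~`window` steps back) is the solution."""
    iterates = deque(maxlen=window)
    prev = None        # (log rho, log eta) from the previous iteration
    steep = 0          # consecutive steep-slope count

    def update(x_k, rho_k, eta_k):
        nonlocal prev, steep
        iterates.append(np.copy(x_k))
        point = (np.log(rho_k), np.log(eta_k))
        if prev is not None:
            d_rho = point[0] - prev[0]
            slope = np.inf if d_rho == 0 else abs((point[1] - prev[1]) / d_rho)
            steep = steep + 1 if slope > slope_threshold else 0
        prev = point
        # Corner passed: hand back the oldest stored iterate.
        return iterates[0] if steep >= window else None

    return update
```

Inside a solver loop one would call monitor = make_lcurve_monitor() once, then x_corner = monitor(x_k, np.linalg.norm(A @ x_k - b), np.linalg.norm(x_k)) at each iteration, stopping as soon as the return value is not None.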

There are also more detailed methods for finding the corner, and these work better but require storing a significant number of iterates. Play around with it a bit. If you are in matlab, you can use the toolbox Regularization Tools, which implements some of this (specifically the "corner" function is applicable).

Note that this approach is particularly suitable for large-scale problems, since the extra computing time involved is minuscule.


1
Thanks a lot! So in the loglog(rho, eta) plot we begin from the right of the L curve and end at the top of the L, is that right? I just don't know the principle behind this criterion. Can you explain why it always behaves like an L curve and why we choose the corner?
Hui Zhang

You're welcome :-D. For an iterative method, we always begin from the right and end at the top. It behaves as an L due to the noise in the problem: the vertical part happens at $\|Ax - b\|_2 = \|e\|_2$, where $e$ is the noise vector, $b = b_{\mathrm{exact}} + e$. For more analysis see Hansen, P. C., & O'Leary, D. P. (1993). The use of the L-curve in the regularization of discrete ill-posed problems. SIAM Journal on Scientific Computing, 14. Note that I just made a slight update to the post.
OscarB

4
@HuiZhang: it isn't always an L. If the regularization is ambiguous it may be a double L, leading to two candidates for the solution, one with gross features better resolved, the other with certain details better resolved. (And of course, more complex shapes may appear.)
Arnold Neumaier

Does the L-curve apply to ill-conditioned problems where there should be a unique solution? That is, I'm interested in problems Ax = b where b is known "exactly" and A is nearly singular but still technically invertible. It would seem to me that if you use something like GMRES the norm of your current guess of x doesn't change too much over time, especially after the first however many iterations. It seems to me that the vertical part of the L-curve occurs because there is no unique/valid solution in an ill-posed problem; would this vertical feature be present in all ill-conditioned problems?
nukeguy

At some point you will reach such a vertical part, typically because the numerical errors in your solution method result in $\|Ax - b\|$ no longer decreasing. However, you are right that in such noise-free problems the curve does not always look like an L, meaning that you typically have a few corners to choose from, and choosing one over the other can be hard. I believe the paper I referenced in my comment above briefly discusses noise-free scenarios.
OscarB