Paper ID sheet UCL-INMA-2005.901

Title

Convergence of the Iterates of Descent Methods for Analytic Cost Functions

Authors
P.-A. Absil, R. Mahony, B. Andrews
Abstract
In the early eighties, Lojasiewicz proved that a bounded solution of a gradient flow for an analytic cost function converges to a well-defined limit point. In this paper, we show that the iterates of numerical descent algorithms, for an analytic cost function, share this convergence property if they satisfy certain natural descent conditions. The results obtained are applicable to a broad class of optimization schemes and strengthen classical ``weak convergence'' results for descent methods to ``strong limit-point convergence'' for a large class of cost functions of practical interest. The result does not require that the cost have isolated critical points, makes no convexity assumptions on the cost, and imposes no non-degeneracy conditions on the Hessian of the cost at critical points.
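As an illustrative sketch (not taken from the paper), the following Python code runs steepest descent with Armijo backtracking on a polynomial, hence analytic, cost. The cost function and all parameter values below are hypothetical choices for illustration; line-search schemes of this kind satisfy descent conditions of the type considered here, and the iterates settle at a single limit point even though the cost is non-convex.

```python
import numpy as np

# Hypothetical example (not from the paper): a polynomial, hence analytic,
# non-convex cost whose minimizers are (1, 0) and (-1, 0).
def cost(x):
    return (x[0]**2 + x[1]**2 - 1.0)**2 + x[1]**2

def grad(x):
    r = x[0]**2 + x[1]**2 - 1.0
    return np.array([4.0 * r * x[0], 4.0 * r * x[1] + 2.0 * x[1]])

def armijo_descent(x0, beta=0.5, sigma=1e-4, tol=1e-10, max_iter=500):
    """Steepest descent with Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        # Backtrack until the sufficient-decrease condition holds.
        while cost(x - t * g) > cost(x) - sigma * t * np.dot(g, g):
            t *= beta
        x = x - t * g
    return x

x_star = armijo_descent([0.3, 0.2])
print(x_star)  # the iterates converge to a single critical point, here near (1, 0)
```

The sufficient-decrease condition ties the cost reduction at each step to the squared gradient norm, which is the kind of "strong descent" behavior under which single limit-point convergence is obtained for analytic costs.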
Key words
gradient flows; descent methods; real analytic functions; Lojasiewicz gradient inequality; single limit-point convergence; line search; trust region; Mexican hat
Status
SIAM Journal on Optimization, Vol. 16, No. 2, pp. 531-547, 2005
BibTeX entry

  @ARTICLE{AbsMahAnd2005,
  author = "Absil, P.-A. and Mahony, R. and Andrews, B.",
  title = "Convergence of the Iterates of Descent Methods for Analytic Cost Functions",
  journal = "SIAM J. Optim.",
  year = 2005,
  volume = 16,
  number = 2,
  pages = "531--547",
  }
  
Errata
Proposition 3.3 in the published version (doi:10.1137/040605266) is incorrect as stated. The published version does not preclude algorithms that may take large steps, possibly leaving the neighbourhood of the local critical point and landing at a point $x_K$ with $\phi(x_K) < 0$. This invalidates the claim in the final sentence: there is no guarantee that $\phi(x_K) \geq 0$, hence the displayed equation is not guaranteed to hold for $k = K$. We are grateful to Quentin Rebjock for bringing this error to our attention. Here are two strategies to fix Proposition 3.3:

A) In the first sentence of the proposition, replace "local minimum" by "global minimum". Then $\phi(x_K) \geq 0$ is guaranteed to hold in the final sentence of the proof, and the claim follows. (The proof can also be streamlined because $U_m$ is no longer relevant.)

B) Assume that the considered iteration (3.8) has the "vanishing steps" property at $x^*$ in the sense of Definition 3.2 of Rebjock and Boumal 2024 (doi:10.1007/s10107-024-02136-6). (As pointed out therein, this is a reasonable assumption.) In the proof, further require that $\epsilon > 0$ is small enough so that (i) $B_\epsilon(x^*)$ is included in the vanishing-steps neighborhood and (ii) the vanishing-step function $\eta$ satisfies $\eta(x) < \mathrm{dist}(\partial U_m, x^*) - \epsilon$ for all $x \in B_\epsilon(x^*)$. Then, when we reach the final sentence of the proof of Proposition 3.3, we have
$$\|x_K - x^*\| \leq \|x_K - x_{K-1}\| + \|x_{K-1} - x^*\| < \eta(x_{K-1}) + \epsilon < \mathrm{dist}(\partial U_m, x^*) - \epsilon + \epsilon = \mathrm{dist}(\partial U_m, x^*),$$
hence $x_K$ is in $U_m$, meaning that $\phi(x_K) \geq 0$. Hence, as in fix A) above, $\phi(x_K) \geq 0$ is guaranteed to hold in the final sentence of the proof, and the claim follows.

Further information
It is natural to combine the results of section 2 with the property that the limit points of (sufficiently smooth) gradient flows are stationary points; see, for example, Morris W. Hirsch, Stephen Smale, Robert L. Devaney, Differential Equations, Dynamical Systems, and an Introduction to Chaos, 2004, p. 206. For the discrete-time case, the classical weak convergence results alluded to on page 537 include those in section 1.2.2 of Dimitri P. Bertsekas, Nonlinear Programming, second edition, 1999.