8

In the newton optimization algorithm to find the local minimum $x^*$ of a non-linear function $f(x)$ with iteration sequence of $x_0 \rightarrow x_1 \rightarrow x_2 ... \rightarrow x^*$ all $\nabla ^2 f(x_k)$ should be pos. definite otherwise search direction might not be a descent direction. The solution to this problem is to use Hessian modification techniques.

I am looking for a simple function that has local minimum at $x^*$, but $\nabla ^2 f(x_k)$ is not positive definite, and hence the Newton algorithm fails. Then after using one of the Hessian modification techniques, Newton algorithm converges to $x^*$

ManiAm
  • 733

2 Answers2

14

I found a good example:

$f(x_1,x_2)=(1.5-x_1+x_1*x_2)^2 + (2.25-x_1+x_1*x_2^2)^2 + (2.625-x_1+x_1*x_2^3)^2$

Damped Newton method with starting point $x_0 = (4,1)^T$ fails. This is why:

$\nabla^2 f(x_0)= \left( \begin{matrix} 0 & 27.75 \\ 27.75 & 610 \\ \end{matrix} \right) $ and search direction is $p_0=-\nabla^2 f(x_0)*\nabla f(x_0)=(-4,0)^T$

The search direction is pointing to a wrong direction! This is because Hessian matrix is not positive definite and thus $p_0$ is not in the descent direction.

enter image description here

After using the Hessian modification technique such as "adding a multiple of the identity", the algorithm works fine and the search direction points to the correct direction, and eventually finds the local minimum $(3,0.5)^T$

enter image description here

ManiAm
  • 733
  • 1
    It seems that the formula for search direction has a slight error. It has product of Hessian by gradient, but it should be product of inverse of Hessian by gradient (see https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization#Higher_dimensions) – user502144 Jan 27 '23 at 12:43
1

If the Hessian is negative-definite, then the direction is automatically ascent. Proof: if the Hessian is negative-definite, then its inverse exists and is also negative-definite. Newton's direction is $-[\nabla^2 f(x)]^{-1}(\nabla f)(x)$. Descent direction is $-(\nabla f)(x)$. Now the scalar multiplication of both is negative: $[(\nabla f)(x)]^{\text{T}}[\nabla^2 f(x)]^{-1}(\nabla f)(x)<0$. $\diamondsuit$